I think this suggestion is aimed directly at Matt.
Running GPU-Grid normally works fine with low CPU load. However, on my fairly fast GTX 970 (and similar cards) running Win 8.1 (with its WDDM performance tax) I still get ~2% higher GPU utilization if I set the environment variable SWAN_SYNC to trigger constant polling. I welcome the performance boost but obviously lose a CPU core.
For these results I'm already setting the CPU thread priority to "higher than normal" and have reserved a physical core (including its HT sibling) for GPU-Grid via Process Lasso.
I assume that in normal mode the GPU-Grid thread running on the CPU estimates how long the GPU will need for the next time step (or until the next CPU intervention is needed) and asks the OS to be woken up in time.
I also assume that the performance difference from constant-polling mode arises from the "time to wake up the CPU thread" being just a bit too long every now and then. This could happen either because the sleep-time estimate is too long or because the OS takes a bit longer than requested to wake up the thread.
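To make the assumption concrete, here is a sketch of the wait strategy I have in mind: sleep for the estimated kernel duration, then busy-poll until the GPU signals completion. All names are illustrative, not GPU-Grid's actual code; in practice the completion check would be a driver query such as cudaEventQuery.

```c
#include <stdbool.h>
#include <time.h>

static int polls;                     /* counts polling iterations */

static bool gpu_done_stub(void)       /* stand-in for the driver query */
{
    return ++polls >= 3;              /* "finishes" on the third poll */
}

/* Sleep for the estimated duration, then poll the remainder. */
static void wait_for_gpu(long estimated_us, bool (*gpu_done)(void))
{
    struct timespec ts = { .tv_sec  = estimated_us / 1000000L,
                           .tv_nsec = (estimated_us % 1000000L) * 1000L };
    nanosleep(&ts, NULL);             /* hand the core back to the OS... */
    while (!gpu_done())               /* ...then spin until completion */
        ;
}
```

If the sleep estimate overshoots, or the OS delivers the wake-up late, the GPU sits idle for that interval, which would account for the small utilization gap.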
Currently the value we set for the environment variable SWAN_SYNC has no effect; the app just checks for its existence. You could extend this behaviour by using its value as a correction factor for the normal sleep-time estimate. A value of 1 would select the current default behaviour, 0 would cause constant polling, and anything in between would scale the sleep time. At a setting of 0.5 we'd get halved sleep times and approximately twice the CPU load, which would still be far from using a full core.
This functionality should be quite easy to implement, and moderate values of 0.8 to 0.9 might bring us much closer to constant-polling performance with only a minor increase in CPU time.
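A minimal sketch of the proposed change, assuming the app can parse SWAN_SYNC's value as a floating-point correction factor (function names are mine, not from the actual app):

```c
#include <stdlib.h>

/* Read SWAN_SYNC as a sleep-time correction factor: 1 keeps the
 * current default sleep, 0 means constant polling, anything in
 * between scales the estimate. A non-numeric value falls back to
 * the old "presence means polling" behaviour. */
static double swan_sync_factor(void)
{
    const char *s = getenv("SWAN_SYNC");
    if (s == NULL)
        return 1.0;                   /* unset: default sleep behaviour */
    char *end;
    double f = strtod(s, &end);
    if (end == s)
        return 0.0;                   /* non-numeric (old usage): poll */
    if (f < 0.0) f = 0.0;             /* clamp to [0, 1] */
    if (f > 1.0) f = 1.0;
    return f;
}

/* Scale the estimated sleep time (in microseconds) before sleeping;
 * a result of 0 means "skip the sleep and poll right away". */
static long scaled_sleep_us(long estimate_us)
{
    return (long)((double)estimate_us * swan_sync_factor());
}
```

With SWAN_SYNC=0.5, a 1000 µs estimate becomes a 500 µs sleep, so roughly twice as much time is spent polling while the thread still sleeps most of the time.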
If my assumptions are wrong, this obviously fails. There's also the possibility that the observed performance difference does not originate from many small losses, but rather from a few cases where the sleep-time guess was as accurate as usual, yet for some OS- or driver-related reason the CPU thread was woken up far too late. In that case a small correction to the sleep-time estimate wouldn't help at all.
I'm sure you've tested the current implementation, but I'd like to see if things can be pushed further :)
Scanning for our furry friends since Jan 2002