While stracing a process that was contending for a spinlock, I noticed that the sleep time would often have a longish sequence of 1 and 2 msec sleep times, rather than the rapidly but randomly increasing sleep time intended. (At first it looked like it was sleeping on a different attempt each time, rather than re-sleeping, but some additional logging showed this was not the case, it was re-sleeping on the same lock attempt.)
The problem is that the state is maintained only to an integer number of milliseconds starting at 1, so it can take a number of attempts for the random increment to jump from 1 to 2, and then from 2 to 3.
If the state was instead maintained to an integer number of microseconds starting at 1000, this granularity of the jump would not be a problem.
And once the state is kept to microseconds, I see no point in not passing the microsecond precision to pg_usleep.
The attached patch changes the resolution of the state variable to microseconds, but keeps the starting value at 1msec, i.e. 1000 usec.
Arguably the MIN_DELAY_USEC value should be reduced from 1000 as well given current hardware, but I'll leave that as a separate issue.
There is the comment:
* The pg_usleep() delays are measured in milliseconds because 1 msec is a
* common resolution limit at the OS level for newer platforms. On older
* platforms the resolution limit is usually 10 msec, in which case the
* total delay before timeout will be a bit more.
I think the first sentence is out of date (CentOS 6.3 on "Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz" not exactly bleeding edge, seems to have a resolution of 100 usec), and also somewhat bogus (what benefit do we get by preemptively rounding to some level just because some common platforms will do so anyway?). I've just removed the whole comment, though maybe some version of the last sentence needs to be put back.
I have no evidence that the current granularity of the random escalation is actually causing performance problems, only that is causes confusion problems for people following along. But I think the change can be justified based purely on it being cleaner code and more future proof.
This could be made obsolete by changing to futexes, but I doubt that that is going to happen soon, and certainly not for all platforms.
I will add to commitfest Next
Cheers,
Jeff