Hi,
On 2024-04-11 16:46:10 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2024-04-11 16:11:40 -0400, Tom Lane wrote:
> >> We wouldn't need to fix it, if we simply removed the NUM_DELAYS
> >> limit. Whatever kicked us off the sleep doesn't matter, we might
> >> as well go check the spinlock.
>
> > I suspect we should fix it regardless of whether we keep NUM_DELAYS. We
> > shouldn't increase cur_delay faster just because a lot of signals are coming
> > in.
>
> I'm unconvinced there's a problem there.
Obviously that's a different aspect than efficiency, but in local, admittedly
extreme, testing I've seen stuck spinlocks being detected in a fraction of the
normally expected time. A spinlock that ends up sleeping for close to a second
after a relatively short amount of time surely isn't good for predictable
performance.
IIRC the bad case was on a hot standby, with some recovery conflict causing
the startup process to send a lot of signals.
> Also, what would you do about this that wouldn't involve adding kernel calls
> for gettimeofday? Admittedly, if we only do that when we're about to sleep,
> maybe it's not so awful; but it's still adding complexity that I'm
> unconvinced is warranted.
At least on !windows, pg_usleep() uses nanosleep(), which, when interrupted by
a signal, can return the remaining time until the experation of the timer.
I suspect that on windows computing the time when a signal arrived wouldn't be
expensive, compared to all the other overhead implied by our signal handling
emulation.
Greetings,
Andres Freund