Re: Issue with the PRNG used by Postgres - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Issue with the PRNG used by Postgres
Date
Msg-id 20240411202139.hysgmnyksqyijcrp@awork3.anarazel.de
In response to Re: Issue with the PRNG used by Postgres  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Issue with the PRNG used by Postgres
List pgsql-hackers
Hi,

On 2024-04-11 16:11:40 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2024-04-11 15:24:28 -0400, Robert Haas wrote:
> >> Or, rip out the whole mechanism and just don't PANIC.
>
> > I continue to believe that that'd be quite a bad idea.
>
> I'm warming to it myself.
>
> > My suspicion is that most of the false positives are caused by lots of signals
> > interrupting the pg_usleep()s. Because we measure the number of delays, not
> > the actual time since we've been waiting for the spinlock, signals
> > interrupting pg_usleep() can very significantly shorten the amount of
> > time until we consider a spinlock stuck.  We should fix that.
>
> We wouldn't need to fix it, if we simply removed the NUM_DELAYS
> limit.  Whatever kicked us off the sleep doesn't matter, we might
> as well go check the spinlock.

I suspect we should fix it regardless of whether we keep NUM_DELAYS. We
shouldn't increase cur_delay faster just because a lot of signals are coming
in.  If it were just user-triggered signals it'd probably not be worth
worrying about, but we do sometimes send a lot of signals ourselves...
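
To illustrate, here is a rough sketch of what I mean, not a patch. It
assumes we track elapsed monotonic time rather than a delay count, and only
grow the delay when the sleep ran its full length. All names here
(DelayState, perform_delay, next_delay_us, ...) are made up for the example
and are not the ones in s_lock.c; the doubling-and-clamp backoff is a
simplification, not the existing policy.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical state for one spinlock wait; not PostgreSQL's SpinDelayStatus. */
typedef struct DelayState
{
    int64_t start_us;      /* monotonic time when we started waiting */
    int64_t cur_delay_us;  /* length of the next sleep */
} DelayState;

static int64_t
monotonic_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Sleep once; return how long we actually slept (short if a signal hit). */
static int64_t
sleep_once_us(int64_t usec)
{
    struct timespec req;
    int64_t before = monotonic_us();

    assert(usec >= 0);
    req.tv_sec = usec / 1000000;
    req.tv_nsec = (usec % 1000000) * 1000;
    (void) nanosleep(&req, NULL);  /* EINTR is fine, caller rechecks the lock */
    return monotonic_us() - before;
}

/* "Stuck" is decided by elapsed time, so short sleeps cannot hasten it. */
static bool
delay_is_stuck(int64_t start_us, int64_t now_us, int64_t timeout_us)
{
    return now_us - start_us >= timeout_us;
}

/* Back off only after a full-length sleep; an interrupted one keeps the delay. */
static int64_t
next_delay_us(int64_t cur_delay_us, int64_t slept_us, int64_t max_delay_us)
{
    if (slept_us < cur_delay_us)
        return cur_delay_us;
    cur_delay_us *= 2;
    return cur_delay_us > max_delay_us ? max_delay_us : cur_delay_us;
}

/* One iteration of the wait loop; returns true if the lock looks stuck. */
static bool
perform_delay(DelayState *st, int64_t timeout_us, int64_t max_delay_us)
{
    int64_t slept = sleep_once_us(st->cur_delay_us);

    st->cur_delay_us = next_delay_us(st->cur_delay_us, slept, max_delay_us);
    return delay_is_stuck(st->start_us, monotonic_us(), timeout_us);
}
```

With something along these lines, a burst of signals just causes more
rechecks of the lock. It neither inflates cur_delay nor brings the "stuck"
verdict any closer.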


> Also, you propose in your other message replacing spinlocks with lwlocks.
> Whatever the other merits of that, I notice that we have no timeout or
> "stuck lwlock" detection.

True. And that's not great. But at least lwlocks can be identified in
pg_stat_activity, which does help some.

Greetings,

Andres Freund


