Re: Issue with the PRNG used by Postgres - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Issue with the PRNG used by Postgres
Date
Msg-id 20240411213023.zd2776ahyz5xzyhw@awork3.anarazel.de
In response to Re: Issue with the PRNG used by Postgres  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Issue with the PRNG used by Postgres
List pgsql-hackers
Hi,

On 2024-04-11 16:46:23 -0400, Robert Haas wrote:
> On Thu, Apr 11, 2024 at 3:52 PM Andres Freund <andres@anarazel.de> wrote:
> > My suspicion is that most of the false positives are caused by lots of signals
> > interrupting the pg_usleep()s. Because we measure the number of delays, not
> > the actual time since we've been waiting for the spinlock, signals
> > interrupting pg_usleep() can very significantly shorten the amount of
> > time until we consider a spinlock stuck.  We should fix that.
> 
> I mean, go nuts. But <dons asbestos underpants, asbestos regular
> pants, 2 pair of asbestos socks, 3 asbestos shirts, 2 asbestos
> jackets, and then hides inside of a flame-proof capsule at the bottom
> of the Pacific ocean> this is just another thing like query hints,
> where everybody says "oh, the right thing to do is fix X or Y or Z and
> then you won't need it". But of course it never actually gets fixed
> well enough that people stop having problems in the real world. And
> eventually we look like a developer community that cares more about
> our own opinion about what is right than what the experience of real
> users actually is.

I don't think that's a particularly apt comparison. If you have spinlocks that
cannot be acquired within tens of seconds, you're in a really bad situation,
regardless of whether you crash-restart or not.

Whereas with hints, you might actually be operating perfectly normally.
Never using the wrong plan is also a problem an order of magnitude harder
and fuzzier than ensuring we don't wait for spinlocks for a long time.
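
To make the point from the quoted text concrete (counting delays versus
measuring elapsed time), here is a minimal sketch. It is not the actual
s_lock.c code. The names SpinWait, spin_wait_begin, spin_wait_note_sleep,
stuck_by_count and stuck_by_elapsed, and both limits, are hypothetical and
chosen only for illustration. Timestamps are passed in by the caller, who
would presumably read them from a monotonic clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits, for illustration only. */
#define MAX_DELAYS        1000          /* cap on sleep attempts */
#define STUCK_TIMEOUT_US  60000000LL    /* 60 s of real elapsed time */

typedef struct SpinWait
{
    int     delays;     /* sleeps attempted, completed or not */
    int64_t start_us;   /* monotonic timestamp when waiting began */
} SpinWait;

/* Start waiting; now_us is a monotonic timestamp in microseconds. */
void
spin_wait_begin(SpinWait *w, int64_t now_us)
{
    w->delays = 0;
    w->start_us = now_us;
}

/*
 * Record one sleep attempt.  A sleep cut short by a signal still counts
 * as a full delay here, which is the weakness of a count-based check.
 */
void
spin_wait_note_sleep(SpinWait *w)
{
    w->delays++;
}

/* Count-based check: trips after MAX_DELAYS attempts, however brief. */
bool
stuck_by_count(const SpinWait *w)
{
    return w->delays > MAX_DELAYS;
}

/* Time-based check: trips only once the timeout has really elapsed. */
bool
stuck_by_elapsed(const SpinWait *w, int64_t now_us)
{
    return now_us - w->start_us > STUCK_TIMEOUT_US;
}
```

In this sketch, if every sleep is interrupted after a few microseconds, the
count-based check reports a stuck lock after 1001 attempts even though only
milliseconds have passed, while the time-based check does not.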


> In all seriousness, I'd really like to understand what experience
> you've had that makes this check seem useful. Because I think all of
> my experiences with it have been bad. If they weren't, the last good
> one was a very long time ago.

By far most of the stuck spinlocks I've seen were due to bugs in
out-of-core extensions. Absurdly enough, the next most common cause is
probably people using gdb to make an uninterruptible process break out of
some code without a crash-restart, and accidentally doing so while a
spinlock is held.

Greetings,

Andres Freund


