Re: Issue with the PRNG used by Postgres - Mailing list pgsql-hackers

From Parag Paul
Subject Re: Issue with the PRNG used by Postgres
Msg-id CAA=PXp1bmVxTpe2wrHwNEZoBrqzwRm9od=S=EOUiSnHNDApUFw@mail.gmail.com
In response to Re: Issue with the PRNG used by Postgres  (Andres Freund <andres@anarazel.de>)
Responses Re: Issue with the PRNG used by Postgres
List pgsql-hackers
Hi Andres,
This is a little more complex than that. The spinlocks are taken in the LWLock (mutex) code when the lock is not available right away.
The spinlock is taken to attach the current backend to the wait list of the LWLock, which means this is not something we can control.
When the problem reproduces, it affects any mutex or LWLock code path, since the low Hamming index can cause problems by removing fairness from the system.
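
For illustration only, here is a minimal sketch of the kind of randomized spin backoff being discussed. It is not the PostgreSQL source: the constants and the names next_delay_usec and hamming_weight64 are assumptions made for this example, and reading "low Hamming index" as "PRNG outputs with few set bits" is also an assumption. The sketch shows that when the random factor r stays close to zero, the delay hardly grows from one round to the next.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bounds for the sleep between spin attempts (assumed values). */
#define MIN_DELAY_USEC 1000L
#define MAX_DELAY_USEC 1000000L

/*
 * Grow the current delay by a random factor between 1x and 2x.
 * r is a uniform random value in [0, 1) supplied by the caller's PRNG.
 * Once the delay exceeds the maximum, wrap back to the minimum.
 */
static long
next_delay_usec(long cur_delay, double r)
{
    assert(r >= 0.0 && r < 1.0);

    cur_delay += (long) (cur_delay * r + 0.5);
    if (cur_delay > MAX_DELAY_USEC)
        cur_delay = MIN_DELAY_USEC;
    return cur_delay;
}

/* Count the set bits (Hamming weight) of a 64-bit PRNG output. */
static int
hamming_weight64(uint64_t x)
{
    int n = 0;

    while (x != 0)
    {
        x &= x - 1;             /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

With r = 0.5 a 1000 usec delay grows to 1500 usec, while with r = 0.0 it stays at 1000 usec, so a run of very small random values keeps every waiter on nearly the same schedule.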

Also, I believe the rounding error still remains within the RNG. I will send a patch today.

Thanks for the response.
-Parag

On Tue, Apr 9, 2024 at 2:05 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2024-04-08 22:52:09 -0700, Parag Paul wrote:
> We have an interesting problem, where PG went to PANIC due to a stuck
> spinlock.
> After careful analysis and hours of trying to reproduce this (something that
> showed up in production after almost 2 weeks of stress runs), I did some
> statistical analysis on the RNG that PG uses to create the
> backoff for the spinlocks.

ISTM that the fix here is to not use a spinlock for whatever the contention is
on, rather than improve the RNG.

Greetings,

Andres Freund
