Re: Issue with the PRNG used by Postgres - Mailing list pgsql-hackers

From Parag Paul
Subject Re: Issue with the PRNG used by Postgres
Msg-id CAA=PXp3jBDvx7HwOfeF8OFKZA7WD=ZDA+zdpTARnJaYWu2_2cw@mail.gmail.com
In response to Re: Issue with the PRNG used by Postgres  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Issue with the PRNG used by Postgres
List pgsql-hackers
Hi Tom,
Sorry for the delayed response. I was collecting the data from my production servers.

The reason this could be a problem is a flaw in the RNG with the enlarged Hamming belt.
I attached an image with the RNG outputs from two backends. I ran our code for weeks and collected the
values generated by the RNG across many backends. The one in green (say, backend 600) stopped varying its values and
produced only low (near 0) values for half an hour, whereas the one in blue (say, backend 700) kept generating good values in
the range [0, 1).
During this period, backend 600 suffered and ended up in a stuck-spinlock condition.

-Parag


On Wed, Apr 10, 2024 at 9:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Actually ... Parag mentioned that this was specifically about
lwlock.c's usage of spinlocks.  It doesn't really use a spinlock,
but it does use s_lock.c's delay logic, and I think it's got the
usage pattern wrong:

    while (true)
    {
        /* always try once to acquire lock directly */
        old_state = pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_LOCKED);
        if (!(old_state & LW_FLAG_LOCKED))
            break;                /* got lock */

        /* and then spin without atomic operations until lock is released */
        {
            SpinDelayStatus delayStatus;

            init_local_spin_delay(&delayStatus);

            while (old_state & LW_FLAG_LOCKED)
            {
                perform_spin_delay(&delayStatus);
                old_state = pg_atomic_read_u32(&lock->state);
            }
#ifdef LWLOCK_STATS
            delays += delayStatus.delays;
#endif
            finish_spin_delay(&delayStatus);
        }

        /*
         * Retry. The lock might obviously already be re-acquired by the time
         * we're attempting to get it again.
         */
    }

I don't think it's correct to re-initialize the SpinDelayStatus each
time around the outer loop.  That state should persist through the
entire acquire operation, as it does in a regular spinlock acquire.
As this stands, it resets the delay to minimum each time around the
outer loop, and I bet it is that behavior, not the RNG, that's to blame
for what he's seeing.

(One should still wonder what LWLock usage pattern is causing this
spot to become so heavily contended.)

                        regards, tom lane
Attachment: image of RNG outputs from backends 600 (green) and 700 (blue) (not reproduced)
