Re: Issue with the PRNG used by Postgres - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Issue with the PRNG used by Postgres
Date
Msg-id 4084246.1712768585@sss.pgh.pa.us
Whole thread Raw
In response to Re: Issue with the PRNG used by Postgres  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Issue with the PRNG used by Postgres
List pgsql-hackers
I wrote:
> I don't think it's correct to re-initialize the SpinDelayStatus each
> time around the outer loop.  That state should persist through the
> entire acquire operation, as it does in a regular spinlock acquire.
> As this stands, it resets the delay to minimum each time around the
> outer loop, and I bet it is that behavior not the RNG that's to blame
> for what he's seeing.

After thinking about this some more, it is fairly clear that that *is*
a mistake that can cause a thundering-herd problem.  Assume we have
two or more backends waiting in perform_spin_delay, and for whatever
reason the scheduler wakes them up simultaneously.  They see the
LW_FLAG_LOCKED bit clear (because whoever had the lock when they went
to sleep is long gone) and iterate the outer loop.  One gets the lock;
the rest go back to sleep.  And they will all sleep exactly
MIN_DELAY_USEC, because they all reset their SpinDelayStatus.
Lather, rinse, repeat.  If the s_lock code were being used as
intended, they would soon acquire different cur_delay settings;
but that never happens.  That is not the RNG's fault.

So I think we need something like the attached.

            regards, tom lane

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index b1e388dc7c..54c84b9d67 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -857,46 +857,49 @@ static void
 LWLockWaitListLock(LWLock *lock)
 {
     uint32        old_state;
-#ifdef LWLOCK_STATS
-    lwlock_stats *lwstats;
-    uint32        delays = 0;

-    lwstats = get_lwlock_stats_entry(lock);
-#endif
+    /*
+     * Try once to acquire the lock, before we go to the trouble of setting up
+     * state for the spin loop.
+     */
+    old_state = pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_LOCKED);
+    if (!(old_state & LW_FLAG_LOCKED))
+        return;                    /* got lock */

-    while (true)
     {
-        /* always try once to acquire lock directly */
-        old_state = pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_LOCKED);
-        if (!(old_state & LW_FLAG_LOCKED))
-            break;                /* got lock */
+        SpinDelayStatus delayStatus;
+#ifdef LWLOCK_STATS
+        lwlock_stats *lwstats = get_lwlock_stats_entry(lock);
+#endif

-        /* and then spin without atomic operations until lock is released */
-        {
-            SpinDelayStatus delayStatus;
+        init_local_spin_delay(&delayStatus);

-            init_local_spin_delay(&delayStatus);
+        while (true)
+        {
+            /* try again to acquire lock directly */
+            old_state = pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_LOCKED);
+            if (!(old_state & LW_FLAG_LOCKED))
+                break;            /* got lock */

+            /* spin without atomic operations until lock is released */
             while (old_state & LW_FLAG_LOCKED)
             {
                 perform_spin_delay(&delayStatus);
                 old_state = pg_atomic_read_u32(&lock->state);
             }
-#ifdef LWLOCK_STATS
-            delays += delayStatus.delays;
-#endif
-            finish_spin_delay(&delayStatus);
+
+            /*
+             * Retry. The lock might obviously already be re-acquired by the
+             * time we're attempting to get it again.
+             */
         }

-        /*
-         * Retry. The lock might obviously already be re-acquired by the time
-         * we're attempting to get it again.
-         */
-    }
+        finish_spin_delay(&delayStatus);

 #ifdef LWLOCK_STATS
-    lwstats->spin_delay_count += delays;
+        lwstats->spin_delay_count += delayStatus.delays;
 #endif
+    }
 }

 /*

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Table AM Interface Enhancements
Next
From: Andres Freund
Date:
Subject: Re: Issue with the PRNG used by Postgres