Re: Condition variable live lock - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Condition variable live lock
Date
Msg-id 17054.1515126429@sss.pgh.pa.us
Whole thread Raw
In response to Re: Condition variable live lock  (Andres Freund <andres@anarazel.de>)
Responses Re: Condition variable live lock
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> On 2018-01-04 12:39:47 -0500, Robert Haas wrote:
>>> Given that the proclist_contains() checks in condition_variable.c are
>>> already racy, I think it might be feasible to collect all procnos to
>>> signal while holding the spinlock, and then signal all of them in one
>>> go.

>> That doesn't seem very nice at all.  Not only does it violate the
>> coding rule against looping while holding a spinlock, but it seems
>> that it would require allocating memory while holding one, which is a
>> non-starter.

> We could just use a sufficiently sized buffer beforehand. There's an
> obvious upper boundary, so that shouldn't be a big issue.

I share Robert's discomfort with that solution, but it seems to me there
might be a better way.  The attached patch uses our own cvWaitLink as a
sentinel to detect when we've woken everybody who was on the wait list
before we arrived.  That gives exactly the desired semantics, not just an
approximation to them.

Now, the limitation with this is that we can't be waiting for any *other*
condition variable, because then we'd be trashing our state about that
variable.  As coded, we can't be waiting for the target CV either, but
that case could actually be handled if we needed to, as per the comment.
I do not know if this is likely to be a problematic limitation
... discuss.  (The patch does survive check-world, FWIW.)

            regards, tom lane

diff --git a/src/backend/storage/lmgr/condition_variable.c b/src/backend/storage/lmgr/condition_variable.c
index 41378d6..0d08eb5 100644
*** a/src/backend/storage/lmgr/condition_variable.c
--- b/src/backend/storage/lmgr/condition_variable.c
*************** int
*** 214,228 ****
  ConditionVariableBroadcast(ConditionVariable *cv)
  {
      int            nwoken = 0;

      /*
!      * Let's just do this the dumbest way possible.  We could try to dequeue
!      * all the sleepers at once to save spinlock cycles, but it's a bit hard
!      * to get that right in the face of possible sleep cancelations, and we
!      * don't want to loop holding the mutex.
       */
!     while (ConditionVariableSignal(cv))
          ++nwoken;

      return nwoken;
  }
--- 214,273 ----
  ConditionVariableBroadcast(ConditionVariable *cv)
  {
      int            nwoken = 0;
+     int            pgprocno = MyProc->pgprocno;

      /*
!      * In some use-cases, it is common for awakened processes to immediately
!      * re-queue themselves.  If we just naively try to reduce the wakeup list
!      * to empty, we'll get into a potentially-indefinite loop against such a
!      * process.  The semantics we really want are just to be sure that we have
!      * wakened all processes that were in the list at entry.  We can use our
!      * own cvWaitLink as a sentinel to detect when we've finished.
!      *
!      * A seeming flaw in this approach is that someone else might signal the
!      * CV and in doing so remove our sentinel entry.  But that's fine: since
!      * CV waiters are always added and removed in order, that must mean that
!      * every previous waiter has been wakened, so we're done.  We'd get an
!      * extra "set" on our latch, which is slightly inefficient but harmless.
!      *
!      * This coding assumes (and asserts) that we never broadcast on a CV that
!      * we are also waiting for.  If that turns out to be necessary, we could
!      * just remove our entry and insert it at the end, and it still works as a
!      * sentinel.
       */
!
!     /* We should not be attempting a CV wait. */
!     Assert(cv_sleep_target == NULL);
!
!     /* Establish the sentinel entry. */
!     SpinLockAcquire(&cv->mutex);
!     Assert(!proclist_contains(&cv->wakeup, pgprocno, cvWaitLink));
!     proclist_push_tail(&cv->wakeup, pgprocno, cvWaitLink);
!     SpinLockRelease(&cv->mutex);
!
!     for (;;)
!     {
!         bool        aminlist;
!         PGPROC       *proc = NULL;
!
!         /*
!          * Check if the sentinel is still there, and if so, remove the first
!          * queue entry (which might be the sentinel, or not).
!          */
!         SpinLockAcquire(&cv->mutex);
!         aminlist = proclist_contains(&cv->wakeup, pgprocno, cvWaitLink);
!         if (aminlist)
!             proc = proclist_pop_head_node(&cv->wakeup, cvWaitLink);
!         SpinLockRelease(&cv->mutex);
!
!         /* If we or anyone else removed the sentinel, we're done. */
!         if (!aminlist || proc == MyProc)
!             break;
!
!         /* We found someone sleeping, so set their latch to wake them up. */
!         SetLatch(&proc->procLatch);
          ++nwoken;
+     }

      return nwoken;
  }

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
Next
From: Haribabu Kommi
Date:
Subject: Re: [HACKERS] Refactor handling of database attributes betweenpg_dump and pg_dumpall