From 3c53488da4557972e5f62b0a3566c2783427c0c5 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 24 Dec 2019 14:36:16 +1300 Subject: [PATCH] Don't reset latch in ConditionVariablePrepareToSleep(). It's not OK to do that without calling CHECK_FOR_INTERRUPTS(), and it's probably better to make all latch interaction code follow the standard pattern. Let's leave the latch set if it's set, so that the next wait loop (mostly likely ConditionVariableSleep()) sees it and handles it, if we don't reach some other CHECK_FOR_INTERRUPTS() first. One consequence of this bug was that a SIGTERM delivered in a very narrow timing window could leave a parallel worker process waiting forever for a condition variable that will never be signaled, after an error was raised in other process. Back-patch to 10, where condition variables were introduced. The bug was probably not easy to hit before commit 1321509f in master, but it was still a violation of the latch/interrupt protocol, so repair. In the back-branches only, also move the CHECK_FOR_INTERRUPTS() to after the ResetLatch() call, to close a related race: a SIGTERM that arrives around that same time as a condition variable signal could be merged, and the ConditionVariableSleep() could return having reset the latch and noticed the CV signal, and then the caller could wait on a latch. Author: Thomas Munro Reviewed-by: Shawn Debnath Reported-by: Tomas Vondra Discussion: https://postgr.es/m/CA%2BhUKGJOm8zZHjVA8svoNT3tHY0XdqmaC_kHitmgXDQM49m1dA%40mail.gmail.com --- src/backend/storage/lmgr/condition_variable.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/src/backend/storage/lmgr/condition_variable.c b/src/backend/storage/lmgr/condition_variable.c index 58b7b51472..6c4dbaec1c 100644 --- a/src/backend/storage/lmgr/condition_variable.c +++ b/src/backend/storage/lmgr/condition_variable.c @@ -92,12 +92,6 @@ ConditionVariablePrepareToSleep(ConditionVariable *cv) /* Record the condition variable on which we will sleep. */ cv_sleep_target = cv; - /* - * Reset my latch before adding myself to the queue, to ensure that we - * don't miss a wakeup that occurs immediately. - */ - ResetLatch(MyLatch); - /* Add myself to the wait queue. */ SpinLockAcquire(&cv->mutex); proclist_push_tail(&cv->wakeup, pgprocno, cvWaitLink); @@ -148,8 +142,6 @@ ConditionVariableSleep(ConditionVariable *cv, uint32 wait_event_info) do { - CHECK_FOR_INTERRUPTS(); - /* * Wait for latch to be set. (If we're awakened for some other * reason, the code below will cope anyway.) @@ -160,6 +152,8 @@ ConditionVariableSleep(ConditionVariable *cv, uint32 wait_event_info) /* Reset latch before examining the state of the wait list. */ ResetLatch(MyLatch); + CHECK_FOR_INTERRUPTS(); + /* * If this process has been taken out of the wait list, then we know * that it has been signaled by ConditionVariableSignal (or -- 2.23.0