Thread: Fix a race condition in ConditionVariableTimedSleep()
Hi hackers, While working on wait event related stuff I observed a failed assertion: " TRAP: failed Assert("node->next == 0 && node->prev == 0"), File: "../../../../src/include/storage/proclist.h", Line: 91 " during pg_regress/database. To reproduce, add an ereport(LOG,..) or CHECK_FOR_INTERRUPTS() or whatever would trigger ConditionVariableBroadcast() in pgstat_report_wait_end(): And launch: postgres=# CREATE TABLESPACE regress_tblspace LOCATION '/home/postgres/bdttbs'; CREATE TABLESPACE postgres=# create database regression_bdt9; CREATE DATABASE postgres=# ALTER DATABASE regression_bdt9 SET TABLESPACE regress_tblspace; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. The issue is that in ConditionVariableTimedSleep() during the WaitLatch() we’ll also trigger ProcessInterrupts() that will call ConditionVariableCancelSleep() that will: - remove the process from the wait list - and also set cv_sleep_target to NULL. Then in ConditionVariableTimedSleep(), we’ll go through: “ /* * If this process has been taken out of the wait list, then we know * that it has been signaled by ConditionVariableSignal (or * ConditionVariableBroadcast), so we should return to the caller. But * that doesn't guarantee that the exit condition is met, only that we * ought to check it. So we must put the process back into the wait * list, to ensure we don't miss any additional wakeup occurring while * the caller checks its exit condition. We can take ourselves out of * the wait list only when the caller calls * ConditionVariableCancelSleep. * * If we're still in the wait list, then the latch must have been set * by something other than ConditionVariableSignal; though we don't * guarantee not to return spuriously, we'll avoid this obvious case. */ SpinLockAcquire(lock: &cv->mutex); if (!proclist_contains(&cv->wakeup, MyProcNumber, cvWaitLink)) { done = true; proclist_push_tail(&cv->wakeup, MyProcNumber, cvWaitLink); } SpinLockRelease(&cv->mutex); “ Means we re-add the process in the wait list. The issue is that cv_sleep_target is still NULL so that we’ll exit ConditionVariableTimedSleep() due to: " if (cv != cv_sleep_target) done = true; /* We were signaled, so return */ if (done) return false; " Later we’ll want to add our process to a wait list, calling ConditionVariablePrepareToSleep() that will not call ConditionVariableCancelSleep() due to: “ if (cv_sleep_target != NULL) ConditionVariableCancelSleep(); “ As cv_sleep_target is still NULL. Finally calling proclist_push_tail() that will trigger the failed assertion. The proposed fix attached is done in ConditionVariableTimedSleep() as this is the place that introduces the race condition. It re-assigns cv_sleep_target to cv and then ensures that cv_sleep_target accurately describes which condition variable we’re prepared to wait on. Looking forward to your feedback, Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Hi, Thanks for the report and the patch! On Mon, 5 May 2025 at 10:53, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > The proposed fix attached is done in ConditionVariableTimedSleep() as this is the > place that introduces the race condition. It re-assigns cv_sleep_target to cv > and then ensures that cv_sleep_target accurately describes which condition > variable we’re prepared to wait on. > > Looking forward to your feedback, I am able to reproduce the race condition and confirm that the proposed patch fixes the problem. However AFAIU, the code expects that ConditionVariableCancelSleep() should not be called while waiting for the latch, right? If that is the case, does not the reproducer code violate this? -- Regards, Nazir Bilal Yavuz Microsoft
05.05.2025 10:52, Bertrand Drouvot пишет: > Hi hackers, > > While working on wait event related stuff I observed a failed assertion: > > " > TRAP: failed Assert("node->next == 0 && node->prev == 0"), File: "../../../../src/include/storage/proclist.h", Line: 91 > " > > during pg_regress/database. > > To reproduce, add an ereport(LOG,..) or CHECK_FOR_INTERRUPTS() or whatever > would trigger ConditionVariableBroadcast() in pgstat_report_wait_end(): Interestingly, our colleague stepped into same problem recently [1] . It happened because he attempted to make overcomplex timeout (SIGALARM) handler. But his solution was a bit different [2]. [1] https://postgr.es/m/076eb7bd-52e6-4a51-ba00-c744d027b15c@postgrespro.ru [2] https://postgr.es/m/attachment/175030/0001-CV-correctly-handle-cv_sleep_target-change.patch And I believe, his solution is more elegant. Doesn't it? But in first step, I doubt there should be any thing that cancels condition variable during WaitLatch. Most probably you did wrong thing. We convinced the colleague to rework the code to not trigger the issue in first place. -- regards Yura Sokolov aka funny-falcon
Hi, On Mon, May 05, 2025 at 12:15:39PM +0300, Nazir Bilal Yavuz wrote: > Hi, > > Thanks for the report and the patch! > > On Mon, 5 May 2025 at 10:53, Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > The proposed fix attached is done in ConditionVariableTimedSleep() as this is the > > place that introduces the race condition. It re-assigns cv_sleep_target to cv > > and then ensures that cv_sleep_target accurately describes which condition > > variable we’re prepared to wait on. > > > > Looking forward to your feedback, > > I am able to reproduce the race condition and confirm that the > proposed patch fixes the problem. Thanks for looking at it! > However AFAIU, the code expects that ConditionVariableCancelSleep() > should not be called while waiting for the latch Are you referring to this comment "We can take ourselves out of the wait list only when the caller calls ConditionVariableCancelSleep"? > right? If so, I'm not sure that this comment meant to say that "we should not call ConditionVariableCancelSleep()" prior to this point.. If that's the case then we could add an Assert that cv_sleep_target should not be NULL instead but still just "adding" an ereport(LOG,...) is enough to break this rule (if any) so I'm tempted to say that's better to adress the case instead. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Mon, May 05, 2025 at 02:15:15PM +0300, Yura Sokolov wrote: > Interestingly, our colleague stepped into same problem recently [1] . It > happened because he attempted to make overcomplex timeout (SIGALARM) handler. > > But his solution was a bit different [2]. > > [1] https://postgr.es/m/076eb7bd-52e6-4a51-ba00-c744d027b15c@postgrespro.ru > [2] > https://postgr.es/m/attachment/175030/0001-CV-correctly-handle-cv_sleep_target-change.patch > > And I believe, his solution is more elegant. Doesn't it? I'm not sure as it'd not maintain the initial intent to re-add to the wait list. > But in first step, I doubt there should be any thing that cancels condition > variable during WaitLatch. I'm not 100% sure. > Most probably you did wrong thing. I "just" added an ereport(LOG,..) that has been enough to cancel the condition variable and trigger the failed assertion. I agree that you could still consider this as wrong thing if you think that we should not do anything that cancels condition variable during WaitLatch though. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
05.05.2025 17:05, Bertrand Drouvot пишет: > Hi, > > On Mon, May 05, 2025 at 02:15:15PM +0300, Yura Sokolov wrote: >> Interestingly, our colleague stepped into same problem recently [1] . It >> happened because he attempted to make overcomplex timeout (SIGALARM) handler. >> >> But his solution was a bit different [2]. >> >> [1] https://postgr.es/m/076eb7bd-52e6-4a51-ba00-c744d027b15c@postgrespro.ru >> [2] >> https://postgr.es/m/attachment/175030/0001-CV-correctly-handle-cv_sleep_target-change.patch >> >> And I believe, his solution is more elegant. Doesn't it? > > I'm not sure as it'd not maintain the initial intent to re-add to the > wait list. It will be re-added in next iteration of outer sleeping loop. >> But in first step, I doubt there should be any thing that cancels condition >> variable during WaitLatch. > > I'm not 100% sure. There're no such things at the moment. WaitLatch doesn't check for interrupts and doesn't do any other thing that could break ConditionVariable. And I believe, if it start to do, we're going to be in trouble. > I "just" added an ereport(LOG,..) that has been enough to cancel the condition > variable and trigger the failed assertion. It is because errfinish, called from ereport, invokes CHECK_FOR_INTERRUPTS. Just wrap you ereport with HOLD_INTERRUPTS/RESUME_INTERRUPTS, and everything will be "just" fine. -- regards Yura Sokolov aka funny-falcon