Thread: Fix a race condition in ConditionVariableTimedSleep()

Fix a race condition in ConditionVariableTimedSleep()

From
Bertrand Drouvot
Date:
Hi hackers,

While working on wait event related stuff I observed a failed assertion:

"
TRAP: failed Assert("node->next == 0 && node->prev == 0"), File: "../../../../src/include/storage/proclist.h", Line:
91
"

during pg_regress/database.

To reproduce, add an ereport(LOG,..) or CHECK_FOR_INTERRUPTS() or whatever
would trigger ConditionVariableBroadcast() in pgstat_report_wait_end():

And launch:

postgres=# CREATE TABLESPACE regress_tblspace LOCATION '/home/postgres/bdttbs';
CREATE TABLESPACE
postgres=# create database regression_bdt9;
CREATE DATABASE
postgres=# ALTER DATABASE regression_bdt9 SET TABLESPACE regress_tblspace;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

The issue is that in ConditionVariableTimedSleep() during the WaitLatch() we’ll
also trigger ProcessInterrupts() that will call ConditionVariableCancelSleep()
that will:

- remove the process from the wait list
- and also set cv_sleep_target to NULL.

Then in ConditionVariableTimedSleep(), we’ll go through:

“
          /*
           * If this process has been taken out of the wait list, then we know
           * that it has been signaled by ConditionVariableSignal (or
           * ConditionVariableBroadcast), so we should return to the caller. But
           * that doesn't guarantee that the exit condition is met, only that we
           * ought to check it.  So we must put the process back into the wait
           * list, to ensure we don't miss any additional wakeup occurring while
           * the caller checks its exit condition.  We can take ourselves out of
           * the wait list only when the caller calls
           * ConditionVariableCancelSleep.
           *
           * If we're still in the wait list, then the latch must have been set
           * by something other than ConditionVariableSignal; though we don't
           * guarantee not to return spuriously, we'll avoid this obvious case.
           */
          SpinLockAcquire(lock: &cv->mutex);
          if (!proclist_contains(&cv->wakeup, MyProcNumber, cvWaitLink))
          {
              done = true;
              proclist_push_tail(&cv->wakeup, MyProcNumber, cvWaitLink);
          }
          SpinLockRelease(&cv->mutex);
“

Means we re-add the process in the wait list. The issue is that cv_sleep_target
is still NULL so that we’ll exit ConditionVariableTimedSleep() due to:

"
          if (cv != cv_sleep_target)
              done = true;

          /* We were signaled, so return */
          if (done)
              return false;
"

Later we’ll want to add our process to a wait list, calling ConditionVariablePrepareToSleep()
that will not call ConditionVariableCancelSleep() due to:

“
      if (cv_sleep_target != NULL)
          ConditionVariableCancelSleep();
“

As cv_sleep_target is still NULL.
Finally calling proclist_push_tail() that will trigger the failed assertion.

The proposed fix attached is done in ConditionVariableTimedSleep() as this is the
place that introduces the race condition. It re-assigns cv_sleep_target to cv
and then ensures that cv_sleep_target accurately describes which condition
variable we’re prepared to wait on.

Looking forward to your feedback,

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Fix a race condition in ConditionVariableTimedSleep()

From
Nazir Bilal Yavuz
Date:
Hi,

Thanks for the report and the patch!

On Mon, 5 May 2025 at 10:53, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> The proposed fix attached is done in ConditionVariableTimedSleep() as this is the
> place that introduces the race condition. It re-assigns cv_sleep_target to cv
> and then ensures that cv_sleep_target accurately describes which condition
> variable we’re prepared to wait on.
>
> Looking forward to your feedback,

I am able to reproduce the race condition and confirm that the
proposed patch fixes the problem.

However AFAIU, the code expects that ConditionVariableCancelSleep()
should not be called while waiting for the latch, right? If that is
the case, does not the reproducer code violate this?

--
Regards,
Nazir Bilal Yavuz
Microsoft



Re: Fix a race condition in ConditionVariableTimedSleep()

From
Yura Sokolov
Date:
05.05.2025 10:52, Bertrand Drouvot пишет:
> Hi hackers,
> 
> While working on wait event related stuff I observed a failed assertion:
> 
> "
> TRAP: failed Assert("node->next == 0 && node->prev == 0"), File: "../../../../src/include/storage/proclist.h", Line:
91
> "
> 
> during pg_regress/database.
> 
> To reproduce, add an ereport(LOG,..) or CHECK_FOR_INTERRUPTS() or whatever
> would trigger ConditionVariableBroadcast() in pgstat_report_wait_end():


Interestingly, our colleague stepped into same problem recently [1] . It
happened because he attempted to make overcomplex timeout (SIGALARM) handler.

But his solution was a bit different [2].

[1] https://postgr.es/m/076eb7bd-52e6-4a51-ba00-c744d027b15c@postgrespro.ru
[2]
https://postgr.es/m/attachment/175030/0001-CV-correctly-handle-cv_sleep_target-change.patch

And I believe, his solution is more elegant. Doesn't it?

But in first step, I doubt there should be any thing that cancels condition
variable during WaitLatch. Most probably you did wrong thing.

We convinced the colleague to rework the code to not trigger the issue in
first place.

-- 
regards
Yura Sokolov aka funny-falcon



Re: Fix a race condition in ConditionVariableTimedSleep()

From
Bertrand Drouvot
Date:
Hi,

On Mon, May 05, 2025 at 12:15:39PM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> Thanks for the report and the patch!
> 
> On Mon, 5 May 2025 at 10:53, Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > The proposed fix attached is done in ConditionVariableTimedSleep() as this is the
> > place that introduces the race condition. It re-assigns cv_sleep_target to cv
> > and then ensures that cv_sleep_target accurately describes which condition
> > variable we’re prepared to wait on.
> >
> > Looking forward to your feedback,
> 
> I am able to reproduce the race condition and confirm that the
> proposed patch fixes the problem.

Thanks for looking at it!

> However AFAIU, the code expects that ConditionVariableCancelSleep()
> should not be called while waiting for the latch

Are you referring to this comment "We can take ourselves out of the wait list only
when the caller calls ConditionVariableCancelSleep"?

> right?

If so, I'm not sure that this comment meant to say that "we should not call
ConditionVariableCancelSleep()" prior to this point..

If that's the case then we could add an Assert that cv_sleep_target should not
be NULL instead but still just "adding" an ereport(LOG,...) is enough to break
this rule (if any) so I'm tempted to say that's better to adress the case
instead.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Fix a race condition in ConditionVariableTimedSleep()

From
Bertrand Drouvot
Date:
Hi,

On Mon, May 05, 2025 at 02:15:15PM +0300, Yura Sokolov wrote:
> Interestingly, our colleague stepped into same problem recently [1] . It
> happened because he attempted to make overcomplex timeout (SIGALARM) handler.
> 
> But his solution was a bit different [2].
> 
> [1] https://postgr.es/m/076eb7bd-52e6-4a51-ba00-c744d027b15c@postgrespro.ru
> [2]
> https://postgr.es/m/attachment/175030/0001-CV-correctly-handle-cv_sleep_target-change.patch
> 
> And I believe, his solution is more elegant. Doesn't it?

I'm not sure as it'd not maintain the initial intent to re-add to the
wait list.

> But in first step, I doubt there should be any thing that cancels condition
> variable during WaitLatch.

I'm not 100% sure.

>  Most probably you did wrong thing.

I "just" added an ereport(LOG,..) that has been enough to cancel the condition
variable and trigger the failed assertion. I agree that you could still consider
this as wrong thing if you think that we should not do anything that cancels
condition variable during WaitLatch though.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Fix a race condition in ConditionVariableTimedSleep()

From
Yura Sokolov
Date:
05.05.2025 17:05, Bertrand Drouvot пишет:
> Hi,
> 
> On Mon, May 05, 2025 at 02:15:15PM +0300, Yura Sokolov wrote:
>> Interestingly, our colleague stepped into same problem recently [1] . It
>> happened because he attempted to make overcomplex timeout (SIGALARM) handler.
>>
>> But his solution was a bit different [2].
>>
>> [1] https://postgr.es/m/076eb7bd-52e6-4a51-ba00-c744d027b15c@postgrespro.ru
>> [2]
>> https://postgr.es/m/attachment/175030/0001-CV-correctly-handle-cv_sleep_target-change.patch
>>
>> And I believe, his solution is more elegant. Doesn't it?
> 
> I'm not sure as it'd not maintain the initial intent to re-add to the
> wait list.

It will be re-added in next iteration of outer sleeping loop.

>> But in first step, I doubt there should be any thing that cancels condition
>> variable during WaitLatch.
> 
> I'm not 100% sure.

There're no such things at the moment. WaitLatch doesn't check for
interrupts and doesn't do any other thing that could break
ConditionVariable. And I believe, if it start to do, we're going to be in
trouble.

> I "just" added an ereport(LOG,..) that has been enough to cancel the
condition
> variable and trigger the failed assertion.

It is because errfinish, called from ereport, invokes CHECK_FOR_INTERRUPTS.

Just wrap you ereport with HOLD_INTERRUPTS/RESUME_INTERRUPTS, and
everything will be "just" fine.

-- 
regards
Yura Sokolov aka funny-falcon