Fix a race condition in ConditionVariableTimedSleep() - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Fix a race condition in ConditionVariableTimedSleep()
Date
Msg-id aBhuTqNhMN3prcqe@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
Responses Re: Fix a race condition in ConditionVariableTimedSleep()
Re: Fix a race condition in ConditionVariableTimedSleep()
List pgsql-hackers
Hi hackers,

While working on wait event related stuff I observed a failed assertion:

"
TRAP: failed Assert("node->next == 0 && node->prev == 0"), File: "../../../../src/include/storage/proclist.h", Line:
91
"

during pg_regress/database.

To reproduce, add an ereport(LOG,..) or CHECK_FOR_INTERRUPTS() or whatever
would trigger ConditionVariableBroadcast() in pgstat_report_wait_end():

And launch:

postgres=# CREATE TABLESPACE regress_tblspace LOCATION '/home/postgres/bdttbs';
CREATE TABLESPACE
postgres=# create database regression_bdt9;
CREATE DATABASE
postgres=# ALTER DATABASE regression_bdt9 SET TABLESPACE regress_tblspace;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

The issue is that in ConditionVariableTimedSleep() during the WaitLatch() we’ll
also trigger ProcessInterrupts() that will call ConditionVariableCancelSleep()
that will:

- remove the process from the wait list
- and also set cv_sleep_target to NULL.

Then in ConditionVariableTimedSleep(), we’ll go through:

“
          /*
           * If this process has been taken out of the wait list, then we know
           * that it has been signaled by ConditionVariableSignal (or
           * ConditionVariableBroadcast), so we should return to the caller. But
           * that doesn't guarantee that the exit condition is met, only that we
           * ought to check it.  So we must put the process back into the wait
           * list, to ensure we don't miss any additional wakeup occurring while
           * the caller checks its exit condition.  We can take ourselves out of
           * the wait list only when the caller calls
           * ConditionVariableCancelSleep.
           *
           * If we're still in the wait list, then the latch must have been set
           * by something other than ConditionVariableSignal; though we don't
           * guarantee not to return spuriously, we'll avoid this obvious case.
           */
          SpinLockAcquire(lock: &cv->mutex);
          if (!proclist_contains(&cv->wakeup, MyProcNumber, cvWaitLink))
          {
              done = true;
              proclist_push_tail(&cv->wakeup, MyProcNumber, cvWaitLink);
          }
          SpinLockRelease(&cv->mutex);
“

Means we re-add the process in the wait list. The issue is that cv_sleep_target
is still NULL so that we’ll exit ConditionVariableTimedSleep() due to:

"
          if (cv != cv_sleep_target)
              done = true;

          /* We were signaled, so return */
          if (done)
              return false;
"

Later we’ll want to add our process to a wait list, calling ConditionVariablePrepareToSleep()
that will not call ConditionVariableCancelSleep() due to:

“
      if (cv_sleep_target != NULL)
          ConditionVariableCancelSleep();
“

As cv_sleep_target is still NULL.
Finally calling proclist_push_tail() that will trigger the failed assertion.

The proposed fix attached is done in ConditionVariableTimedSleep() as this is the
place that introduces the race condition. It re-assigns cv_sleep_target to cv
and then ensures that cv_sleep_target accurately describes which condition
variable we’re prepared to wait on.

Looking forward to your feedback,

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Laurenz Albe
Date:
Subject: Re: PATCH: pg_dump to support "on conflict do update"
Next
From: Mikhail Kharitonov
Date:
Subject: [PATCH] Fix replica identity mismatch for partitioned tables with publish_via_partition_root