Re: Race condition in InvalidateObsoleteReplicationSlots() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Race condition in InvalidateObsoleteReplicationSlots()
Date
Msg-id 20220223014855.4lsddr464i7mymk2@alap3.anarazel.de
Whole thread Raw
In response to Re: Race condition in InvalidateObsoleteReplicationSlots()  (Álvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: Race condition in InvalidateObsoleteReplicationSlots()  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi,

On 2021-06-11 12:27:57 -0400, Álvaro Herrera wrote:
> On 2021-Jun-10, Álvaro Herrera wrote:
> 
> > Here's a version that I feel is committable (0001).  There was an issue
> > when returning from the inner loop, if in a previous iteration we had
> > released the lock.  In that case we need to return with the lock not
> > held, so that the caller can acquire it again, but weren't.  This seems
> > pretty hard to hit in practice (I suppose somebody needs to destroy the
> > slot just as checkpointer killed the walsender, but before checkpointer
> > marks it as its own) ... but if it did happen, I think checkpointer
> > would block with no recourse.  Also added some comments and slightly
> > restructured the code.
> > 
> > There are plenty of conflicts in pg13, but it's all easy to handle.
> 
> Pushed, with additional minor changes.

I stared at this code, due to [1], and I think I found a bug. I think it's not
the cause of the failures in that thread, but we probably should still do
something about it.

I think the minor changes might unfortunately have introduced a race? Before
the patch just used ConditionVariableSleep(), but now it also has a
ConditionVariablePrepareToSleep().  Without re-checking the sleep condition
until
            /* Wait until the slot is released. */
            ConditionVariableSleep(&s->active_cv,
                                   WAIT_EVENT_REPLICATION_SLOT_DROP);

which directly violates what ConditionVariablePrepareToSleep() documents:

 * This can optionally be called before entering a test/sleep loop.
 * Doing so is more efficient if we'll need to sleep at least once.
 * However, if the first test of the exit condition is likely to succeed,
 * it's more efficient to omit the ConditionVariablePrepareToSleep call.
 * See comments in ConditionVariableSleep for more detail.
 *
 * Caution: "before entering the loop" means you *must* test the exit
 * condition between calling ConditionVariablePrepareToSleep and calling
 * ConditionVariableSleep.  If that is inconvenient, omit calling
 * ConditionVariablePrepareToSleep.


Afaics this means we can potentially sleep forever if the prior owner of the
slot releases it before the ConditionVariablePrepareToSleep().

There's a comment that's mentioning this danger:

            /*
             * Prepare the sleep on the slot's condition variable before
             * releasing the lock, to close a possible race condition if the
             * slot is released before the sleep below.
             */
            ConditionVariablePrepareToSleep(&s->active_cv);

            LWLockRelease(ReplicationSlotControlLock);

but afaics that is bogus, because releasing a slot doesn't take
ReplicationSlotControlLock. That just protects against the slot being dropped,
not against it being released.

We can ConditionVariablePrepareToSleep() here, but we'd have to it earlier,
before the checks at the start of the while loop.

Greetings,

Andres Freund

[1] https://www.postgresql.org/message-id/20220218231415.c4plkp4i3reqcwip%40alap3.anarazel.de



pgsql-hackers by date:

Previous
From: "osumi.takamichi@fujitsu.com"
Date:
Subject: RE: Design of pg_stat_subscription_workers vs pgstats
Next
From: Tom Lane
Date:
Subject: Re: Time to drop plpython2?