Race condition in InvalidateObsoleteReplicationSlots() - Mailing list pgsql-hackers

From Andres Freund
Subject Race condition in InvalidateObsoleteReplicationSlots()
Date
Msg-id 20210408001037.wfmk6jud36auhfqm@alap3.anarazel.de
Whole thread Raw
Responses Re: Race condition in InvalidateObsoleteReplicationSlots()
List pgsql-hackers
Hi,

I was looking at InvalidateObsoleteReplicationSlots() while reviewing /
polishing the logical decoding on standby patch. Which lead me to notice that
I think there's a race in InvalidateObsoleteReplicationSlots() (because
ResolveRecoveryConflictWithLogicalSlots has a closely related one).

void
InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno)
{
...
    LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
    for (int i = 0; i < max_replication_slots; i++)
    {
...
        if (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= oldestLSN)
            continue;
        LWLockRelease(ReplicationSlotControlLock);
...
        for (;;)
        {
...
            wspid = ReplicationSlotAcquireInternal(s, NULL, SAB_Inquire);
...
        SpinLockAcquire(&s->mutex);
        s->data.invalidated_at = s->data.restart_lsn;
        s->data.restart_lsn = InvalidXLogRecPtr;
        SpinLockRelease(&s->mutex);
...


As far as I can tell there's no guarantee that the slot wasn't concurrently
dropped and another replication slot created at the same offset in
ReplicationSlotCtl->replication_slots. Which we then promptly would
invalidate, regardless of the slot not actually needing to be invalidated.

Note that this is different from the race mentioned in a comment:
            /*
             * Signal to terminate the process that owns the slot.
             *
             * There is the race condition where other process may own
             * the slot after the process using it was terminated and before
             * this process owns it. To handle this case, we signal again
             * if the PID of the owning process is changed than the last.
             *
             * XXX This logic assumes that the same PID is not reused
             * very quickly.
             */

It's one thing to terminate a connection erroneously - permanently breaking a
replica due to invalidating the wrong slot or such imo is different.


Interestingly this problem seems to have been present both in

commit c6550776394e25c1620bc8258427c8f1d448080d
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date:   2020-04-07 18:35:00 -0400

    Allow users to limit storage reserved by replication slots

commit f9e9704f09daf882f5a1cf1fbe3f5a3150ae2bb9
Author: Fujii Masao <fujii@postgresql.org>
Date:   2020-06-19 17:15:52 +0900

    Fix issues in invalidation of obsolete replication slots.


I think this can be solved in two different ways:

1) Hold ReplicationSlotAllocationLock with LW_SHARED across most of
   InvalidateObsoleteReplicationSlots(). That way nobody could re-create a new
   slot in the to-be-obsoleted-slot's place.

2) Atomically check whether the slot needs to be invalidated and try to
   acquire if needed. Don't release ReplicationSlotControlLock between those
   two steps. Signal the owner to release the slot iff we couldn't acquire the
   slot. In the latter case wait and then recheck if the slot still needs to
   be dropped.

To me 2) seems better, because we then can also be sure that the slot still
needs to be obsoleted, rather than potentially doing so unnecessarily.


It looks to me like several of the problems here stem from trying to reuse
code from ReplicationSlotAcquireInternal() (which before this was just named
ReplicationSlotAcquire()).  I don't think that makes sense, because cases like
this want to check if a condition is true, and acquire it only if so.

IOW, I think this basically needs to look like ReplicationSlotsDropDBSlots(),
except that a different condition is checked, and the if (active_pid) case
needs to prepare a condition variable, signal the owner and then wait on the
condition variable, to restart after.


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: New IndexAM API controlling index vacuum strategies
Next
From: Michael Paquier
Date:
Subject: Re: Why is specifying oids = false multiple times in create table is silently ignored?