Race condition in replication slot usage introduced by commit f41d846 - Mailing list pgsql-hackers

From Nisha Moond
Subject Race condition in replication slot usage introduced by commit f41d846
Date
Msg-id CABdArM7J-LbGoMPGUPiFiLOyB_TZ5+YaZb=HMES0mQqzVTn8Gg@mail.gmail.com
Whole thread Raw
Responses Re: Race condition in replication slot usage introduced by commit f41d846
List pgsql-hackers
Hi,

Commit f41d846 [1] introduced a race condition that allows a process
to acquire and continue using an invalidated replication slot, leading
to unexpected behavior.

Example Scenario:
Suppose there is an orphaned logical slot that will be invalidated on
the next checkpoint. Consider the following sequence of events during
ReplicationSlotAcquire() -
 - A process (P1) attempts to acquire the slot. It checks the
invalidated flag and sees that the slot is still valid, so it proceeds
to set the s->active_pid to its own PID.
 - Before P1 can acquire the spinlock and update s->active_pid =
MyProcPid, the checkpointer process invalidates the slot.
 - Now, P1 proceeds to set s->active_pid = MyProcPid and starts using
an already invalidated slot, which should not happen.

Using an invalidated slot should be prevented in the first place. In
my test, this scenario triggered an assertion failure in
CreateDecodingContext() -

```
TRAP: failed Assert("slot->data.invalidated == RS_INVAL_NONE"), File:
"logical.c", Line: 546, PID: 1042884
postgres: nisha postgres [local]
SELECT(ExceptionalCondition+0xbb)[0x649ff7696c82]
postgres: nisha postgres [local]
SELECT(CreateDecodingContext+0x23b)[0x649ff73bf9d3]
postgres: nisha postgres [local] SELECT(+0x61fb71)[0x649ff73c3b71]
postgres: nisha postgres [local]
SELECT(pg_logical_slot_get_changes+0x26)[0x649ff73c3ed6]
...
```

The expected behavior was for ReplicationSlotAcquire() to detect the
invalid slot and return an error instead of allowing its usage.

Attached the patch v1 fixing this issue.

Issue reported by: Hou Zhijie.

[1] https://github.com/postgres/postgres/commit/f41d8468ddea34170fe19fdc17b5a247e7d3ac78

--
Thanks,
Nisha

Attachment

pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: Get rid of WALBufMappingLock
Next
From: Alexander Korotkov
Date:
Subject: Re: Removing unneeded self joins