pgsql: Prevent invalidation of newly created replication slots. - Mailing list pgsql-committers

From Amit Kapila
Subject pgsql: Prevent invalidation of newly created replication slots.
Date
Msg-id E1vdkOa-005Fql-23@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Prevent invalidation of newly created replication slots.

A race condition could cause a newly created replication slot to become
invalidated between WAL reservation and a checkpoint.

Previously, if the required WAL was removed, we retried the reservation
process. However, the slot could still be invalidated before the retry if
the WAL was not yet removed but the checkpoint advanced the redo pointer
beyond the slot's intended restart LSN and computed the minimum LSN that
needs to be preserved for the slots.

The fix is to acquire an exclusive lock on ReplicationSlotAllocationLock
during WAL reservation, and a shared lock during the minimum LSN
calculation at checkpoints to serialize the process. This ensures that, if
WAL reservation occurs first, the checkpoint waits until restart_lsn is
updated before calculating the minimum LSN. If the checkpoint runs first,
subsequent WAL reservations pick a position at or after the latest
checkpoint's redo pointer.

We used a similar fix in HEAD (via commit 006dd4b2e5) and 18. The
difference is that in 17 and prior branches we need to additionally handle
the race condition with slot's minimum LSN computation during checkpoints.

Reported-by: suyu.cmj <mengjuan.cmj@alibaba-inc.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/5e045179-236f-4f8f-84f1-0f2566ba784c.mengjuan.cmj@alibaba-inc.com

Branch
------
REL_15_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/aae05622a7cbd17a0081452ef4cc4eeda54e4e2e

Modified Files
--------------
src/backend/access/transam/xlog.c |  30 +++++++++--
src/backend/replication/slot.c    | 106 +++++++++++++++++++-------------------
2 files changed, 81 insertions(+), 55 deletions(-)


pgsql-committers by date:

Previous
From: Michael Paquier
Date:
Subject: pgsql: pg_createsubscriber: Improve handling of automated recovery conf
Next
From: Peter Eisentraut
Date:
Subject: pgsql: Remove use of rindex() function