pgsql: Prevent invalidation of newly synced replication slots. - Mailing list pgsql-committers

From Amit Kapila
Subject pgsql: Prevent invalidation of newly synced replication slots.
Date
Msg-id E1vkc3p-002uNv-0p@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Prevent invalidation of newly synced replication slots.

A race condition could cause a newly synced replication slot to become
invalidated between its initial sync and the checkpoint.

When syncing a replication slot to a standby, the slot's initial
restart_lsn is taken from the publisher's remote_restart_lsn. Because slot
sync happens asynchronously, this value can lag behind the standby's
current redo pointer. Without any interlocking between WAL reservation and
checkpoints, a checkpoint may remove WAL required by the newly synced
slot, causing the slot to be invalidated.

To fix this, we acquire ReplicationSlotAllocationLock before reserving WAL
for a newly synced slot, similar to commit 006dd4b2e5. This ensures that
if WAL reservation happens first, the checkpoint process must wait for
slotsync to update the slot's restart_lsn before it computes the minimum
required LSN.

However, unlike in ReplicationSlotReserveWal(), this lock alone cannot
protect a newly synced slot if a checkpoint has already run
CheckPointReplicationSlots() before slotsync updates the slot. In such
cases, the remote restart_lsn may be stale and earlier than the current
redo pointer. To prevent relying on an outdated LSN, we use the oldest
WAL location available if it is greater than the remote restart_lsn.

This ensures that newly synced slots always start with a safe, non-stale
restart_lsn and are not invalidated by concurrent checkpoints.

Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Backpatch-through: 17
Discussion: https://postgr.es/m/TY4PR01MB16907E744589B1AB2EE89A31F94D7A%40TY4PR01MB16907.jpnprd01.prod.outlook.com

Branch
------
REL_17_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/3243c0177efb77511269d6037ee668abd2e81396

Modified Files
--------------
src/backend/access/transam/xlog.c                  |  6 +-
src/backend/replication/logical/slotsync.c         | 97 +++++++++++-----------
src/include/access/xlog.h                          |  1 +
src/test/recovery/t/046_checkpoint_logical_slot.pl | 84 ++++++++++++++++++-
4 files changed, 136 insertions(+), 52 deletions(-)


pgsql-committers by date:

Previous
From: Michael Paquier
Date:
Subject: pgsql: Include extended statistics data in pg_dump
Next
From: Robert Haas
Date:
Subject: pgsql: pg_waldump: Remove file-level global WalSegSz.