On Tuesday, December 16, 2025 2:54 AM Joao Foltran <joao@foltrandba.com> wrote:
> Hi hackers,
>
> I'd like to report a regression in PostgreSQL 18 regarding physical replication
> slot invalidation and propose a fix.
>
> It's my first time sending any type of contribution, so please let me know if I
> made anything incorrectly and I'll fix it ASAP.
>
> It's also my first time doing any type of code inside the postgres project, so if
> the logic or anything I used is incorrect let me know.
>
> CCing Amit, since he committed f41d8468 and 8709dcc.
>
> ## Problem
>
> Commit f41d8468 introduced an ERROR when trying to acquire an invalidated
> replication slot. While this is correct for logical replication slots (which cannot
> safely recover after invalidation), it breaks recovery for physical replication
> slots.
>
> Later, commit 8709dcc improved upon this code to prevent a race condition
> and moved the check to after the slot was already acquired.
>
> In PostgreSQL 17 and earlier, when a physical replication slot was invalidated
> due to max_slot_wal_keep_size, the slot could still be reacquired if the
> required WAL became available through restore_command or archive
> recovery in the standby. This is a common operational scenario:
>
> - Temporary network issues
> - Planned maintenance windows
> - Standbys temporarily falling behind

I think the ability to acquire an invalidated slot is potentially risky
behavior. AFAICS, we do not currently support recovering invalidated slots.
This means that once a slot is invalidated, it no longer offers any
protection. Even if the restart_lsn or xmin of such a slot is still valid,
WAL and rows can be removed at any time. For details, please refer to
ReplicationSlotsComputeRequiredLSN(), where we deliberately skip the
restart_lsn of invalidated slots.
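
To illustrate the point, here is a simplified, self-contained model of that
computation. It is only a sketch and not the actual source: the DemoSlot type,
the DemoLsn typedef and demo_compute_required_lsn() are names made up for this
example, and locking is omitted entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration, not the real PostgreSQL types. */
typedef uint64_t DemoLsn;
#define DEMO_INVALID_LSN ((DemoLsn) 0)

typedef struct DemoSlot
{
	bool		in_use;			/* is this array entry occupied? */
	bool		invalidated;	/* has the slot been invalidated? */
	DemoLsn		restart_lsn;	/* oldest WAL position the slot still needs */
} DemoSlot;

/*
 * Return the oldest restart_lsn that any usable slot still needs, or
 * DEMO_INVALID_LSN if no slot holds back WAL removal.
 */
static DemoLsn
demo_compute_required_lsn(const DemoSlot *slots, size_t nslots)
{
	DemoLsn		min_required = DEMO_INVALID_LSN;

	for (size_t i = 0; i < nslots; i++)
	{
		const DemoSlot *s = &slots[i];

		if (!s->in_use)
			continue;

		/* An invalidated slot does not hold back WAL removal at all. */
		if (s->invalidated)
			continue;

		if (s->restart_lsn != DEMO_INVALID_LSN &&
			(min_required == DEMO_INVALID_LSN ||
			 s->restart_lsn < min_required))
			min_required = s->restart_lsn;
	}

	return min_required;
}
```

In this model, an invalidated slot with restart_lsn = 100 next to a healthy
slot with restart_lsn = 500 yields 500, so nothing prevents the WAL between
100 and 500 from being removed.
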
>
> After commit f41d8468, physical replication slots cannot be reacquired once
> invalidated, even when the required WAL is available via archive recovery.
> The standby remains stuck recovering from archive and cannot resume
> streaming replication, demanding manual intervention (slot recreation).
>
I think even if the WAL is temporarily available via archive recovery, the
slot cannot protect any further WAL or rows from being removed, so it could
cause problems later.

Best Regards,
Hou zj