Thread: pgsql: Fix a possibility of logical replication slot's restart_lsn goin

pgsql: Fix a possibility of logical replication slot's restart_lsn goin

From
Masahiko Sawada
Date:
Fix a possibility of logical replication slot's restart_lsn going backwards.

Previously LogicalIncreaseRestartDecodingForSlot() accidentally
accepted any LSN as the candidate_lsn and candidate_valid after the
restart_lsn of the replication slot was updated, so it potentially
caused the restart_lsn to move backwards.

A scenario where this could happen in logical replication is: after a
logical replication restart, based on previous candidate_lsn and
candidate_valid values in memory, the restart_lsn advances upon
receiving a subscriber acknowledgment. Then, logical decoding restarts
from an older point, setting candidate_lsn and candidate_valid based
on an old RUNNING_XACTS record. Subsequent subscriber acknowledgments
then update the restart_lsn to an LSN older than the current value.

In the reported case, after WAL files were removed by a checkpoint,
the retreated restart_lsn prevented logical replication from
restarting due to missing WAL segments.

This change essentially modifies the 'if' condition to 'else if'
condition within the function. The previous code had an asymmetry in
this regard compared to LogicalIncreaseXminForSlot(), which does
almost the same thing for different fields.

The WAL removal issue was reported by Hubert Depesz Lubaczewski.

Backpatch to all supported versions, since the bug exists since 9.4
where logical decoding was introduced.

Reviewed-by: Tomas Vondra, Ashutosh Bapat, Amit Kapila
Discussion: https://postgr.es/m/Yz2hivgyjS1RfMKs%40depesz.com
Discussion: https://postgr.es/m/85fff40e-148b-4e86-b921-b4b846289132%40vondra.me
Backpatch-through: 13

Branch
------
REL_14_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/26c4e896869093f07e9acb982f71bb3c4dae5901

Modified Files
--------------
src/backend/replication/logical/logical.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)