Hi Hackers,
I accidentally noticed that pg_replication_slot_advance changes only
the in-memory state of a slot when the slot is physical. The new
value does not survive a restart.
Reproduction steps:
1) Create a new slot and note its restart_lsn
SELECT pg_create_physical_replication_slot('slot1', true);
SELECT * from pg_replication_slots;
2) Generate some dummy WAL
CHECKPOINT;
SELECT pg_switch_wal();
CHECKPOINT;
SELECT pg_switch_wal();
3) Advance the slot to the value returned by pg_current_wal_insert_lsn()
SELECT pg_replication_slot_advance('slot1', '0/160001A0');
4) Check that restart_lsn has been updated
SELECT * from pg_replication_slots;
5) Restart the server and check restart_lsn again. It is back to the
value from step 1.
I dug into the code, and it happens because of this if statement:
    /* Update the on disk state when lsn was updated. */
    if (XLogRecPtrIsInvalid(endlsn))
    {
        ReplicationSlotMarkDirty();
        ReplicationSlotsComputeRequiredXmin(false);
        ReplicationSlotsComputeRequiredLSN();
        ReplicationSlotSave();
    }
In fact, endlsn is always a valid LSN after the replication slot
advance guts have run, so this branch is never taken. It works for
logical slots only by chance, since there is an implicit
ReplicationSlotMarkDirty() call inside LogicalConfirmReceivedLocation.
Attached is a small patch that fixes this bug. I have tried to keep
the logic of the 'if (XLogRecPtrIsInvalid(endlsn))' check, so
pg_logical_replication_slot_advance and
pg_physical_replication_slot_advance now return InvalidXLogRecPtr when
the call is a no-op.
What do you think?
Regards
--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
Russian Postgres Company
P.S. CCed Simon and Michael, as they were the last to seriously touch
the pg_replication_slot_advance code.