Thread: sentPtr jumping back at the beginning of logical replication

From: Ashutosh Bapat
Date:
Hi All,
sentPtr, as reported by the WAL sender, should normally never jump
back; it should only increase.
I observed a strange behaviour where the WAL sender's sentPtr jumps
back at the beginning of logical replication. From examining the code,
the following behaviour looks like the culprit.

The WAL sender starts reading WAL from restart_lsn, which
XLogBeginRead sets in reader->EndRecPtr. So reader->EndRecPtr starts
at restart_lsn.

sentPtr starts at MyReplicationSlot->data.confirmed_flush in
StartLogicalReplication(). Usually some concurrent transaction or
other is in progress, so confirmed_flush is higher than restart_lsn.
After the first call to send_data in WalSndLoop(), sentPtr gets set to
reader->EndRecPtr. So when the first WAL record is read, sentPtr jumps
back to the end of the first record starting at restart_lsn.
Eventually it catches up to confirmed_flush as the WAL sender reads
more WAL.
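
To make the sequence above concrete, here is a minimal toy model of it.
It is not PostgreSQL code: the function names, the list of record end
positions, and the LSN values are all made up for illustration, and it
only mirrors the two assignments described above (sentPtr initialised
from confirmed_flush, then overwritten with reader->EndRecPtr after each
record).

```python
def simulate_sent_ptr(restart_lsn, confirmed_flush, record_ends):
    """Toy model of the behaviour described above (hypothetical helper).

    record_ends holds the end LSNs of successive WAL records read
    starting at restart_lsn; the values are illustrative only.
    Returns sentPtr at start-up followed by its value after each record.
    """
    # StartLogicalReplication(): sentPtr starts at confirmed_flush.
    sent_ptr = confirmed_flush
    trace = [sent_ptr]

    # XLogBeginRead(): the reader's EndRecPtr starts at restart_lsn.
    end_rec_ptr = restart_lsn

    for record_end in record_ends:
        # The reader consumes one record and advances EndRecPtr.
        end_rec_ptr = record_end
        # send_data: sentPtr is overwritten with the reader's EndRecPtr.
        sent_ptr = end_rec_ptr
        trace.append(sent_ptr)
    return trace


def first_regression(trace):
    """Return the index of the first value lower than its predecessor,
    or None if the trace never moves backwards."""
    for i in range(1, len(trace)):
        if trace[i] < trace[i - 1]:
            return i
    return None


# Assumed example: confirmed_flush is ahead of restart_lsn.
trace = simulate_sent_ptr(
    restart_lsn=0x1000,
    confirmed_flush=0x3000,
    record_ends=[0x1100, 0x2000, 0x3000, 0x3200],
)
print([hex(lsn) for lsn in trace])
print("first backwards step at index:", first_regression(trace))
```

In this model the backwards step can only happen at the first record,
and only when confirmed_flush is ahead of restart_lsn; if the two are
equal the trace is monotonic.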

This seems to be harmless, but the logical receiver may get confused
if it receives an LSN lower than confirmed_flush.

-- 
Best Wishes,
Ashutosh Bapat