Hi,
while working on logical decoding of sequences, I ran into an issue with
nextval() in a transaction that rolls back, described in [1]. But after
thinking about it a bit more (and chatting with Petr Jelinek), I think
this issue affects physical sync replication too.
Imagine you have a primary <-> sync_replica cluster, and you do this:
CREATE SEQUENCE s;
-- shutdown the sync replica
BEGIN;
SELECT nextval('s') FROM generate_series(1,50);
ROLLBACK;
BEGIN;
SELECT nextval('s');
COMMIT;
The natural expectation would be the COMMIT gets stuck, waiting for the
sync replica (which is not running), right? But it does not.
The problem is exactly the same as in [1] - the aborted transaction
generated WAL, but RecordTransactionAbort() ignores that and does not
update LogwrtResult.Write, with the reasoning that aborted transactions
do not matter. But sequences violate that, because we only write WAL
once every 32 increments, so the following nextval() gets "committed"
without waiting for the replica (because it did not produce WAL).
I'm not sure this is a clear data corruption bug, but it surely walks
and quacks like one. My proposal is to fix this by tracking the lsn of
the last LSN for a sequence increment, and then check that LSN in
RecordTransactionCommit() before calling XLogFlush().
regards
[1]
https://www.postgresql.org/message-id/ae3cab67-c31e-b527-dd73-08f196999ad4%40enterprisedb.com
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company