Hi,
On 2016-01-05 03:32:49 +0000, jarred@webriots.com wrote:
> This is a pretty serious logical decoding bug that bit us pretty hard
> in production.
>
> Run the following code:
>
>
> ----------------------------------------------------------------------------
> CREATE TABLE foo(a SERIAL PRIMARY KEY, b TEXT);
>
> SELECT pg_create_logical_replication_slot('regression_slot',
> 'test_decoding');
>
> INSERT INTO foo (b) VALUES ('john');
>
> BEGIN;
> SELECT * FROM foo WHERE b = 'john' FOR UPDATE;
> SAVEPOINT sp1;
> INSERT INTO foo (b) VALUES ('jane');
> ROLLBACK TO SAVEPOINT sp1;
> INSERT INTO foo (b) VALUES ('doe');
> END;
>
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL);
>
> ----------------------------------------------------------------------------
>
> The last select above should return the decoded WAL, but instead it
> returns:
>
>
> ----------------------------------------------------------------------------
> ERROR: subxact logged without previous toplevel record
>
> ----------------------------------------------------------------------------
Ugh, that's an annoying bug. Working on producing a fix.
The problem is that logical decoding expects to know about toplevel xids
before subtransaction/savepoint xids. Which is enforced in the WAL
logging code, so that's ok. But the problem is that right now the WAL
produced looks like (abbreviated):
rmgr: Heap tx: 585, lsn: 0/01A304A8, desc: LOCK off 1: xid 585 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref
#0:rel 1663/12385/24607 blk 0
rmgr: Heap tx: 586, lsn: 0/01A30658, desc: INSERT off 2, blkref #0: rel 1663/12385/24607 blk 0 FPW
rmgr: Transaction tx: 586, lsn: 0/01A30798, desc: ABORT 2016-02-07 12:21:18.624045 CET
rmgr: Standby tx: 0, lsn: 0/01A307C0, desc: RUNNING_XACTS nextXid 587 latestCompletedXid 586
oldestRunningXid585; 1 xacts: 585
rmgr: Heap tx: 587, lsn: 0/01A307F8, desc: INSERT off 3, blkref #0: rel 1663/12385/24607 blk 0
rmgr: Btree tx: 587, lsn: 0/01A30838, desc: INSERT_LEAF off 3, blkref #0: rel 1663/12385/24614 blk 1
rmgr: Transaction tx: 585, lsn: 0/01A30878, desc: COMMIT 2016-02-07 12:21:28.062978 CET; subxacts: 587
and decode.c doesn't care about LOCK records so far. Which means
reorderbuffer.c doesn't know about the relevant xid. Hm.
Greetings,
Andres Freund