Re: BUG #13844: Logical decoding bug with subxact + row locking - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #13844: Logical decoding bug with subxact + row locking
Date
Msg-id 20160207114217.pct2ldvbtdubriy3@alap3.anarazel.de
Whole thread Raw
In response to BUG #13844: Logical decoding bug with subxact + row locking  (jarred@webriots.com)
Responses Re: BUG #13844: Logical decoding bug with subxact + row locking
List pgsql-bugs
Hi,

On 2016-01-05 03:32:49 +0000, jarred@webriots.com wrote:
> This is a pretty serious logical decoding bug that bit us pretty hard
> in production.
>
> Run the following code:
>
>
> ----------------------------------------------------------------------------
>     CREATE TABLE foo(a SERIAL PRIMARY KEY, b TEXT);
>
>     SELECT pg_create_logical_replication_slot('regression_slot',
> 'test_decoding');
>
>     INSERT INTO foo (b) VALUES ('john');
>
>     BEGIN;
>     SELECT * FROM foo WHERE b = 'john' FOR UPDATE;
>     SAVEPOINT sp1;
>     INSERT INTO foo (b) VALUES ('jane');
>     ROLLBACK TO SAVEPOINT sp1;
>     INSERT INTO foo (b) VALUES ('doe');
>     END;
>
>     SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL);
>
> ----------------------------------------------------------------------------
>
> The last select above should return the decoded WAL, but instead it
> returns:
>
>
> ----------------------------------------------------------------------------
>     ERROR:  subxact logged without previous toplevel record
>
> ----------------------------------------------------------------------------

Ugh, that's an annoying bug. Working on producing a fix.

The problem is that logical decoding expects to know about toplevel xids
before subtransaction/savepoint xids. Which is enforced in the WAL
logging code, so that's ok. But the problem is that right now the WAL
produced looks like (abbreviated):
rmgr: Heap        tx:        585, lsn: 0/01A304A8, desc: LOCK off 1: xid 585 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref
#0:rel 1663/12385/24607 blk 0 
rmgr: Heap        tx:        586, lsn: 0/01A30658, desc: INSERT off 2, blkref #0: rel 1663/12385/24607 blk 0 FPW
rmgr: Transaction tx:        586, lsn: 0/01A30798, desc: ABORT 2016-02-07 12:21:18.624045 CET
rmgr: Standby     tx:          0, lsn: 0/01A307C0, desc: RUNNING_XACTS nextXid 587 latestCompletedXid 586
oldestRunningXid585; 1 xacts: 585 
rmgr: Heap        tx:        587, lsn: 0/01A307F8, desc: INSERT off 3, blkref #0: rel 1663/12385/24607 blk 0
rmgr: Btree       tx:        587, lsn: 0/01A30838, desc: INSERT_LEAF off 3, blkref #0: rel 1663/12385/24614 blk 1
rmgr: Transaction tx:        585, lsn: 0/01A30878, desc: COMMIT 2016-02-07 12:21:28.062978 CET; subxacts: 587

and decode.c doesn't care about LOCK records so far. Which means
reorderbuffer.c doesn't know about the relevant xid. Hm.

Greetings,

Andres Freund

pgsql-bugs by date:

Previous
From: Burkhardt.Renz@mni.thm.de
Date:
Subject: BUG #13932: German ß not a valid character in psql
Next
From: John R Pierce
Date:
Subject: Re: BUG #13932: German ß not a valid character in psql