Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication - Mailing list pgsql-bugs
From | Tomas Vondra |
---|---|
Subject | Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication |
Date | |
Msg-id | 20191127223933.ohe3wtdsssvgxdh5@development Whole thread Raw |
In response to | Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses |
Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication
Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication |
List | pgsql-bugs |
On Wed, Nov 27, 2019 at 02:59:35PM -0300, Alvaro Herrera wrote: >On 2019-Nov-19, Andres Freund wrote: > >> > > 2525:rmgr: Heap len (rec/tot): 1671/ 1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc:INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475 >> > > 2532:rmgr: Heap len (rec/tot): 54/ 54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc:DELETE off 2 , blkref #0: rel 1663/13018/88964792 blk 18475 > >> > So, this DELETE change triggers the issue. If I understand it correctly, >> > that means the transaction locked the tuple and then deleted it. But I >> > don't see any record about the lock. >> >> Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when >> called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when >> >> if (old_infomask & HEAP_XMAX_INVALID) >> >> is true. Which is the most common case. I actually find it *more* >> curious to *not* see that for a delete. I've not traced fully through >> it, yet. >> >> >> Alvaro, it's not clear to me whether the the multixact codepaths in >> compute_new_xmax_infomask() actually work as expected for deletes. > >So, what's really happening here is heap_insert(speculative) followed by >heap_abort_speculative(). That seems to explain the above WAL sequence. >I *think* the speculative insertion code in reorderbuffer simply delays >the tuple insertion until it is confirmed; so this sequence would result >in nothing at all happening at ReorderBufferCommit. Unless something >has messed up the state somehow. > Hmm, so how would that transaction look like? My reading of the WAL (based on the available relfilenode info) is this: Transaction 25EE/D66F0438 ASSIGNMENT xtop 1667601527: subxacts: 1667601528 -> seems like a savepoint, probably Heap 25EE/D66F11E8 INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125 Heap 25EE/D66F2118 INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW Btree 25EE/D66F54C8 INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840 Heap 25EE/D66F5508 INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475 Btree 25EE/D66F6738 INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058 Btree 25EE/D66F7E40 INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076 Btree 25EE/D66F7EA0 INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412 -> insert into TOAST table (88964795) + TOAST idx (probably 88964797) -> then an insert into table2 (88964792), and it's indexes Heap 25EE/D66F7EF0 DELETE off 2 , blkref #0: rel 1663/13018/88964792 blk 18475 Heap 25EE/D66F7F28 DELETE off 4 , blkref #0: rel 1663/13018/88964795 blk 3125 Heap 25EE/D66F7F60 DELETE off 7 , blkref #0: rel 1663/13018/88964795 blk 3139 -> now we delete the record from table2 and also it's TOAST table (how do we trigger that?) Heap 25EE/D6DE33B0 DELETE off 6 KEYS_UPDATED , blkref #0: rel 1663/13018/88964815 blk 1938695 FPW -> delete from table1 (88964815), which we haven't touched in this xact until now (this is the crash LSN) Heap 25EE/D6DE53D0 LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/13018/88964792blk 18460 Heap 25EE/D6DE5408 UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499,blkref #1: rel 1663/13018/88964792 blk 18460 Btree 25EE/D6DE6DC8 INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058 Btree 25EE/D6DE6E08 INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076 Btree 25EE/D6DE6E68 INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412 Transaction 25EE/D6DE6EB8 COMMIT 2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538 -> seems like a lock of a record in table2, along with update of indexes I can more or less simulate this by begin; savepoint s; insert into table2 values (1, ... jsonb ..., NULL, 1, 1, 'x', 1, 1, 1, 1, 1, 1) on conflict (column1) do update set column14 = 1; delete from table2 where column1 = 1; update table2 set column14 = 2 where column1 = 1; commit; Which however does not explain the three deletes from table2 + TOAST. But, maybe that's not the issue, because the decoding fails on the next record, which is for table1. FWIW attached is a SQL script creating tables with the same structure (without the dropped columns, but I think those are irrelevant). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-bugs by date: