Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication
Date
Msg-id 20191126182752.byojog7xws2or73x@development
Whole thread Raw
In response to BUG #16125: Crash of PostgreSQL's wal sender during logical replication  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication
List pgsql-bugs
On Tue, Nov 26, 2019 at 02:10:54PM -0300, Alvaro Herrera wrote:
>On 2019-Nov-19, Andres Freund wrote:
>
>> > > 2518:rmgr: Heap        len (rec/tot):   2063/  2063, tx: 1667601530, lsn: 25EE/D66F11E8, prev 25EE/D66F0CB0,
desc:INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
 
>> > > 2520:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F1A38, prev 25EE/D66F19F8,
desc:INSERT_LEAF off 256, blkref #0: rel 1663/13018/88964797 blk 15840
 
>> > > 2522:rmgr: Heap        len (rec/tot):   1341/  9385, tx: 1667601530, lsn: 25EE/D66F2118, prev 25EE/D66F1A78,
desc:INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
 
>> > > 2524:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F54C8, prev 25EE/D66F45E0,
desc:INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
 
>> > > 2525:rmgr: Heap        len (rec/tot):   1671/  1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8,
desc:INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
 
>> > > 2527:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F6738, prev 25EE/D66F5B90,
desc:INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
 
>> > > 2530:rmgr: Btree       len (rec/tot):     96/    96, tx: 1667601530, lsn: 25EE/D66F7E40, prev 25EE/D66F7E10,
desc:INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
 
>> > > 2531:rmgr: Btree       len (rec/tot):     80/    80, tx: 1667601530, lsn: 25EE/D66F7EA0, prev 25EE/D66F7E40,
desc:INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
 
>> > > 2532:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0,
desc:DELETE off 2 , blkref #0: rel 1663/13018/88964792 blk 18475
 
>> > > 2533:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F28, prev 25EE/D66F7EF0,
desc:DELETE off 4 , blkref #0: rel 1663/13018/88964795 blk 3125
 
>> > > 2534:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F60, prev 25EE/D66F7F28,
desc:DELETE off 7 , blkref #0: rel 1663/13018/88964795 blk 3139
 
>> > > 56346:rmgr: Heap        len (rec/tot):    378/  8198, tx: 1667601537, lsn: 25EE/D6DE33B0, prev 25EE/D6DE1628,
desc:DELETE off 6 KEYS_UPDATED , blkref #0: rel 1663/13018/88964815 blk 1938695 FPW
 
>>
>> > > 56347:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601538, lsn: 25EE/D6DE53D0, prev 25EE/D6DE33B0,
desc:LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/13018/88964792 blk 18460
 
>> > > 56348:rmgr: Heap        len (rec/tot):   6563/  6563, tx: 1667601538, lsn: 25EE/D6DE5408, prev 25EE/D6DE53D0,
desc:UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel
1663/13018/88964792blk 18460
 
>> > > 56349:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601538, lsn: 25EE/D6DE6DC8, prev 25EE/D6DE5408,
desc:INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
 
>> > > 56350:rmgr: Btree       len (rec/tot):     96/    96, tx: 1667601538, lsn: 25EE/D6DE6E08, prev 25EE/D6DE6DC8,
desc:INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
 
>> > > 56351:rmgr: Btree       len (rec/tot):     80/    80, tx: 1667601538, lsn: 25EE/D6DE6E68, prev 25EE/D6DE6E08,
desc:INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
 
>> > > 56352:rmgr: Transaction len (rec/tot):     66/    66, tx: 1667601527, lsn: 25EE/D6DE6EB8, prev 25EE/D6DE6E68,
desc:COMMIT 2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538
 
>>
>> > So, this DELETE change triggers the issue. If I understand it correctly,
>> > that means the transaction locked the tuple and then deleted it. But I
>> > don't see any record about the lock.
>>
>> Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when
>> called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when
>>
>>     if (old_infomask & HEAP_XMAX_INVALID)
>>
>> is true. Which is the most common case.
>
>I don't understand what you guys are saying.  There's a DELETE at
>25EE/D6DE33B0 (line 56346) which seems to be the one causing trouble.
>Right?  There's also a LOCK+UPDATE sequence just below that, which is
>unrelated (different rel).  Are you saying that one of them is toast and
>the other is its main heap?  It would be useful to know more about the
>rels.
>

I was wondering why there's no LOCK before the DELETE, when the waldump
output has KEYS_UPDATED for that change. But as Andres says, it's likely
normal because e.g. heap_delete sets flag in certain cases.

I agree it would be useful to know more about the rels. AFAIK 88964815
is the heap, and my guess is the other rels are various indexes on it.
I've already said so in my message from last week.

>> I actually find it *more*
>> curious to *not* see that for a delete. I've not traced fully through
>> it, yet.
>> Alvaro, it's not clear to me whether the the multixact codepaths in
>> compute_new_xmax_infomask() actually work as expected for deletes.
>
>Are you saying that the three *prior* deletes (lines 2532-2534) are
>strange because they lack the KEYS_UPDATED bit?  (Note these are all for
>different rels).
>
>I'm not sure what to do about this report at this point.
>Notwithstanding my general failure to focus on anything too complex
>given my surrounding social context.
>

I think having more information about the structure (tables, indexes,
mapping for relfilenodes) and a better idea what the transaction is
doing, would be helpful.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-bugs by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Failed assertion clauses != NIL
Next
From: Abhilash Mishra
Date:
Subject: Re: BUG #16138: Error while installing PostgreSQL