Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication
Date
Msg-id 20191119232849.qhtuhq4udrqrz4he@alap3.anarazel.de
Whole thread Raw
In response to Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication
Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication
Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication
List pgsql-bugs
Hi,

(Alvaro, see multixact point below)

Are there any partitions?

On 2019-11-19 23:17:32 +0100, Tomas Vondra wrote:
> Yep. thanks for this data. This seems to confirm Andres' suspicion that
> the change is actually a delete. It's not clear to me why we're doing
> this toast thing for DELETEs unconditionally.

Well, there never should be any toast activity in case of deletions. So
there's already is something wrong in that case. I don't think just
ignoring that something is wrong would be good - but raising an error
would make sense.


> For the record, the change LSN is 25EE/D6DE33B0 and the transactions is
> between 25EE/D66F0438 and 25EE/D6DE6F00.


> > 2514:rmgr: Transaction len (rec/tot):     38/    38, tx:          0, lsn: 25EE/D66F0438, prev 25EE/D66F0410, desc:
ASSIGNMENTxtop 1667601527: subxacts: 1667601528
 

So there's subtransactions in this workload...


> > 2518:rmgr: Heap        len (rec/tot):   2063/  2063, tx: 1667601530, lsn: 25EE/D66F11E8, prev 25EE/D66F0CB0, desc:
INSERToff 4, blkref #0: rel 1663/13018/88964795 blk 3125
 
> > 2520:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F1A38, prev 25EE/D66F19F8, desc:
INSERT_LEAFoff 256, blkref #0: rel 1663/13018/88964797 blk 15840
 
> > 2522:rmgr: Heap        len (rec/tot):   1341/  9385, tx: 1667601530, lsn: 25EE/D66F2118, prev 25EE/D66F1A78, desc:
INSERToff 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
 
> > 2524:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F54C8, prev 25EE/D66F45E0, desc:
INSERT_LEAFoff 257, blkref #0: rel 1663/13018/88964797 blk 15840
 
> > 2525:rmgr: Heap        len (rec/tot):   1671/  1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc:
INSERToff 2, blkref #0: rel 1663/13018/88964792 blk 18475
 
> > 2527:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F6738, prev 25EE/D66F5B90, desc:
INSERT_LEAFoff 140, blkref #0: rel 1663/13018/88964798 blk 2058
 
> > 2530:rmgr: Btree       len (rec/tot):     96/    96, tx: 1667601530, lsn: 25EE/D66F7E40, prev 25EE/D66F7E10, desc:
INSERT_LEAFoff 28, blkref #0: rel 1663/13018/88964799 blk 5076
 
> > 2531:rmgr: Btree       len (rec/tot):     80/    80, tx: 1667601530, lsn: 25EE/D66F7EA0, prev 25EE/D66F7E40, desc:
INSERT_LEAFoff 88, blkref #0: rel 1663/13018/88964800 blk 3412
 
> > 2532:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc:
DELETEoff 2 , blkref #0: rel 1663/13018/88964792 blk 18475
 
> > 2533:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F28, prev 25EE/D66F7EF0, desc:
DELETEoff 4 , blkref #0: rel 1663/13018/88964795 blk 3125
 
> > 2534:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F60, prev 25EE/D66F7F28, desc:
DELETEoff 7 , blkref #0: rel 1663/13018/88964795 blk 3139
 
> > 56346:rmgr: Heap        len (rec/tot):    378/  8198, tx: 1667601537, lsn: 25EE/D6DE33B0, prev 25EE/D6DE1628, desc:
DELETEoff 6 KEYS_UPDATED , blkref #0: rel 1663/13018/88964815 blk 1938695 FPW
 

> > 56347:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601538, lsn: 25EE/D6DE53D0, prev 25EE/D6DE33B0, desc:
LOCKoff 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/13018/88964792 blk 18460
 
> > 56348:rmgr: Heap        len (rec/tot):   6563/  6563, tx: 1667601538, lsn: 25EE/D6DE5408, prev 25EE/D6DE53D0, desc:
UPDATEoff 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel
1663/13018/88964792blk 18460
 
> > 56349:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601538, lsn: 25EE/D6DE6DC8, prev 25EE/D6DE5408, desc:
INSERT_LEAFoff 140, blkref #0: rel 1663/13018/88964798 blk 2058
 
> > 56350:rmgr: Btree       len (rec/tot):     96/    96, tx: 1667601538, lsn: 25EE/D6DE6E08, prev 25EE/D6DE6DC8, desc:
INSERT_LEAFoff 28, blkref #0: rel 1663/13018/88964799 blk 5076
 
> > 56351:rmgr: Btree       len (rec/tot):     80/    80, tx: 1667601538, lsn: 25EE/D6DE6E68, prev 25EE/D6DE6E08, desc:
INSERT_LEAFoff 88, blkref #0: rel 1663/13018/88964800 blk 3412
 
> > 56352:rmgr: Transaction len (rec/tot):     66/    66, tx: 1667601527, lsn: 25EE/D6DE6EB8, prev 25EE/D6DE6E68, desc:
COMMIT2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538
 


> So, this DELETE change triggers the issue. If I understand it correctly,
> that means the transaction locked the tuple and then deleted it. But I
> don't see any record about the lock.

Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when
called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when

    if (old_infomask & HEAP_XMAX_INVALID)

is true. Which is the most common case. I actually find it *more*
curious to *not* see that for a delete. I've not traced fully through
it, yet.


Alvaro, it's not clear to me whether the the multixact codepaths in
compute_new_xmax_infomask() actually work as expected for deletes.


> But this output seems a bit incomplete - there's an assignment record
> for subxact 1667601528, without any records. And there's no assignment
> for 1667601530. Can you simply post the whole WAL output for the
> transaction LSN range, i.e. something like
> 
>   pg_waldump -s 25EE/D66F0438 -e 25EE/D6DE6F00 ...

Yea, that'd be quite helpful.


Which relations do the relfilenodes correspond to? Most importantly
1663/13018/88964815, 1663/13018/88964795, 1663/13018/88964792

You can look up the mapping, if there's not been any rewriting DDL
since, with e.g.

SELECT r, r::regclass FROM pg_filenode_relation(13018, 88964815) r;

Greetings,

Andres Freund



pgsql-bugs by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication
Next
From: Tom Lane
Date:
Subject: Re: No = operator for opfamily 426