Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication - Mailing list pgsql-bugs

From Andrey Salnikov
Subject Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication
Date
Msg-id CB56B75F-C74C-49E3-87D9-9A54650BA0CA@dataegret.com
Whole thread Raw
In response to Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-bugs
Hi,

> Can you create a core dump (see [1]), and print 'change' and 'txn' in
> frame #2? I wonder if some the other fields are bogus too (but it can't
> be entirely true ...), and if the transaction got serialized.
>
> You may need to install debug symbols first, BTW.

I hope that I've understood how to extract correctly those data

(gdb) frame 2
#2  ReorderBufferCommit (rb=0x555713b7b788, xid=xid@entry=1667601527, commit_lsn=commit_lsn@entry=41707737345720,
end_lsn=end_lsn@entry=41707737345792,commit_time=commit_time@entry=627188499326201,  
    origin_id=origin_id@entry=0, origin_lsn=0) at ./build/../src/backend/replication/logical/reorderbuffer.c:1585
1585    ./build/../src/backend/replication/logical/reorderbuffer.c: No such file or directory.
(gdb) print change
$1 = (ReorderBufferChange *) 0x555713b9a108
(gdb) print *change
$2 = {lsn = 41707737330608, action = REORDER_BUFFER_CHANGE_DELETE, origin_id = 0, data = {tp = {relnode = {spcNode =
1663,dbNode = 13018, relNode = 88964815}, clear_toast_afterwards = 1 '\001', oldtuple = 0x555713cd0d88,  
      newtuple = 0x0}, msg = {prefix = 0x32da0000067f <error: Cannot access memory at address 0x32da0000067f>,
message_size= 4383932111, message = 0x555713cd0d88 "\330-\315\023WU"}, snapshot = 0x32da0000067f,  
    command_id = 1663, tuplecid = {node = {spcNode = 1663, dbNode = 13018, relNode = 88964815}, tid = {ip_blkid =
{bi_hi= 1, bi_lo = 0}, ip_posid = 3464}, cmin = 21847, cmax = 0, combocid = 0}}, node = { 
    prev = 0x555713b98178, next = 0x555713b98178}}
(gdb) print txn
$3 = (ReorderBufferTXN *) 0x555713b982e8
(gdb) print *txn
$6 = {xid = 1667601527, has_catalog_changes = 0 '\000', is_known_as_subxact = 0 '\000', toplevel_xid = 0, first_lsn =
41707730043960,final_lsn = 41707737345720, end_lsn = 41707737345792, restart_decoding_lsn = 0,  
  origin_id = 0, origin_lsn = 0, commit_time = 627188499326201, base_snapshot = 0x555713b90408, base_snapshot_lsn =
41707730047464,base_snapshot_node = {prev = 0x555713b7b7a0, next = 0x555713b7b7a0}, nentries = 0,  
  nentries_mem = 0, serialized = 0 '\000', changes = {head = {prev = 0x555713b98368, next = 0x555713b98368}}, tuplecids
={head = {prev = 0x555713b98378, next = 0x555713b98378}}, ntuplecids = 0, tuplecid_hash = 0x0,  
  toast_hash = 0x555714ad0fb8, subtxns = {head = {prev = 0x555713b985b0, next = 0x555713b984b8}}, nsubtxns = 4,
ninvalidations= 0, invalidations = 0x0, node = {prev = 0x555713b7b790, next = 0x555713b7b790}} 


> Do you see it on different PostgreSQL instances? Can you share the
> schema definition?

Unfortunately I can't do it completely, but there are
                   List of installed extensions
   Name    | Version |   Schema   |          Description
-----------+---------+------------+--------------------------------
 pgcrypto  | 1.3     | public     | cryptographic functions
 pglogical | 2.2.1   | pglogical  | PostgreSQL Logical Replication
 plpgsql   | 1.0     | pg_catalog | PL/pgSQL procedural language

And each table in publication has next structure:
1 jsonb field, and few smallint, integer, bigint fields. No one table has any triggers or special constraints, only
primarykey and one or two btree indexes. 


> Well, if you can create a reproducer, that'd be the best option, because
> then we can investigate locally instead of the ping-ping here.

I will try to do it.


> I'd also like to see the contents of the WAL, particularly for the XID
> triggering this issue. Please run pg_waldump and see how much data is
> there for XID 1667601527. It does commit at 25EE/D6DE6EB8, not sure
> where it starts. It may have subtransactions, so don't do just grep.

Hm, it looks like I do not have wal with start of transaction, but I've found next records using txid and subxacts. I
hopeit will be useful. 

2514:rmgr: Transaction len (rec/tot):     38/    38, tx:          0, lsn: 25EE/D66F0438, prev 25EE/D66F0410, desc:
ASSIGNMENTxtop 1667601527: subxacts: 1667601528 
2518:rmgr: Heap        len (rec/tot):   2063/  2063, tx: 1667601530, lsn: 25EE/D66F11E8, prev 25EE/D66F0CB0, desc:
INSERToff 4, blkref #0: rel 1663/13018/88964795 blk 3125 
2520:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F1A38, prev 25EE/D66F19F8, desc:
INSERT_LEAFoff 256, blkref #0: rel 1663/13018/88964797 blk 15840 
2522:rmgr: Heap        len (rec/tot):   1341/  9385, tx: 1667601530, lsn: 25EE/D66F2118, prev 25EE/D66F1A78, desc:
INSERToff 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW 
2524:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F54C8, prev 25EE/D66F45E0, desc:
INSERT_LEAFoff 257, blkref #0: rel 1663/13018/88964797 blk 15840 
2525:rmgr: Heap        len (rec/tot):   1671/  1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc:
INSERToff 2, blkref #0: rel 1663/13018/88964792 blk 18475 
2527:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F6738, prev 25EE/D66F5B90, desc:
INSERT_LEAFoff 140, blkref #0: rel 1663/13018/88964798 blk 2058 
2530:rmgr: Btree       len (rec/tot):     96/    96, tx: 1667601530, lsn: 25EE/D66F7E40, prev 25EE/D66F7E10, desc:
INSERT_LEAFoff 28, blkref #0: rel 1663/13018/88964799 blk 5076 
2531:rmgr: Btree       len (rec/tot):     80/    80, tx: 1667601530, lsn: 25EE/D66F7EA0, prev 25EE/D66F7E40, desc:
INSERT_LEAFoff 88, blkref #0: rel 1663/13018/88964800 blk 3412 
2532:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc:
DELETEoff 2 , blkref #0: rel 1663/13018/88964792 blk 18475 
2533:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F28, prev 25EE/D66F7EF0, desc:
DELETEoff 4 , blkref #0: rel 1663/13018/88964795 blk 3125 
2534:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F60, prev 25EE/D66F7F28, desc:
DELETEoff 7 , blkref #0: rel 1663/13018/88964795 blk 3139 
56346:rmgr: Heap        len (rec/tot):    378/  8198, tx: 1667601537, lsn: 25EE/D6DE33B0, prev 25EE/D6DE1628, desc:
DELETEoff 6 KEYS_UPDATED , blkref #0: rel 1663/13018/88964815 blk 1938695 FPW 
56347:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601538, lsn: 25EE/D6DE53D0, prev 25EE/D6DE33B0, desc:
LOCKoff 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/13018/88964792 blk 18460 
56348:rmgr: Heap        len (rec/tot):   6563/  6563, tx: 1667601538, lsn: 25EE/D6DE5408, prev 25EE/D6DE53D0, desc:
UPDATEoff 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel
1663/13018/88964792blk 18460 
56349:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601538, lsn: 25EE/D6DE6DC8, prev 25EE/D6DE5408, desc:
INSERT_LEAFoff 140, blkref #0: rel 1663/13018/88964798 blk 2058 
56350:rmgr: Btree       len (rec/tot):     96/    96, tx: 1667601538, lsn: 25EE/D6DE6E08, prev 25EE/D6DE6DC8, desc:
INSERT_LEAFoff 28, blkref #0: rel 1663/13018/88964799 blk 5076 
56351:rmgr: Btree       len (rec/tot):     80/    80, tx: 1667601538, lsn: 25EE/D6DE6E68, prev 25EE/D6DE6E08, desc:
INSERT_LEAFoff 88, blkref #0: rel 1663/13018/88964800 blk 3412 
56352:rmgr: Transaction len (rec/tot):     66/    66, tx: 1667601527, lsn: 25EE/D6DE6EB8, prev 25EE/D6DE6E68, desc:
COMMIT2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538 

Regards,
Andrey.


pgsql-bugs by date:

Previous
From: Gmail
Date:
Subject: Re: BUG #16119: pg_dump omits columns specification for matviews
Next
From: Tomas Vondra
Date:
Subject: Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication