Thread: BUG #16125: Crash of PostgreSQL's wal sender during logical replication
BUG #16125: Crash of PostgreSQL's wal sender during logical replication
From: PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      16125
Logged by:          Andrey Salnikov
Email address:      andrey.salnikov@dataegret.com
PostgreSQL version: 10.11
Operating system:   Ubuntu 18.04.2 LTS
Description:

Hi,
PostgreSQL's wal sender crashes continuously during logical replication. After each crash the restarted sender tries to decode the same part of the WAL and crashes again and again.

The log contains the following lines:

2019-11-16 03:01:39.512 UTC 22235 @ from [vxid: txid:0] [] LOG: server process (PID 26972) was terminated by signal 11: Segmentation fault
--
2019-11-16 03:03:16.240 UTC 8286 @ from [vxid: txid:0] [] LOG: checkpoint complete: wrote 675790 buffers (32.2%); 0 WAL file(s) added, 198 removed, 0 recycled; write=68.768 s, sync=0.011 s, total=69.214 s; sync files=361, longest=0.002 s, average=0.000 s; distance=3164435 kB, estimate=3164435 kB
2019-11-16 03:03:16.251 UTC 22235 @ from [vxid: txid:0] [] LOG: database system is ready to accept connections
2019-11-16 03:03:16.533 UTC 10070 rep_usr@db from 1.1.1.1 [vxid:5/0 txid:0] [idle] LOG: received replication command: IDENTIFY_SYSTEM
2019-11-16 03:03:16.533 UTC 10070 rep_usr@db from 1.1.1.1 [vxid:5/0 txid:0] [idle] LOG: received replication command: START_REPLICATION SLOT "rep_slot" LOGICAL 25EE/D6DD72D0 (proto_version '1', publication_names '"rep_pub"')
2019-11-16 03:03:16.534 UTC 10070 rep_usr@db from 1.1.1.1 [vxid:5/0 txid:0] [idle] LOG: starting logical decoding for slot "rep_slot"
2019-11-16 03:03:16.534 UTC 10070 rep_usr@db from 1.1.1.1 [vxid:5/0 txid:0] [idle] DETAIL: streaming transactions committing after 25EE/D614FC78, reading WAL from 25EE/D614B580
2019-11-16 03:03:16.534 UTC 10070 rep_usr@db from 1.1.1.1 [vxid:5/0 txid:0] [idle] LOG: logical decoding found consistent point at 25EE/D614B580
2019-11-16 03:03:16.534 UTC 10070 rep_usr@db from 1.1.1.1 [vxid:5/0 txid:0] [idle] DETAIL: There are no running transactions.
2019-11-16 03:03:16.732 UTC 22235 @ from [vxid: txid:0] [] LOG: server process (PID 10070) was terminated by signal 11: Segmentation fault
--
2019-11-16 03:03:18.410 UTC 10072 @ from [vxid: txid:0] [] LOG: database system was not properly shut down; automatic recovery in progress
2019-11-16 03:03:18.413 UTC 10072 @ from [vxid: txid:0] [] LOG: invalid record length at 25EE/D6E155C8: wanted 24, got 0
2019-11-16 03:03:18.413 UTC 10072 @ from [vxid: txid:0] [] LOG: redo is not required
2019-11-16 03:03:18.419 UTC 10072 @ from [vxid: txid:0] [] LOG: checkpoint starting: end-of-recovery immediate
2019-11-16 03:03:18.467 UTC 10072 @ from [vxid: txid:0] [] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.028 s, sync=0.000 s, total=0.050 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB
2019-11-16 03:03:18.476 UTC 22235 @ from [vxid: txid:0] [] LOG: database system is ready to accept connections
2019-11-16 03:03:18.754 UTC 10112 rep_usr@db from 1.1.1.1 [vxid:4/0 txid:0] [idle] LOG: received replication command: IDENTIFY_SYSTEM
2019-11-16 03:03:18.754 UTC 10112 rep_usr@db from 1.1.1.1 [vxid:4/0 txid:0] [idle] LOG: received replication command: START_REPLICATION SLOT "rep_slot" LOGICAL 25EE/D6DD72D0 (proto_version '1', publication_names '"rep_pub"')
2019-11-16 03:03:18.754 UTC 10112 rep_usr@db from 1.1.1.1 [vxid:4/0 txid:0] [idle] LOG: starting logical decoding for slot "rep_slot"
2019-11-16 03:03:18.754 UTC 10112 rep_usr@db from 1.1.1.1 [vxid:4/0 txid:0] [idle] DETAIL: streaming transactions committing after 25EE/D614FC78, reading WAL from 25EE/D614B580
2019-11-16 03:03:18.754 UTC 10112 rep_usr@db from 1.1.1.1 [vxid:4/0 txid:0] [idle] LOG: logical decoding found consistent point at 25EE/D614B580
2019-11-16 03:03:18.754 UTC 10112 rep_usr@db from 1.1.1.1 [vxid:4/0 txid:0] [idle] DETAIL: There are no running transactions.
2019-11-16 03:03:18.949 UTC 22235 @ from [vxid: txid:0] [] LOG: server process (PID 10112) was terminated by signal 11: Segmentation fault

The core dump contains:

Reading symbols from /usr/lib/postgresql/10/bin/postgres...Reading symbols from /usr/lib/debug/.build-id/92/b3b790a5135245f2cbb2a0a2f48fc7ff9f689a.debug...done.
done.
[New LWP 13678]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: 10/main: wal sender process replication 1.1.1.1(43224) idle '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  heap_deform_tuple (tuple=tuple@entry=0x8, tupleDesc=tupleDesc@entry=0x555713b2e638, values=values@entry=0x555713b7ca48, isnull=isnull@entry=0x555713b7cf98 "") at ./build/../src/backend/access/common/heaptuple.c:936
936     ./build/../src/backend/access/common/heaptuple.c: No such file or directory.
(gdb) bt
#0  heap_deform_tuple (tuple=tuple@entry=0x8, tupleDesc=tupleDesc@entry=0x555713b2e638, values=values@entry=0x555713b7ca48, isnull=isnull@entry=0x555713b7cf98 "") at ./build/../src/backend/access/common/heaptuple.c:936
#1  0x00005557127c918b in ReorderBufferToastReplace (rb=0x555713b7b788, txn=0x555713b982e8, relation=0x7f4b3fafa248, relation=0x7f4b3fafa248, change=0x555713b9a108) at ./build/../src/backend/replication/logical/reorderbuffer.c:2985
#2  ReorderBufferCommit (rb=0x555713b7b788, xid=xid@entry=1667601527, commit_lsn=commit_lsn@entry=41707737345720, end_lsn=end_lsn@entry=41707737345792, commit_time=commit_time@entry=627188499326201, origin_id=origin_id@entry=0, origin_lsn=0) at ./build/../src/backend/replication/logical/reorderbuffer.c:1585
#3  0x00005557127bef32 in DecodeCommit (xid=1667601527, parsed=0x7ffd288fe4a0, buf=<synthetic pointer>, ctx=0x555713a9ca08) at ./build/../src/backend/replication/logical/decode.c:614
#4  DecodeXactOp (buf=<synthetic pointer>, ctx=0x555713a9ca08) at ./build/../src/backend/replication/logical/decode.c:244
#5  LogicalDecodingProcessRecord (ctx=0x555713a9ca08, record=<optimized out>) at ./build/../src/backend/replication/logical/decode.c:113
#6  0x00005557127d0b86 in XLogSendLogical () at ./build/../src/backend/replication/walsender.c:2797
#7  0x00005557127d232f in WalSndLoop (send_data=send_data@entry=0x5557127d0b10 <XLogSendLogical>) at ./build/../src/backend/replication/walsender.c:2157
#8  0x00005557127d30aa in StartLogicalReplication (cmd=0x555713b499d8) at ./build/../src/backend/replication/walsender.c:1105
#9  exec_replication_command (cmd_string=cmd_string@entry=0x555713b4da38 "START_REPLICATION SLOT \"pgprod_office_basic\" LOGICAL 25EE/D6DD72D0 (proto_version '1', publication_names '\"pgprod_master_basic\"')") at ./build/../src/backend/replication/walsender.c:1536
#10 0x000055571281de3a in PostgresMain (argc=<optimized out>, argv=argv@entry=0x555713add8d8, dbname=<optimized out>, username=<optimized out>) at ./build/../src/backend/tcop/postgres.c:4113
#11 0x000055571255c72c in BackendRun (port=0x555713ad56d0) at ./build/../src/backend/postmaster/postmaster.c:4408
#12 BackendStartup (port=0x555713ad56d0) at ./build/../src/backend/postmaster/postmaster.c:4080
#13 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1756
#14 0x00005557127ada21 in PostmasterMain (argc=9, argv=0x555713a7d190) at ./build/../src/backend/postmaster/postmaster.c:1364
#15 0x000055571255e422 in main (argc=9, argv=0x555713a7d190) at ./build/../src/backend/main/main.c:228
(gdb) quit

This behaviour does not depend on the data in the tables, because we see it in different databases with different sets of tables in the publications.
It looks like a real issue in logical replication.
I will be happy to provide additional information about this issue, but I need to know what else to collect to help solve this problem.

Regards,
Andrey.
On Mon, Nov 18, 2019 at 06:41:43PM +0000, PG Bug reporting form wrote:
>The following bug has been logged on the website:
>
>Bug reference:      16125
>Logged by:          Andrey Salnikov
>Email address:      andrey.salnikov@dataegret.com
>PostgreSQL version: 10.11
>Operating system:   Ubuntu 18.04.2 LTS
>Description:
>
>Hi,
>PostgreSQL's wal sender crashes continuously during logical replication.
>After each crash the restarted sender tries to decode the same part of the
>WAL and crashes again and again.
>

Thanks for the report. Unfortunately this does not tell us much, except
that it crashed :-(

>The log contains the following lines:
>2019-11-16 03:01:39.512 UTC 22235 @ from [vxid: txid:0] [] LOG: server
>process (PID 26972) was terminated by signal 11: Segmentation fault
>--
> ...
>
>[idle] DETAIL: There are no running transactions.
>2019-11-16 03:03:18.949 UTC 22235 @ from [vxid: txid:0] [] LOG: server
>process (PID 10112) was terminated by signal 11: Segmentation fault
>

This is more interesting:

>The core dump contains:
>Reading symbols from /usr/lib/postgresql/10/bin/postgres...Reading symbols
>from
>/usr/lib/debug/.build-id/92/b3b790a5135245f2cbb2a0a2f48fc7ff9f689a.debug...done.
>done.
>[New LWP 13678]
>[Thread debugging using libthread_db enabled]
>Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>Core was generated by `postgres: 10/main: wal sender process replication
>1.1.1.1(43224) idle '.
>Program terminated with signal SIGSEGV, Segmentation fault.
>#0  heap_deform_tuple (tuple=tuple@entry=0x8,
>tupleDesc=tupleDesc@entry=0x555713b2e638,
>values=values@entry=0x555713b7ca48, isnull=isnull@entry=0x555713b7cf98 "")
>at ./build/../src/backend/access/common/heaptuple.c:936
>936     ./build/../src/backend/access/common/heaptuple.c: No such file or
>directory.
>(gdb) bt
>#0  heap_deform_tuple (tuple=tuple@entry=0x8,
>tupleDesc=tupleDesc@entry=0x555713b2e638,
>values=values@entry=0x555713b7ca48, isnull=isnull@entry=0x555713b7cf98 "")
>at ./build/../src/backend/access/common/heaptuple.c:936
>#1  0x00005557127c918b in ReorderBufferToastReplace (rb=0x555713b7b788,
>txn=0x555713b982e8, relation=0x7f4b3fafa248, relation=0x7f4b3fafa248,
>change=0x555713b9a108)
>    at ./build/../src/backend/replication/logical/reorderbuffer.c:2985
>#2  ReorderBufferCommit (rb=0x555713b7b788, xid=xid@entry=1667601527,
>commit_lsn=commit_lsn@entry=41707737345720,
>end_lsn=end_lsn@entry=41707737345792,
>commit_time=commit_time@entry=627188499326201,
>    origin_id=origin_id@entry=0, origin_lsn=0) at
>./build/../src/backend/replication/logical/reorderbuffer.c:1585

OK, so the issue seems to be related to handling TOASTed values, i.e. we
do this:

    /* user-triggered change */
    if (!IsToastRelation(relation))
    {
        ReorderBufferToastReplace(rb, txn, relation, change);
        rb->apply_change(rb, txn, relation, change);

        /*
         * Only clear reassembled toast chunks if we're sure
         * they're not required anymore. The creator of the
         * tuple tells us.
         */
        if (change->data.tp.clear_toast_afterwards)
            ReorderBufferToastReset(rb, txn);
    }

and ReorderBufferToastReplace does this:

    newtup = change->data.tp.newtuple;

    heap_deform_tuple(&newtup->tuple, desc, attrs, isnull);

but that fails, because the tuple pointer happens to be 0x8, which is
clearly bogus. Not sure where that comes from, I don't recognize that as
a typical pattern.

Can you create a core dump (see [1]), and print 'change' and 'txn' in
frame #2? I wonder if some of the other fields are bogus too (but it can't
be entirely true ...), and if the transaction got serialized.

You may need to install debug symbols first, BTW.
>
>This behaviour does not depend on the data in the tables, because we see it
>in different databases with different sets of tables in the publications.

I'm not sure I really believe that. Surely there has to be something
special about your schema, or possibly an access pattern, that triggers
this bug in your environment and not elsewhere.

Do you see it on different PostgreSQL instances? Can you share the
schema definition?

>It looks like a real issue in logical replication.
>I will be happy to provide additional information about this issue, but I
>need to know what else to collect to help solve this problem.
>

Well, if you can create a reproducer, that'd be the best option, because
then we can investigate locally instead of the ping-pong here.

But if that's not possible, let's start with the schema and the
additional information from the core file.

I'd also like to see the contents of the WAL, particularly for the XID
triggering this issue. Please run pg_waldump and see how much data is
there for XID 1667601527. It does commit at 25EE/D6DE6EB8, not sure
where it starts. It may have subtransactions, so don't do just grep.

regards

[1] https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On 2019-11-18 21:58:16 +0100, Tomas Vondra wrote:
> and ReorderBufferToastReplace does this:
>
>     newtup = change->data.tp.newtuple;
>
>     heap_deform_tuple(&newtup->tuple, desc, attrs, isnull);
>
> but that fails, because the tuple pointer happens to be 0x8, which is
> clearly bogus. Not sure where that comes from, I don't recognize that as
> a typical pattern.

It indicates that change->data.tp.newtuple is NULL, afaict. &newtup->tuple
boils down to

    ((char *) newtup) + offsetof(ReorderBufferTupleBuf, tuple)

and offsetof(ReorderBufferTupleBuf, tuple) is 0x8.

> Can you create a core dump (see [1]), and print 'change' and 'txn' in
> frame #2? I wonder if some of the other fields are bogus too (but it can't
> be entirely true ...), and if the transaction got serialized.

Please print both change and *change.

I suspect what's happening is that a change that shouldn't have toast
changes - e.g. a DELETE - somehow has toast changes. Which then triggers
a failure in ReorderBufferToastReplace(), which expects newtuple to be
valid. It's probably worthwhile to add an elog(ERROR) check for this,
even if this does not turn out to be the case.

> > This behaviour does not depend on the data in the tables, because we see it
> > in different databases with different sets of tables in the publications.
>
> I'm not sure I really believe that. Surely there has to be something
> special about your schema, or possibly an access pattern, that triggers
> this bug in your environment and not elsewhere.

Yea. Are there any C triggers present? Any unusual extensions? Users of
the transaction hook, for example?

> > It looks like a real issue in logical replication.
> > I will be happy to provide additional information about this issue, but I
> > need to know what else to collect to help solve this problem.
>
> Well, if you can create a reproducer, that'd be the best option, because
> then we can investigate locally instead of the ping-pong here.
>
> But if that's not possible, let's start with the schema and the
> additional information from the core file.
>
> I'd also like to see the contents of the WAL, particularly for the XID
> triggering this issue. Please run pg_waldump and see how much data is
> there for XID 1667601527. It does commit at 25EE/D6DE6EB8, not sure
> where it starts. It may have subtransactions, so don't do just grep.

Yea, that'd be helpful.

Greetings,

Andres Freund
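To make the pointer arithmetic above concrete, here is a minimal standalone
sketch. The struct is a mock with the same general shape (a pointer-sized
list node followed by the embedded tuple member), not the real
ReorderBufferTupleBuf definition:

    #include <stddef.h>
    #include <stdio.h>

    /* mock layout: pointer-sized node first, then the tuple member */
    typedef struct MockTupleBuf
    {
        void       *node;       /* stands in for the slist_node */
        int         tuple;      /* stands in for the embedded HeapTupleData */
    } MockTupleBuf;

    int
    main(void)
    {
        /*
         * On a 64-bit build this prints 0x8: with newtup == NULL, evaluating
         * &newtup->tuple yields exactly this offset, which is the bogus
         * tuple=0x8 seen in the heap_deform_tuple() frame of the backtrace.
         */
        printf("offsetof(MockTupleBuf, tuple) = 0x%zx\n",
               offsetof(MockTupleBuf, tuple));
        return 0;
    }

The elog(ERROR) check mentioned above would then reject a change that has
queued toast chunks but no new tuple before heap_deform_tuple() ever sees the
bogus pointer; its exact placement and wording are left open here.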
Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication
From: Andrey Salnikov
Hi,

> Can you create a core dump (see [1]), and print 'change' and 'txn' in
> frame #2? I wonder if some of the other fields are bogus too (but it can't
> be entirely true ...), and if the transaction got serialized.
>
> You may need to install debug symbols first, BTW.

I hope I've understood correctly how to extract this data:

(gdb) frame 2
#2  ReorderBufferCommit (rb=0x555713b7b788, xid=xid@entry=1667601527, commit_lsn=commit_lsn@entry=41707737345720, end_lsn=end_lsn@entry=41707737345792, commit_time=commit_time@entry=627188499326201,
    origin_id=origin_id@entry=0, origin_lsn=0) at ./build/../src/backend/replication/logical/reorderbuffer.c:1585
1585    ./build/../src/backend/replication/logical/reorderbuffer.c: No such file or directory.
(gdb) print change
$1 = (ReorderBufferChange *) 0x555713b9a108
(gdb) print *change
$2 = {lsn = 41707737330608, action = REORDER_BUFFER_CHANGE_DELETE, origin_id = 0, data = {tp = {relnode = {spcNode = 1663, dbNode = 13018, relNode = 88964815}, clear_toast_afterwards = 1 '\001', oldtuple = 0x555713cd0d88,
      newtuple = 0x0}, msg = {prefix = 0x32da0000067f <error: Cannot access memory at address 0x32da0000067f>, message_size = 4383932111, message = 0x555713cd0d88 "\330-\315\023WU"}, snapshot = 0x32da0000067f,
    command_id = 1663, tuplecid = {node = {spcNode = 1663, dbNode = 13018, relNode = 88964815}, tid = {ip_blkid = {bi_hi = 1, bi_lo = 0}, ip_posid = 3464}, cmin = 21847, cmax = 0, combocid = 0}}, node = {
    prev = 0x555713b98178, next = 0x555713b98178}}
(gdb) print txn
$3 = (ReorderBufferTXN *) 0x555713b982e8
(gdb) print *txn
$6 = {xid = 1667601527, has_catalog_changes = 0 '\000', is_known_as_subxact = 0 '\000', toplevel_xid = 0, first_lsn = 41707730043960, final_lsn = 41707737345720, end_lsn = 41707737345792, restart_decoding_lsn = 0,
  origin_id = 0, origin_lsn = 0, commit_time = 627188499326201, base_snapshot = 0x555713b90408, base_snapshot_lsn = 41707730047464, base_snapshot_node = {prev = 0x555713b7b7a0, next = 0x555713b7b7a0}, nentries = 0,
  nentries_mem = 0, serialized = 0 '\000', changes = {head = {prev = 0x555713b98368, next = 0x555713b98368}}, tuplecids = {head = {prev = 0x555713b98378, next = 0x555713b98378}}, ntuplecids = 0, tuplecid_hash = 0x0,
  toast_hash = 0x555714ad0fb8, subtxns = {head = {prev = 0x555713b985b0, next = 0x555713b984b8}}, nsubtxns = 4, ninvalidations = 0, invalidations = 0x0, node = {prev = 0x555713b7b790, next = 0x555713b7b790}}

> Do you see it on different PostgreSQL instances? Can you share the
> schema definition?

Unfortunately I can't do it completely, but there are:

                         List of installed extensions
   Name    | Version |   Schema   |          Description
-----------+---------+------------+--------------------------------
 pgcrypto  | 1.3     | public     | cryptographic functions
 pglogical | 2.2.1   | pglogical  | PostgreSQL Logical Replication
 plpgsql   | 1.0     | pg_catalog | PL/pgSQL procedural language

And each table in the publication has the following structure: one jsonb field and a few smallint, integer, and bigint fields. No table has any triggers or special constraints, only a primary key and one or two btree indexes.

> Well, if you can create a reproducer, that'd be the best option, because
> then we can investigate locally instead of the ping-pong here.

I will try to do it.

> I'd also like to see the contents of the WAL, particularly for the XID
> triggering this issue. Please run pg_waldump and see how much data is
> there for XID 1667601527. It does commit at 25EE/D6DE6EB8, not sure
> where it starts. It may have subtransactions, so don't do just grep.
Hm, it looks like I do not have the WAL with the start of the transaction, but I've found the following records using the txid and the subxacts. I hope it will be useful.

2514:rmgr: Transaction len (rec/tot): 38/38, tx: 0, lsn: 25EE/D66F0438, prev 25EE/D66F0410, desc: ASSIGNMENT xtop 1667601527: subxacts: 1667601528
2518:rmgr: Heap len (rec/tot): 2063/2063, tx: 1667601530, lsn: 25EE/D66F11E8, prev 25EE/D66F0CB0, desc: INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
2520:rmgr: Btree len (rec/tot): 64/64, tx: 1667601530, lsn: 25EE/D66F1A38, prev 25EE/D66F19F8, desc: INSERT_LEAF off 256, blkref #0: rel 1663/13018/88964797 blk 15840
2522:rmgr: Heap len (rec/tot): 1341/9385, tx: 1667601530, lsn: 25EE/D66F2118, prev 25EE/D66F1A78, desc: INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
2524:rmgr: Btree len (rec/tot): 64/64, tx: 1667601530, lsn: 25EE/D66F54C8, prev 25EE/D66F45E0, desc: INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
2525:rmgr: Heap len (rec/tot): 1671/1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
2527:rmgr: Btree len (rec/tot): 64/64, tx: 1667601530, lsn: 25EE/D66F6738, prev 25EE/D66F5B90, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
2530:rmgr: Btree len (rec/tot): 96/96, tx: 1667601530, lsn: 25EE/D66F7E40, prev 25EE/D66F7E10, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
2531:rmgr: Btree len (rec/tot): 80/80, tx: 1667601530, lsn: 25EE/D66F7EA0, prev 25EE/D66F7E40, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
2532:rmgr: Heap len (rec/tot): 54/54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2, blkref #0: rel 1663/13018/88964792 blk 18475
2533:rmgr: Heap len (rec/tot): 54/54, tx: 1667601530, lsn: 25EE/D66F7F28, prev 25EE/D66F7EF0, desc: DELETE off 4, blkref #0: rel 1663/13018/88964795 blk 3125
2534:rmgr: Heap len (rec/tot): 54/54, tx: 1667601530, lsn: 25EE/D66F7F60, prev 25EE/D66F7F28, desc: DELETE off 7, blkref #0: rel 1663/13018/88964795 blk 3139
56346:rmgr: Heap len (rec/tot): 378/8198, tx: 1667601537, lsn: 25EE/D6DE33B0, prev 25EE/D6DE1628, desc: DELETE off 6 KEYS_UPDATED, blkref #0: rel 1663/13018/88964815 blk 1938695 FPW
56347:rmgr: Heap len (rec/tot): 54/54, tx: 1667601538, lsn: 25EE/D6DE53D0, prev 25EE/D6DE33B0, desc: LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK, blkref #0: rel 1663/13018/88964792 blk 18460
56348:rmgr: Heap len (rec/tot): 6563/6563, tx: 1667601538, lsn: 25EE/D6DE5408, prev 25EE/D6DE53D0, desc: UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel 1663/13018/88964792 blk 18460
56349:rmgr: Btree len (rec/tot): 64/64, tx: 1667601538, lsn: 25EE/D6DE6DC8, prev 25EE/D6DE5408, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
56350:rmgr: Btree len (rec/tot): 96/96, tx: 1667601538, lsn: 25EE/D6DE6E08, prev 25EE/D6DE6DC8, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
56351:rmgr: Btree len (rec/tot): 80/80, tx: 1667601538, lsn: 25EE/D6DE6E68, prev 25EE/D6DE6E08, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
56352:rmgr: Transaction len (rec/tot): 66/66, tx: 1667601527, lsn: 25EE/D6DE6EB8, prev 25EE/D6DE6E68, desc: COMMIT 2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538

Regards,
Andrey.
On Wed, Nov 20, 2019 at 12:46:44AM +0300, Andrey Salnikov wrote:
>Hi,
>
>> Can you create a core dump (see [1]), and print 'change' and 'txn' in
>> frame #2? I wonder if some of the other fields are bogus too (but it can't
>> be entirely true ...), and if the transaction got serialized.
>>
>> You may need to install debug symbols first, BTW.
>
>I hope I've understood correctly how to extract this data:
>
>(gdb) frame 2
>#2  ReorderBufferCommit (rb=0x555713b7b788, xid=xid@entry=1667601527, commit_lsn=commit_lsn@entry=41707737345720, end_lsn=end_lsn@entry=41707737345792, commit_time=commit_time@entry=627188499326201,
>    origin_id=origin_id@entry=0, origin_lsn=0) at ./build/../src/backend/replication/logical/reorderbuffer.c:1585
>1585    ./build/../src/backend/replication/logical/reorderbuffer.c: No such file or directory.
>(gdb) print change
>$1 = (ReorderBufferChange *) 0x555713b9a108
>(gdb) print *change
>$2 = {lsn = 41707737330608, action = REORDER_BUFFER_CHANGE_DELETE, origin_id = 0, data = {tp = {relnode = {spcNode = 1663, dbNode = 13018, relNode = 88964815}, clear_toast_afterwards = 1 '\001', oldtuple = 0x555713cd0d88,
>      newtuple = 0x0}, msg = {prefix = 0x32da0000067f <error: Cannot access memory at address 0x32da0000067f>, message_size = 4383932111, message = 0x555713cd0d88 "\330-\315\023WU"}, snapshot = 0x32da0000067f,
>    command_id = 1663, tuplecid = {node = {spcNode = 1663, dbNode = 13018, relNode = 88964815}, tid = {ip_blkid = {bi_hi = 1, bi_lo = 0}, ip_posid = 3464}, cmin = 21847, cmax = 0, combocid = 0}}, node = {
>    prev = 0x555713b98178, next = 0x555713b98178}}
>(gdb) print txn
>$3 = (ReorderBufferTXN *) 0x555713b982e8
>(gdb) print *txn
>$6 = {xid = 1667601527, has_catalog_changes = 0 '\000', is_known_as_subxact = 0 '\000', toplevel_xid = 0, first_lsn = 41707730043960, final_lsn = 41707737345720, end_lsn = 41707737345792, restart_decoding_lsn = 0,
>  origin_id = 0, origin_lsn = 0, commit_time = 627188499326201, base_snapshot = 0x555713b90408, base_snapshot_lsn = 41707730047464, base_snapshot_node = {prev = 0x555713b7b7a0, next = 0x555713b7b7a0}, nentries = 0,
>  nentries_mem = 0, serialized = 0 '\000', changes = {head = {prev = 0x555713b98368, next = 0x555713b98368}}, tuplecids = {head = {prev = 0x555713b98378, next = 0x555713b98378}}, ntuplecids = 0, tuplecid_hash = 0x0,
>  toast_hash = 0x555714ad0fb8, subtxns = {head = {prev = 0x555713b985b0, next = 0x555713b984b8}}, nsubtxns = 4, ninvalidations = 0, invalidations = 0x0, node = {prev = 0x555713b7b790, next = 0x555713b7b790}}
>

Yep, thanks for this data. This seems to confirm Andres' suspicion that
the change is actually a delete. It's not clear to me why we're doing
this toast thing for DELETEs unconditionally.

For the record, the change LSN is 25EE/D6DE33B0 and the transaction is
between 25EE/D66F0438 and 25EE/D6DE6F00.

>
>> Do you see it on different PostgreSQL instances? Can you share the
>> schema definition?
>
>Unfortunately I can't do it completely, but there are:
>
>                         List of installed extensions
>   Name    | Version |   Schema   |          Description
>-----------+---------+------------+--------------------------------
> pgcrypto  | 1.3     | public     | cryptographic functions
> pglogical | 2.2.1   | pglogical  | PostgreSQL Logical Replication
> plpgsql   | 1.0     | pg_catalog | PL/pgSQL procedural language
>
>And each table in the publication has the following structure: one jsonb field and a few smallint, integer, and bigint fields. No table has any triggers or special constraints, only a primary key and one or two btree indexes.
>

I think it'd be interesting to actually see the table structure - column
types, primary key / replica identity.

>> Well, if you can create a reproducer, that'd be the best option, because
>> then we can investigate locally instead of the ping-pong here.
>
>I will try to do it.
>

OK.

>
>> I'd also like to see the contents of the WAL, particularly for the XID
>> triggering this issue. Please run pg_waldump and see how much data is
>> there for XID 1667601527. It does commit at 25EE/D6DE6EB8, not sure
>> where it starts. It may have subtransactions, so don't do just grep.
>
>Hm, it looks like I do not have the WAL with the start of the transaction, but I've found the following records using the txid and the subxacts. I hope it will be useful.
>
>2514:rmgr: Transaction len (rec/tot): 38/38, tx: 0, lsn: 25EE/D66F0438, prev 25EE/D66F0410, desc: ASSIGNMENT xtop 1667601527: subxacts: 1667601528
>2518:rmgr: Heap len (rec/tot): 2063/2063, tx: 1667601530, lsn: 25EE/D66F11E8, prev 25EE/D66F0CB0, desc: INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
>2520:rmgr: Btree len (rec/tot): 64/64, tx: 1667601530, lsn: 25EE/D66F1A38, prev 25EE/D66F19F8, desc: INSERT_LEAF off 256, blkref #0: rel 1663/13018/88964797 blk 15840
>2522:rmgr: Heap len (rec/tot): 1341/9385, tx: 1667601530, lsn: 25EE/D66F2118, prev 25EE/D66F1A78, desc: INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
>2524:rmgr: Btree len (rec/tot): 64/64, tx: 1667601530, lsn: 25EE/D66F54C8, prev 25EE/D66F45E0, desc: INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
>2525:rmgr: Heap len (rec/tot): 1671/1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
>2527:rmgr: Btree len (rec/tot): 64/64, tx: 1667601530, lsn: 25EE/D66F6738, prev 25EE/D66F5B90, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
>2530:rmgr: Btree len (rec/tot): 96/96, tx: 1667601530, lsn: 25EE/D66F7E40, prev 25EE/D66F7E10, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
>2531:rmgr: Btree len (rec/tot): 80/80, tx: 1667601530, lsn: 25EE/D66F7EA0, prev 25EE/D66F7E40, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
>2532:rmgr: Heap len (rec/tot): 54/54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2, blkref #0: rel 1663/13018/88964792 blk 18475
>2533:rmgr: Heap len (rec/tot): 54/54, tx: 1667601530, lsn: 25EE/D66F7F28, prev 25EE/D66F7EF0, desc: DELETE off 4, blkref #0: rel 1663/13018/88964795 blk 3125
>2534:rmgr: Heap len (rec/tot): 54/54, tx: 1667601530, lsn: 25EE/D66F7F60, prev 25EE/D66F7F28, desc: DELETE off 7, blkref #0: rel 1663/13018/88964795 blk 3139
>56346:rmgr: Heap len (rec/tot): 378/8198, tx: 1667601537, lsn: 25EE/D6DE33B0, prev 25EE/D6DE1628, desc: DELETE off 6 KEYS_UPDATED, blkref #0: rel 1663/13018/88964815 blk 1938695 FPW
>
>56347:rmgr: Heap len (rec/tot): 54/54, tx: 1667601538, lsn: 25EE/D6DE53D0, prev 25EE/D6DE33B0, desc: LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK, blkref #0: rel 1663/13018/88964792 blk 18460
>56348:rmgr: Heap len (rec/tot): 6563/6563, tx: 1667601538, lsn: 25EE/D6DE5408, prev 25EE/D6DE53D0, desc: UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel 1663/13018/88964792 blk 18460
>56349:rmgr: Btree len (rec/tot): 64/64, tx: 1667601538, lsn: 25EE/D6DE6DC8, prev 25EE/D6DE5408, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
>56350:rmgr: Btree len (rec/tot): 96/96, tx: 1667601538, lsn: 25EE/D6DE6E08, prev 25EE/D6DE6DC8, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
>56351:rmgr: Btree len (rec/tot): 80/80, tx: 1667601538, lsn: 25EE/D6DE6E68, prev 25EE/D6DE6E08, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
>56352:rmgr: Transaction len (rec/tot): 66/66, tx: 1667601527, lsn: 25EE/D6DE6EB8, prev 25EE/D6DE6E68, desc: COMMIT 2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538

So, this DELETE change triggers the issue. If I understand it correctly,
that means the transaction locked the tuple and then deleted it. But I
don't see any record about the lock.

But this output seems a bit incomplete - there's an assignment record
for subxact 1667601528, without any records. And there's no assignment
for 1667601530. Can you simply post the whole WAL output for the
transaction LSN range, i.e. something like

    pg_waldump -s 25EE/D66F0438 -e 25EE/D6DE6F00 ...

Any idea what this transaction actually does, i.e. what commands it
executes? Presumably it does some plpgsql with subtransactions?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

(Alvaro, see multixact point below)

Are there any partitions?

On 2019-11-19 23:17:32 +0100, Tomas Vondra wrote:
> Yep, thanks for this data. This seems to confirm Andres' suspicion that
> the change is actually a delete. It's not clear to me why we're doing
> this toast thing for DELETEs unconditionally.

Well, there never should be any toast activity in case of deletions. So
there already is something wrong in that case. I don't think just
ignoring that something is wrong would be good - but raising an error
would make sense.

> For the record, the change LSN is 25EE/D6DE33B0 and the transaction is
> between 25EE/D66F0438 and 25EE/D6DE6F00.

> > 2514:rmgr: Transaction len (rec/tot): 38/38, tx: 0, lsn: 25EE/D66F0438, prev 25EE/D66F0410, desc: ASSIGNMENT xtop 1667601527: subxacts: 1667601528

So there's subtransactions in this workload...

> > 2518:rmgr: Heap len (rec/tot): 2063/2063, tx: 1667601530, lsn: 25EE/D66F11E8, prev 25EE/D66F0CB0, desc: INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
> > 2520:rmgr: Btree len (rec/tot): 64/64, tx: 1667601530, lsn: 25EE/D66F1A38, prev 25EE/D66F19F8, desc: INSERT_LEAF off 256, blkref #0: rel 1663/13018/88964797 blk 15840
> > 2522:rmgr: Heap len (rec/tot): 1341/9385, tx: 1667601530, lsn: 25EE/D66F2118, prev 25EE/D66F1A78, desc: INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
> > 2524:rmgr: Btree len (rec/tot): 64/64, tx: 1667601530, lsn: 25EE/D66F54C8, prev 25EE/D66F45E0, desc: INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
> > 2525:rmgr: Heap len (rec/tot): 1671/1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
> > 2527:rmgr: Btree len (rec/tot): 64/64, tx: 1667601530, lsn: 25EE/D66F6738, prev 25EE/D66F5B90, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
> > 2530:rmgr: Btree len (rec/tot): 96/96, tx: 1667601530, lsn: 25EE/D66F7E40, prev 25EE/D66F7E10, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
> > 2531:rmgr: Btree len (rec/tot): 80/80, tx: 1667601530, lsn: 25EE/D66F7EA0, prev 25EE/D66F7E40, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
> > 2532:rmgr: Heap len (rec/tot): 54/54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2, blkref #0: rel 1663/13018/88964792 blk 18475
> > 2533:rmgr: Heap len (rec/tot): 54/54, tx: 1667601530, lsn: 25EE/D66F7F28, prev 25EE/D66F7EF0, desc: DELETE off 4, blkref #0: rel 1663/13018/88964795 blk 3125
> > 2534:rmgr: Heap len (rec/tot): 54/54, tx: 1667601530, lsn: 25EE/D66F7F60, prev 25EE/D66F7F28, desc: DELETE off 7, blkref #0: rel 1663/13018/88964795 blk 3139
> > 56346:rmgr: Heap len (rec/tot): 378/8198, tx: 1667601537, lsn: 25EE/D6DE33B0, prev 25EE/D6DE1628, desc: DELETE off 6 KEYS_UPDATED, blkref #0: rel 1663/13018/88964815 blk 1938695 FPW

> > 56347:rmgr: Heap len (rec/tot): 54/54, tx: 1667601538, lsn: 25EE/D6DE53D0, prev 25EE/D6DE33B0, desc: LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK, blkref #0: rel 1663/13018/88964792 blk 18460
> > 56348:rmgr: Heap len (rec/tot): 6563/6563, tx: 1667601538, lsn: 25EE/D6DE5408, prev 25EE/D6DE53D0, desc: UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel 1663/13018/88964792 blk 18460
> > 56349:rmgr: Btree len (rec/tot): 64/64, tx: 1667601538, lsn: 25EE/D6DE6DC8, prev 25EE/D6DE5408, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
> > 56350:rmgr: Btree len (rec/tot): 96/96, tx: 1667601538, lsn: 25EE/D6DE6E08, prev 25EE/D6DE6DC8, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
> > 56351:rmgr: Btree len (rec/tot): 80/80, tx: 1667601538, lsn: 25EE/D6DE6E68, prev 25EE/D6DE6E08, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
> > 56352:rmgr: Transaction len (rec/tot): 66/66, tx: 1667601527, lsn: 25EE/D6DE6EB8, prev 25EE/D6DE6E68, desc: COMMIT 2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538

> So, this DELETE change triggers the issue. If I understand it correctly,
> that means the transaction locked the tuple and then deleted it. But I
> don't see any record about the lock.

Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when
called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when

    if (old_infomask & HEAP_XMAX_INVALID)

is true. Which is the most common case. I actually find it *more*
curious to *not* see that for a delete. I've not traced fully through
it, yet.

Alvaro, it's not clear to me whether the multixact codepaths in
compute_new_xmax_infomask() actually work as expected for deletes.

> But this output seems a bit incomplete - there's an assignment record
> for subxact 1667601528, without any records. And there's no assignment
> for 1667601530. Can you simply post the whole WAL output for the
> transaction LSN range, i.e. something like
>
>     pg_waldump -s 25EE/D66F0438 -e 25EE/D6DE6F00 ...

Yea, that'd be quite helpful.

Which relations do the relfilenodes correspond to? Most importantly
1663/13018/88964815, 1663/13018/88964795, 1663/13018/88964792

You can look up the mapping, if there's not been any rewriting DDL
since, with e.g.

    SELECT r, r::regclass FROM pg_filenode_relation(13018, 88964815) r;

Greetings,

Andres Freund
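To illustrate the point being made here, a simplified, self-contained
paraphrase (not the real compute_new_xmax_infomask() from heapam.c, and the
flag values below are stand-ins): a plain DELETE of a tuple whose old xmax is
invalid takes the "no existing locker" branch and ends up with KEYS_UPDATED
set, so the bit by itself does not imply a preceding heap LOCK record.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* stand-in values; the real flag bits live in access/htup_details.h */
    #define MOCK_XMAX_INVALID   0x0800
    #define MOCK_KEYS_UPDATED   0x2000

    static void
    sketch_new_xmax_infomask(uint16_t old_infomask, bool is_update,
                             uint16_t *new_infomask2)
    {
        *new_infomask2 = 0;

        if (old_infomask & MOCK_XMAX_INVALID)
        {
            /*
             * "No previous locker" path: a plain DELETE (is_update == true)
             * gets KEYS_UPDATED here, without any separate heap LOCK record.
             */
            if (is_update)
                *new_infomask2 |= MOCK_KEYS_UPDATED;
            return;
        }

        /* existing-locker / multixact paths (the part being questioned) omitted */
    }

    int
    main(void)
    {
        uint16_t    infomask2;

        sketch_new_xmax_infomask(MOCK_XMAX_INVALID, true, &infomask2);
        printf("delete of an unlocked tuple -> KEYS_UPDATED set: %d\n",
               (infomask2 & MOCK_KEYS_UPDATED) != 0);
        return 0;
    }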
Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication
From: Alvaro Herrera
On 2019-Nov-19, Andres Freund wrote:

> > > 2518:rmgr: Heap len (rec/tot): 2063/2063, tx: 1667601530, lsn: 25EE/D66F11E8, prev 25EE/D66F0CB0, desc: INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
> > > 2520:rmgr: Btree len (rec/tot): 64/64, tx: 1667601530, lsn: 25EE/D66F1A38, prev 25EE/D66F19F8, desc: INSERT_LEAF off 256, blkref #0: rel 1663/13018/88964797 blk 15840
> > > 2522:rmgr: Heap len (rec/tot): 1341/9385, tx: 1667601530, lsn: 25EE/D66F2118, prev 25EE/D66F1A78, desc: INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
> > > 2524:rmgr: Btree len (rec/tot): 64/64, tx: 1667601530, lsn: 25EE/D66F54C8, prev 25EE/D66F45E0, desc: INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
> > > 2525:rmgr: Heap len (rec/tot): 1671/1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
> > > 2527:rmgr: Btree len (rec/tot): 64/64, tx: 1667601530, lsn: 25EE/D66F6738, prev 25EE/D66F5B90, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
> > > 2530:rmgr: Btree len (rec/tot): 96/96, tx: 1667601530, lsn: 25EE/D66F7E40, prev 25EE/D66F7E10, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
> > > 2531:rmgr: Btree len (rec/tot): 80/80, tx: 1667601530, lsn: 25EE/D66F7EA0, prev 25EE/D66F7E40, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
> > > 2532:rmgr: Heap len (rec/tot): 54/54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2, blkref #0: rel 1663/13018/88964792 blk 18475
> > > 2533:rmgr: Heap len (rec/tot): 54/54, tx: 1667601530, lsn: 25EE/D66F7F28, prev 25EE/D66F7EF0, desc: DELETE off 4, blkref #0: rel 1663/13018/88964795 blk 3125
> > > 2534:rmgr: Heap len (rec/tot): 54/54, tx: 1667601530, lsn: 25EE/D66F7F60, prev 25EE/D66F7F28, desc: DELETE off 7, blkref #0: rel 1663/13018/88964795 blk 3139
> > > 56346:rmgr: Heap len (rec/tot): 378/8198, tx: 1667601537, lsn: 25EE/D6DE33B0, prev 25EE/D6DE1628, desc: DELETE off 6 KEYS_UPDATED, blkref #0: rel 1663/13018/88964815 blk 1938695 FPW
> >
> > > 56347:rmgr: Heap len (rec/tot): 54/54, tx: 1667601538, lsn: 25EE/D6DE53D0, prev 25EE/D6DE33B0, desc: LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK, blkref #0: rel 1663/13018/88964792 blk 18460
> > > 56348:rmgr: Heap len (rec/tot): 6563/6563, tx: 1667601538, lsn: 25EE/D6DE5408, prev 25EE/D6DE53D0, desc: UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel 1663/13018/88964792 blk 18460
> > > 56349:rmgr: Btree len (rec/tot): 64/64, tx: 1667601538, lsn: 25EE/D6DE6DC8, prev 25EE/D6DE5408, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
> > > 56350:rmgr: Btree len (rec/tot): 96/96, tx: 1667601538, lsn: 25EE/D6DE6E08, prev 25EE/D6DE6DC8, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
> > > 56351:rmgr: Btree len (rec/tot): 80/80, tx: 1667601538, lsn: 25EE/D6DE6E68, prev 25EE/D6DE6E08, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
> > > 56352:rmgr: Transaction len (rec/tot): 66/66, tx: 1667601527, lsn: 25EE/D6DE6EB8, prev 25EE/D6DE6E68, desc: COMMIT 2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538
> >
> > So, this DELETE change triggers the issue. If I understand it correctly,
> > that means the transaction locked the tuple and then deleted it. But I
> > don't see any record about the lock.
>
> Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when
> called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when
>
>     if (old_infomask & HEAP_XMAX_INVALID)
>
> is true. Which is the most common case.

I don't understand what you guys are saying. There's a DELETE at
25EE/D6DE33B0 (line 56346) which seems to be the one causing trouble.
Right? There's also a LOCK+UPDATE sequence just below that, which is
unrelated (different rel). Are you saying that one of them is toast and
the other is its main heap? It would be useful to know more about the
rels.

> I actually find it *more*
> curious to *not* see that for a delete. I've not traced fully through
> it, yet.

> Alvaro, it's not clear to me whether the multixact codepaths in
> compute_new_xmax_infomask() actually work as expected for deletes.

Are you saying that the three *prior* deletes (lines 2532-2534) are
strange because they lack the KEYS_UPDATED bit? (Note these are all for
different rels).

I'm not sure what to do about this report at this point.
Notwithstanding my general failure to focus on anything too complex
given my surrounding social context.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Nov 26, 2019 at 02:10:54PM -0300, Alvaro Herrera wrote:
>On 2019-Nov-19, Andres Freund wrote:
>
>> > > 2518:rmgr: Heap len (rec/tot): 2063/2063, tx: 1667601530, lsn: 25EE/D66F11E8, prev 25EE/D66F0CB0, desc: INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
>> > > 2520:rmgr: Btree len (rec/tot): 64/64, tx: 1667601530, lsn: 25EE/D66F1A38, prev 25EE/D66F19F8, desc: INSERT_LEAF off 256, blkref #0: rel 1663/13018/88964797 blk 15840
>> > > 2522:rmgr: Heap len (rec/tot): 1341/9385, tx: 1667601530, lsn: 25EE/D66F2118, prev 25EE/D66F1A78, desc: INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
>> > > 2524:rmgr: Btree len (rec/tot): 64/64, tx: 1667601530, lsn: 25EE/D66F54C8, prev 25EE/D66F45E0, desc: INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
>> > > 2525:rmgr: Heap len (rec/tot): 1671/1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
>> > > 2527:rmgr: Btree len (rec/tot): 64/64, tx: 1667601530, lsn: 25EE/D66F6738, prev 25EE/D66F5B90, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
>> > > 2530:rmgr: Btree len (rec/tot): 96/96, tx: 1667601530, lsn: 25EE/D66F7E40, prev 25EE/D66F7E10, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
>> > > 2531:rmgr: Btree len (rec/tot): 80/80, tx: 1667601530, lsn: 25EE/D66F7EA0, prev 25EE/D66F7E40, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
>> > > 2532:rmgr: Heap len (rec/tot): 54/54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2, blkref #0: rel 1663/13018/88964792 blk 18475
>> > > 2533:rmgr: Heap len (rec/tot): 54/54, tx: 1667601530, lsn: 25EE/D66F7F28, prev 25EE/D66F7EF0, desc: DELETE off 4, blkref #0: rel 1663/13018/88964795 blk 3125
>> > > 2534:rmgr: Heap len (rec/tot): 54/54, tx: 1667601530, lsn: 25EE/D66F7F60, prev 25EE/D66F7F28, desc: DELETE off 7, blkref #0: rel 1663/13018/88964795 blk 3139
>> > > 56346:rmgr: Heap len (rec/tot): 378/8198, tx: 1667601537, lsn: 25EE/D6DE33B0, prev 25EE/D6DE1628, desc: DELETE off 6 KEYS_UPDATED, blkref #0: rel 1663/13018/88964815 blk 1938695 FPW
>>
>> > > 56347:rmgr: Heap len (rec/tot): 54/54, tx: 1667601538, lsn: 25EE/D6DE53D0, prev 25EE/D6DE33B0, desc: LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK, blkref #0: rel 1663/13018/88964792 blk 18460
>> > > 56348:rmgr: Heap len (rec/tot): 6563/6563, tx: 1667601538, lsn: 25EE/D6DE5408, prev 25EE/D6DE53D0, desc: UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel 1663/13018/88964792 blk 18460
>> > > 56349:rmgr: Btree len (rec/tot): 64/64, tx: 1667601538, lsn: 25EE/D6DE6DC8, prev 25EE/D6DE5408, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
>> > > 56350:rmgr: Btree len (rec/tot): 96/96, tx: 1667601538, lsn: 25EE/D6DE6E08, prev 25EE/D6DE6DC8, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
>> > > 56351:rmgr: Btree len (rec/tot): 80/80, tx: 1667601538, lsn: 25EE/D6DE6E68, prev 25EE/D6DE6E08, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
>> > > 56352:rmgr: Transaction len (rec/tot): 66/66, tx: 1667601527, lsn: 25EE/D6DE6EB8, prev 25EE/D6DE6E68, desc: COMMIT 2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538
>>
>> > So, this DELETE change triggers the issue. If I understand it correctly,
>> > that means the transaction locked the tuple and then deleted it. But I
>> > don't see any record about the lock.
>>
>> Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when
>> called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when
>>
>>     if (old_infomask & HEAP_XMAX_INVALID)
>>
>> is true. Which is the most common case.
>
>I don't understand what you guys are saying. There's a DELETE at
>25EE/D6DE33B0 (line 56346) which seems to be the one causing trouble.
>Right? There's also a LOCK+UPDATE sequence just below that, which is
>unrelated (different rel). Are you saying that one of them is toast and
>the other is its main heap? It would be useful to know more about the
>rels.
>

I was wondering why there's no LOCK before the DELETE, when the waldump
output has KEYS_UPDATED for that change. But as Andres says, it's likely
normal because e.g. heap_delete sets the flag in certain cases.

I agree it would be useful to know more about the rels. AFAIK 88964815
is the heap, and my guess is the other rels are various indexes on it.
I've already said so in my message from last week.

>> I actually find it *more*
>> curious to *not* see that for a delete. I've not traced fully through
>> it, yet.

>> Alvaro, it's not clear to me whether the multixact codepaths in
>> compute_new_xmax_infomask() actually work as expected for deletes.
>
>Are you saying that the three *prior* deletes (lines 2532-2534) are
>strange because they lack the KEYS_UPDATED bit? (Note these are all for
>different rels).
>
>I'm not sure what to do about this report at this point.
>Notwithstanding my general failure to focus on anything too complex
>given my surrounding social context.
>

I think having more information about the structure (tables, indexes,
mapping for relfilenodes) and a better idea what the transaction is
doing, would be helpful.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication
From: Andrey Salnikov
Hi, I'm sorry for the late response.

> On 26 Nov 2019, at 21:27, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> I think having more information about the structure (tables, indexes,
> mapping for relfilenodes) and a better idea what the transaction is
> doing, would be helpful.

Here is the information about the tables:
filenode | relation
----------+---------------------------
88964815 | table1
88964795 | pg_toast.pg_toast_4029268 (toast table2)
88964792 | table2
(3 rows)
Table "public.table1"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
------------+----------+-----------+----------+---------+----------+--------------+-------------
column1 | jsonb | | | | extended | |
column3 | smallint | | not null | | plain | |
column4 | integer | | not null | | plain | |
column5 | integer | | not null | | plain | |
column6 | integer | | not null | | plain | |
column7 | smallint | | not null | | plain | |
Indexes:
"table1_pkey" PRIMARY KEY, btree (column6, column7, column5, column4, column3) WITH (fillfactor='70')
"table1_index1" btree (column7, column5, column4) WHERE column7 = ? OR column7 = ? OR column7 = ?
Publications:
"pub1"
Replica Identity: FULL
Options: autovacuum_vacuum_scale_factor=0.0, autovacuum_vacuum_threshold=1000, autovacuum_analyze_scale_factor=0.0, autovacuum_analyze_threshold=1000, autovacuum_enabled=true, fillfactor=100
Table "public.table2"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------+-----------------------+-----------+----------+---------+----------+--------------+-------------
column1 | integer | | not null | | plain | |
column2 | jsonb | | | | extended | |
column3 | jsonb | | | | extended | |
column5 | bigint | | | | plain | |
column6 | double precision | | | | plain | |
column7 | character varying(32) | | | | extended | |
column8 | bigint | | | | plain | |
column10 | smallint | | | | plain | |
column11 | bigint | | | | plain | |
column12 | bigint | | | | plain | |
column13 | integer | | | | plain | |
column14 | bigint | | | | plain | |
Indexes:
"table2_pkey" PRIMARY KEY, btree (column1)
"table2_index1" btree (column1, column14, column12) WITH (fillfactor='90')
"table2_index2" btree (column11, column14, column12, column8, column1) WITH (fillfactor='50')
Publications:
"pub1"
Replica Identity: FULL
Options: autovacuum_vacuum_scale_factor=0.0, autovacuum_vacuum_threshold=1000, autovacuum_analyze_scale_factor=0.0, autovacuum_analyze_threshold=1000, autovacuum_enabled=true, fillfactor=100
attrelid | attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | atthasdef | attidentity | attisdropped | attislocal | attinhcount | attcollation | attacl | attoptions | attfdwoptions
--------------+----------+------------------------------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+-------------+--------------+------------+-------------+--------------+--------+------------+---------------
table1 | 4029244 | tableoid | 26 | 0 | 4 | -7 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | |
table1 | 4029244 | cmax | 29 | 0 | 4 | -6 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | |
table1 | 4029244 | xmax | 28 | 0 | 4 | -5 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | |
table1 | 4029244 | cmin | 29 | 0 | 4 | -4 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | |
table1 | 4029244 | xmin | 28 | 0 | 4 | -3 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | |
table1 | 4029244 | ctid | 27 | 0 | 6 | -1 | 0 | -1 | -1 | f | p | s | t | f | | f | t | 0 | 0 | | |
table1 | 4029244 | column1 | 3802 | -1 | -1 | 1 | 0 | -1 | -1 | f | x | i | f | f | | f | t | 0 | 0 | | |
table1 | 4029244 | ........pg.dropped.2........ | 0 | 0 | -1 | 2 | 0 | -1 | -1 | f | x | i | f | f | | t | t | 0 | 0 | | |
table1 | 4029244 | column3 | 21 | -1 | 2 | 3 | 0 | -1 | -1 | t | p | s | t | f | | f | t | 0 | 0 | | |
table1 | 4029244 | column4 | 23 | -1 | 4 | 4 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | |
table1 | 4029244 | column5 | 23 | -1 | 4 | 5 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | |
table1 | 4029244 | column6 | 23 | -1 | 4 | 6 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | |
table1 | 4029244 | column7 | 21 | -1 | 2 | 7 | 0 | -1 | -1 | t | p | s | t | f | | f | t | 0 | 0 | | |
table2 | 4029268 | tableoid | 26 | 0 | 4 | -7 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | |
table2 | 4029268 | cmax | 29 | 0 | 4 | -6 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | |
table2 | 4029268 | xmax | 28 | 0 | 4 | -5 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | |
table2 | 4029268 | cmin | 29 | 0 | 4 | -4 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | |
table2 | 4029268 | xmin | 28 | 0 | 4 | -3 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | |
table2 | 4029268 | ctid | 27 | 0 | 6 | -1 | 0 | -1 | -1 | f | p | s | t | f | | f | t | 0 | 0 | | |
table2 | 4029268 | column1 | 23 | -1 | 4 | 1 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | |
table2 | 4029268 | column2 | 3802 | -1 | -1 | 2 | 0 | -1 | -1 | f | x | i | f | f | | f | t | 0 | 0 | | |
table2 | 4029268 | column3 | 3802 | -1 | -1 | 3 | 0 | -1 | -1 | f | x | i | f | f | | f | t | 0 | 0 | | |
table2 | 4029268 | ........pg.dropped.4........ | 0 | 0 | -1 | 4 | 0 | -1 | -1 | f | x | i | f | f | | t | t | 0 | 0 | | |
table2 | 4029268 | column5 | 20 | -1 | 8 | 5 | 0 | -1 | -1 | t | p | d | f | f | | f | t | 0 | 0 | | |
table2 | 4029268 | column6 | 701 | -1 | 8 | 6 | 0 | -1 | -1 | t | p | d | f | f | | f | t | 0 | 0 | | |
table2 | 4029268 | column7 | 1043 | -1 | -1 | 7 | 0 | -1 | 36 | f | x | i | f | f | | f | t | 0 | 100 | | |
table2 | 4029268 | column8 | 20 | -1 | 8 | 8 | 0 | -1 | -1 | t | p | d | f | f | | f | t | 0 | 0 | | |
table2 | 4029268 | ........pg.dropped.9........ | 0 | 0 | 4 | 9 | 0 | -1 | -1 | t | p | i | f | f | | t | t | 0 | 0 | | |
table2 | 4029268 | column10 | 21 | -1 | 2 | 10 | 0 | -1 | -1 | t | p | s | f | f | | f | t | 0 | 0 | | |
table2 | 4029268 | column11 | 20 | -1 | 8 | 11 | 0 | -1 | -1 | t | p | d | f | f | | f | t | 0 | 0 | | |
table2 | 4029268 | column12 | 20 | -1 | 8 | 12 | 0 | -1 | -1 | t | p | d | f | f | | f | t | 0 | 0 | | |
table2 | 4029268 | column13 | 23 | -1 | 4 | 13 | 0 | -1 | -1 | t | p | i | f | f | | f | t | 0 | 0 | | |
table2 | 4029268 | column14 | 20 | -1 | 8 | 14 | 0 | -1 | -1 | t | p | d | f | f | | f | t | 0 | 0 | | |
And the information extracted from the WAL file by pg_waldump -s 25EE/D66F0438 -e 25EE/D6DE6F00 is in the attached file.
Attachment
On Tue, Nov 26, 2019 at 11:35:51PM +0300, Andrey Salnikov wrote: >Hi, I’m sorry for late response. > >> 26 нояб. 2019 г., в 21:27, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а): >> >> >> I think having more information about the structure (tables, indexes, >> mapping for relfilenodes) and a better idea what the transaction is >> doing, would be helpful. > >Here is information about tables > > filenode | relation >----------+--------------------------- > 88964815 | table1 > 88964795 | pg_toast.pg_toast_4029268 (toast table2) > 88964792 | table2 >(1 row) > > Table "public.table1" > Column | Type | Collation | Nullable | Default | Storage | Stats target | Description >------------+----------+-----------+----------+---------+----------+--------------+------------- > column1 | jsonb | | | | extended | | > column3 | smallint | | not null | | plain | | > column4 | integer | | not null | | plain | | > column5 | integer | | not null | | plain | | > column6 | integer | | not null | | plain | | > column7 | smallint | | not null | | plain | | >Indexes: > "table1_pkey" PRIMARY KEY, btree (column6, column7, column5, column4, column3) WITH (fillfactor='70') > "table1_index1" btree (column7, column5, column4) WHERE column7 = ? OR column7 = ? OR column7 = ? >Publications: > "pub1" >Replica Identity: FULL >Options: autovacuum_vacuum_scale_factor=0.0, autovacuum_vacuum_threshold=1000, autovacuum_analyze_scale_factor=0.0, autovacuum_analyze_threshold=1000,autovacuum_enabled=true, fillfactor=100 > > Table "public.table2" > Column | Type | Collation | Nullable | Default | Storage | Stats target | Description >--------------+-----------------------+-----------+----------+---------+----------+--------------+------------- > column1 | integer | | not null | | plain | | > column2 | jsonb | | | | extended | | > column3 | jsonb | | | | extended | | > column5 | bigint | | | | plain | | > column6 | double precision | | | | plain | | > column7 | character varying(32) | | | | extended | | > column8 | bigint | | | | plain | | > column10 | smallint | | | | plain | | > column11 | bigint | | | | plain | | > column12 | bigint | | | | plain | | > column13 | integer | | | | plain | | > column14 | bigint | | | | plain | | >Indexes: > "table2_pkey" PRIMARY KEY, btree (column1) > "table2_index1" btree (column1, column14, column12) WITH (fillfactor='90') > "table2_index2" btree (column11, column14, column12, column8, column1) WITH (fillfactor='50') >Publications: > "pub1" >Replica Identity: FULL >Options: autovacuum_vacuum_scale_factor=0.0, autovacuum_vacuum_threshold=1000, autovacuum_analyze_scale_factor=0.0, autovacuum_analyze_threshold=1000,autovacuum_enabled=true, fillfactor=100 > > attrelid | attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff| atttypmod | attbyval | attstorage | attalign | attnotnull | atthasdef | attidentity | attisdropped | attislocal| attinhcount | attcollation | attacl | attoptions | attfdwoptions >--------------+----------+------------------------------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+-------------+--------------+------------+-------------+--------------+--------+------------+--------------- > table1 | 4029244 | tableoid | 26 | 0 | 4 | -7 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | > table1 | 4029244 | cmax | 29 | 0 | 4 | -6 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | > table1 | 4029244 | xmax | 28 | 0 | 4 | -5 | 0 | -1 | -1 
| t | p | i | t | f | | f | t | 0 | 0 | | | > table1 | 4029244 | cmin | 29 | 0 | 4 | -4 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | > table1 | 4029244 | xmin | 28 | 0 | 4 | -3 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | > table1 | 4029244 | ctid | 27 | 0 | 6 | -1 | 0 | -1 | -1 | f | p | s | t | f | | f | t | 0 | 0 | | | > table1 | 4029244 | column1 | 3802 | -1 | -1 | 1 | 0 | -1 | -1 | f | x | i | f | f | | f | t | 0 | 0 | | | > table1 | 4029244 | ........pg.dropped.2........ | 0 | 0 | -1 | 2 | 0 | -1 | -1 | f | x | i | f | f | | t | t | 0 | 0 | | | > table1 | 4029244 | column3 | 21 | -1 | 2 | 3 | 0 | -1 | -1 | t | p | s | t | f | | f | t | 0 | 0 | | | > table1 | 4029244 | column4 | 23 | -1 | 4 | 4 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | > table1 | 4029244 | column5 | 23 | -1 | 4 | 5 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | > table1 | 4029244 | column6 | 23 | -1 | 4 | 6 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | > table1 | 4029244 | column7 | 21 | -1 | 2 | 7 | 0 | -1 | -1 | t | p | s | t | f | | f | t | 0 | 0 | | | > table2 | 4029268 | tableoid | 26 | 0 | 4 | -7 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | > table2 | 4029268 | cmax | 29 | 0 | 4 | -6 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | > table2 | 4029268 | xmax | 28 | 0 | 4 | -5 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | > table2 | 4029268 | cmin | 29 | 0 | 4 | -4 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | > table2 | 4029268 | xmin | 28 | 0 | 4 | -3 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | > table2 | 4029268 | ctid | 27 | 0 | 6 | -1 | 0 | -1 | -1 | f | p | s | t | f | | f | t | 0 | 0 | | | > table2 | 4029268 | column1 | 23 | -1 | 4 | 1 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | > table2 | 4029268 | column2 | 3802 | -1 | -1 | 2 | 0 | -1 | -1 | f | x | i | f | f | | f | t | 0 | 0 | | | > table2 | 4029268 | column3 | 3802 | -1 | -1 | 3 | 0 | -1 | -1 | f | x | i | f | f | | f | t | 0 | 0 | | | > table2 | 4029268 | ........pg.dropped.4........ | 0 | 0 | -1 | 4 | 0 | -1 | -1 | f | x | i | f | f | | t | t | 0 | 0 | | | > table2 | 4029268 | column5 | 20 | -1 | 8 | 5 | 0 | -1 | -1 | t | p | d | f | f | | f | t | 0 | 0 | | | > table2 | 4029268 | column6 | 701 | -1 | 8 | 6 | 0 | -1 | -1 | t | p | d | f | f | | f | t | 0 | 0 | | | > table2 | 4029268 | column7 | 1043 | -1 | -1 | 7 | 0 | -1 | 36 | f | x | i | f | f | | f | t | 0 | 100 | | | > table2 | 4029268 | column8 | 20 | -1 | 8 | 8 | 0 | -1 | -1 | t | p | d | f | f | | f | t | 0 | 0 | | | > table2 | 4029268 | ........pg.dropped.9........ | 0 | 0 | 4 | 9 | 0 | -1 | -1 | t | p | i | f | f | | t | t | 0 | 0 | | | > table2 | 4029268 | column10 | 21 | -1 | 2 | 10 | 0 | -1 | -1 | t | p | s | f | f | | f | t | 0 | 0 | | | > table2 | 4029268 | column11 | 20 | -1 | 8 | 11 | 0 | -1 | -1 | t | p | d | f | f | | f | t | 0 | 0 | | | > table2 | 4029268 | column12 | 20 | -1 | 8 | 12 | 0 | -1 | -1 | t | p | d | f | f | | f | t | 0 | 0 | | | > table2 | 4029268 | column13 | 23 | -1 | 4 | 13 | 0 | -1 | -1 | t | p | i | f | f | | f | t | 0 | 0 | | | > table2 | 4029268 | column14 | 20 | -1 | 8 | 14 | 0 | -1 | -1 | t | p | d | f | f | | f | t | 0 | 0 | | | > >And extracted information from wal file by pg_waldump -s 25EE/D66F0438 -e 25EE/D6DE6F00 in attached file. Can you also show how those relations map to the relfilenodes referenced by the WAL? 
This should do the trick, I think:

SELECT relname FROM pg_class WHERE relfilenode IN (88964795, 88964797, 88964792, 88964798, 88964799, 88964800, 88964815);

Also, any idea what the transaction does? It seems it inserts 2 rows into 88964795, then one row into 88964792, and then it deletes those new records in the same subxact. And then it does a delete on 88964815, which triggers the segfault.

How do you create the subtransactions? A plpgsql procedure with exception blocks, savepoints? I'm trying to reproduce it and I'm not sure whether those details matter.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
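For context on that last question, here is a minimal sketch of the two usual ways a subtransaction gets created; the table and values are hypothetical, purely for illustration:

    -- hypothetical table, only for illustration
    CREATE TABLE t (i int PRIMARY KEY);

    -- 1) an explicit savepoint opens a subtransaction
    BEGIN;
    SAVEPOINT s1;
    INSERT INTO t VALUES (1);
    RELEASE SAVEPOINT s1;
    COMMIT;

    -- 2) a plpgsql exception block runs its body in its own subtransaction
    DO $$
    BEGIN
      BEGIN
        INSERT INTO t VALUES (1);
      EXCEPTION WHEN unique_violation THEN
        NULL;  -- swallow the error; only the subtransaction is rolled back
      END;
    END $$;

Either variant gives the subtransaction its own xid once it actually writes something, and a committed subxact then shows up in the subxacts list of the commit record, which is why the question matters for reconstructing the WAL sequence.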
Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication
From
Alvaro Herrera
Date:
On 2019-Nov-19, Andres Freund wrote:

> > > 2525:rmgr: Heap len (rec/tot): 1671/ 1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
> > > 2532:rmgr: Heap len (rec/tot): 54/ 54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2, blkref #0: rel 1663/13018/88964792 blk 18475
>
> Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when
>
>     if (old_infomask & HEAP_XMAX_INVALID)
>
> is true. Which is the most common case. I actually find it *more* curious to *not* see that for a delete. I've not traced fully through it, yet.
>
> Alvaro, it's not clear to me whether the multixact codepaths in compute_new_xmax_infomask() actually work as expected for deletes.

Hmm ... I think you're complaining about failing to set the HEAP_KEYS_UPDATED bit when the delete transaction is the same as the insert transaction.

Grepping the new waldump file the OP sent for relfilenode 88964792 I saw this:

rmgr: Heap len (rec/tot): 1671/ 1671, tx: 1667601529, lsn: 25EE/D66F1A78, prev 25EE/D66F1A38, desc: INSERT off 5, blkref #0: rel 1663/13018/88964792 blk 18460
rmgr: Heap len (rec/tot): 1671/ 1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
rmgr: Heap len (rec/tot): 48/ 48, tx: 1667601529, lsn: 25EE/D66F7E10, prev 25EE/D66F6778, desc: HEAP_CONFIRM off 5, blkref #0: rel 1663/13018/88964792 blk 18460
rmgr: Heap len (rec/tot): 54/ 54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2, blkref #0: rel 1663/13018/88964792 blk 18475
rmgr: Heap len (rec/tot): 54/ 54, tx: 1667601536, lsn: 25EE/D6BA3BC0, prev 25EE/D6BA3B70, desc: LOCK off 5: xid 1667601536: flags 0 LOCK_ONLY EXCL_LOCK, blkref #0: rel 1663/13018/88964792 blk 20806
rmgr: Heap len (rec/tot): 3955/ 3955, tx: 1667601536, lsn: 25EE/D6BA3D00, prev 25EE/D6BA3CB0, desc: UPDATE off 5 xmax 1667601536 ; new off 2 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18491, blkref #1: rel 1663/13018/88964792 blk 20806
rmgr: Heap len (rec/tot): 54/ 54, tx: 1667601538, lsn: 25EE/D6DE53D0, prev 25EE/D6DE33B0, desc: LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK, blkref #0: rel 1663/13018/88964792 blk 18460
rmgr: Heap len (rec/tot): 6563/ 6563, tx: 1667601538, lsn: 25EE/D6DE5408, prev 25EE/D6DE53D0, desc: UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel 1663/13018/88964792 blk 18460

I wonder about the HEAP_CONFIRM there ... it's in a completely different transaction of course, so it's not related, but note that the corresponding insert (25EE/D66F1A78) looks like a regular INSERT; it doesn't indicate that it's a speculative insert, which it must have been or there wouldn't be a confirm. So I wonder if our insert just below is *also* a speculative insert ... which perhaps could cause the compute_new_xmax_infomask code to misbehave.

I think pg_waldump should print out the speculativeness of an insertion.

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication
From
Alvaro Herrera
Date:
On 2019-Nov-27, Alvaro Herrera wrote: > I think pg_waldump should print out the speculativeness of an insertion. Ah, in pg12 we do print the "flags" for the insertion. But earlier releases don't. IMO we should have backpatched that change, as I said in https://postgr.es/m/20180423160216.m2yjhx67grofg7h5@alvherre.pgsql but I was outvoted. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Nov 27, 2019 at 11:57:00AM -0300, Alvaro Herrera wrote:
>On 2019-Nov-27, Alvaro Herrera wrote:
>
>> I think pg_waldump should print out the speculativeness of an insertion.
>
>Ah, in pg12 we do print the "flags" for the insertion. But earlier
>releases don't. IMO we should have backpatched that change, as I said
>in https://postgr.es/m/20180423160216.m2yjhx67grofg7h5@alvherre.pgsql
>but I was outvoted.
>

Hmmmm, yeah. I still think the decision not to backpatch was reasonable, although it seems it might have been useful in this particular case. But hey, it should not be that difficult to build a custom pg_waldump version with that patch applied and decode the WAL segment. It can even be done on a completely separate system; it's just a single binary.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
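As a sketch of how to find which segments to hand to such a patched pg_waldump, assuming the source server is still available to query (the LSNs below are the -s/-e bounds used for the earlier dump):

    -- Translate the LSN range used for the earlier pg_waldump run into
    -- WAL segment file names, so the right segments can be copied to a
    -- separate system and decoded there.
    SELECT pg_walfile_name('25EE/D66F0438'::pg_lsn) AS start_segment,
           pg_walfile_name('25EE/D6DE6F00'::pg_lsn) AS end_segment;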
Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication
From
Alvaro Herrera
Date:
On 2019-Nov-19, Andres Freund wrote:

> > > 2525:rmgr: Heap len (rec/tot): 1671/ 1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
> > > 2532:rmgr: Heap len (rec/tot): 54/ 54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2, blkref #0: rel 1663/13018/88964792 blk 18475
> >
> > So, this DELETE change triggers the issue. If I understand it correctly, that means the transaction locked the tuple and then deleted it. But I don't see any record about the lock.
>
> Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when
>
>     if (old_infomask & HEAP_XMAX_INVALID)
>
> is true. Which is the most common case. I actually find it *more* curious to *not* see that for a delete. I've not traced fully through it, yet.
>
> Alvaro, it's not clear to me whether the multixact codepaths in compute_new_xmax_infomask() actually work as expected for deletes.

So, what's really happening here is heap_insert(speculative) followed by heap_abort_speculative(). That seems to explain the above WAL sequence. I *think* the speculative insertion code in reorderbuffer simply delays the tuple insertion until it is confirmed; so this sequence would result in nothing at all happening at ReorderBufferCommit. Unless something has messed up the state somehow.

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Nov 27, 2019 at 02:59:35PM -0300, Alvaro Herrera wrote:
>On 2019-Nov-19, Andres Freund wrote:
>
>> > > 2525:rmgr: Heap len (rec/tot): 1671/ 1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
>> > > 2532:rmgr: Heap len (rec/tot): 54/ 54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2, blkref #0: rel 1663/13018/88964792 blk 18475
>>
>> > So, this DELETE change triggers the issue. If I understand it correctly, that means the transaction locked the tuple and then deleted it. But I don't see any record about the lock.
>>
>> Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when
>>
>>     if (old_infomask & HEAP_XMAX_INVALID)
>>
>> is true. Which is the most common case. I actually find it *more* curious to *not* see that for a delete. I've not traced fully through it, yet.
>>
>> Alvaro, it's not clear to me whether the multixact codepaths in compute_new_xmax_infomask() actually work as expected for deletes.
>
>So, what's really happening here is heap_insert(speculative) followed by heap_abort_speculative(). That seems to explain the above WAL sequence. I *think* the speculative insertion code in reorderbuffer simply delays the tuple insertion until it is confirmed; so this sequence would result in nothing at all happening at ReorderBufferCommit. Unless something has messed up the state somehow.
>

Hmm, so what would that transaction look like? My reading of the WAL (based on the available relfilenode info) is this:

Transaction 25EE/D66F0438 ASSIGNMENT xtop 1667601527: subxacts: 1667601528

-> seems like a savepoint, probably

Heap 25EE/D66F11E8 INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
Heap 25EE/D66F2118 INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
Btree 25EE/D66F54C8 INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
Heap 25EE/D66F5508 INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
Btree 25EE/D66F6738 INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
Btree 25EE/D66F7E40 INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
Btree 25EE/D66F7EA0 INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412

-> insert into TOAST table (88964795) + TOAST idx (probably 88964797)
-> then an insert into table2 (88964792), and its indexes

Heap 25EE/D66F7EF0 DELETE off 2, blkref #0: rel 1663/13018/88964792 blk 18475
Heap 25EE/D66F7F28 DELETE off 4, blkref #0: rel 1663/13018/88964795 blk 3125
Heap 25EE/D66F7F60 DELETE off 7, blkref #0: rel 1663/13018/88964795 blk 3139

-> now we delete the record from table2 and also its TOAST table (how do we trigger that?)

Heap 25EE/D6DE33B0 DELETE off 6 KEYS_UPDATED, blkref #0: rel 1663/13018/88964815 blk 1938695 FPW

-> delete from table1 (88964815), which we haven't touched in this xact until now (this is the crash LSN)

Heap 25EE/D6DE53D0 LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK, blkref #0: rel 1663/13018/88964792 blk 18460
Heap 25EE/D6DE5408 UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel 1663/13018/88964792 blk 18460
Btree 25EE/D6DE6DC8 INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
Btree 25EE/D6DE6E08 INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
Btree 25EE/D6DE6E68 INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
Transaction 25EE/D6DE6EB8 COMMIT 2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538

-> seems like a lock of a record in table2, along with update of indexes

I can more or less simulate this by

begin;
savepoint s;
insert into table2 values (1, ... jsonb ..., NULL, 1, 1, 'x', 1, 1, 1, 1, 1, 1) on conflict (column1) do update set column14 = 1;
delete from table2 where column1 = 1;
update table2 set column14 = 2 where column1 = 1;
commit;

This however does not explain the three deletes from table2 + TOAST. But maybe that's not the issue, because the decoding fails on the next record, which is for table1.

FWIW, attached is a SQL script creating tables with the same structure (without the dropped columns, but I think those are irrelevant).

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
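For anyone who wants to try that simulation without the attachment, here is a self-contained sketch based on the table2 definition shown earlier in the thread; the jsonb payload and the other values are placeholders:

    -- Stand-in for table2, matching the structure shown earlier
    -- (dropped columns omitted); values below are placeholders.
    CREATE TABLE table2 (
        column1  integer PRIMARY KEY,
        column2  jsonb,
        column3  jsonb,
        column5  bigint,
        column6  double precision,
        column7  varchar(32),
        column8  bigint,
        column10 smallint,
        column11 bigint,
        column12 bigint,
        column13 integer,
        column14 bigint
    );
    CREATE INDEX table2_index1 ON table2 (column1, column14, column12) WITH (fillfactor='90');
    CREATE INDEX table2_index2 ON table2 (column11, column14, column12, column8, column1) WITH (fillfactor='50');
    ALTER TABLE table2 REPLICA IDENTITY FULL;

    BEGIN;
    SAVEPOINT s;
    INSERT INTO table2
        VALUES (1, '{"k": 1}'::jsonb, NULL, 1, 1, 'x', 1, 1, 1, 1, 1, 1)
        ON CONFLICT (column1) DO UPDATE SET column14 = 1;
    DELETE FROM table2 WHERE column1 = 1;
    UPDATE table2 SET column14 = 2 WHERE column1 = 1;
    COMMIT;

As noted, this single-session form does not produce the three extra deletes, presumably because the speculative abort needs the conflicting row to appear concurrently, between the arbiter pre-check and the index insertion.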
On Wed, Nov 27, 2019 at 11:39:33PM +0100, Tomas Vondra wrote: >On Wed, Nov 27, 2019 at 02:59:35PM -0300, Alvaro Herrera wrote: >>On 2019-Nov-19, Andres Freund wrote: >> > > ... > >FWIW attached is a SQL script creating tables with the same structure >(without the dropped columns, but I think those are irrelevant). > Meh, this time with the file actually attached ... regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication
From
Alvaro Herrera
Date:
> Hmm, so what would that transaction look like? My reading of the WAL (based on the available relfilenode info) is this:
>
> Transaction 25EE/D66F0438 ASSIGNMENT xtop 1667601527: subxacts: 1667601528
>
> -> seems like a savepoint, probably
>
> Heap 25EE/D66F11E8 INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
> Heap 25EE/D66F2118 INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
> Btree 25EE/D66F54C8 INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
> Heap 25EE/D66F5508 INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
> Btree 25EE/D66F6738 INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
> Btree 25EE/D66F7E40 INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
> Btree 25EE/D66F7EA0 INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
>
> -> insert into TOAST table (88964795) + TOAST idx (probably 88964797)
> -> then an insert into table2 (88964792), and its indexes

I think it's an insert into table2 (which has at least four indexes), followed by insertion into 3 of those indexes, followed by failure to insert into the fourth index (which naturally does not appear in the WAL).

> Heap 25EE/D66F7EF0 DELETE off 2, blkref #0: rel 1663/13018/88964792 blk 18475

Then the heap insertion is speculative-aborted because of the failure to insert into the fourth index ...

> Heap 25EE/D66F7F28 DELETE off 4, blkref #0: rel 1663/13018/88964795 blk 3125
> Heap 25EE/D66F7F60 DELETE off 7, blkref #0: rel 1663/13018/88964795 blk 3139

... and this propagates to the toast table (toast_delete in heap_abort_speculative).

I didn't look from this point down. I'll look at your reproducer script now; I think you need to make it so that the index specified in the ON CONFLICT clause is the fourth one in OID order. (Not really sure how relevant the other index insertions are.)

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
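To illustrate that suggestion, a hypothetical sketch (not the OP's schema) of arranging the ON CONFLICT arbiter as the fourth index in OID order; note that actually reaching heap_abort_speculative() also seems to need a concurrent session inserting the conflicting key after the arbiter pre-check but before the arbiter index insertion:

    -- Hypothetical table; the arbiter's unique index is created last so it
    -- has the highest OID and is processed after the other three indexes.
    CREATE TABLE t (a int, b int, c int, d int, payload jsonb);
    CREATE INDEX t_idx_b ON t (b);
    CREATE INDEX t_idx_c ON t (c);
    CREATE INDEX t_idx_d ON t (d);
    CREATE UNIQUE INDEX t_arbiter ON t (a);  -- fourth in OID order

    -- Session 1: speculative insert; the heap tuple and the three non-arbiter
    -- index entries are written before the arbiter index is reached.
    INSERT INTO t VALUES (1, 1, 1, 1, '{}') ON CONFLICT (a) DO NOTHING;

    -- Session 2 (concurrently): inserting a = 1 in the window between session
    -- 1's pre-check and its arbiter index insertion is what would force the
    -- speculative abort, i.e. the INSERT-then-DELETE pair seen in the WAL.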