Re: Reorderbuffer crash during recovery - Mailing list pgsql-bugs
From | vignesh C |
---|---|
Subject | Re: Reorderbuffer crash during recovery |
Date | |
Msg-id | CALDaNm0HLJe6cE4+GA-vKiF3CbMKjRzH9S1-RtFdYfPqR0opgQ@mail.gmail.com |
In response to | Re: Reorderbuffer crash during recovery (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: Reorderbuffer crash during recovery
Re: Reorderbuffer crash during recovery |
List | pgsql-bugs |
On Wed, Nov 6, 2019 at 5:41 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 6, 2019 at 5:20 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > Hi,
> >
> > I found couple of crashes in reorderbuffer while review/testing of
> > logical_work_mem and logical streaming of large in-progress
> > transactions. Stack trace of the same are given below:
> > Issue 1:
> > #0 0x00007f985c7d8337 in raise () from /lib64/libc.so.6
> > #1 0x00007f985c7d9a28 in abort () from /lib64/libc.so.6
> > #2 0x0000000000ec514d in ExceptionalCondition
> > (conditionName=0x10eab34 "!dlist_is_empty(head)", errorType=0x10eab24
> > "FailedAssertion",
> > fileName=0x10eab00 "../../../../src/include/lib/ilist.h",
> > lineNumber=458) at assert.c:54
> > #3 0x0000000000b4fd13 in dlist_tail_element_off (head=0x338fe60,
> > off=48) at ../../../../src/include/lib/ilist.h:458
> > #4 0x0000000000b547b7 in ReorderBufferAbortOld (rb=0x32ae7a0,
> > oldestRunningXid=895) at reorderbuffer.c:1910
> > #5 0x0000000000b3cb5e in DecodeStandbyOp (ctx=0x33424b0,
> > buf=0x7fff7e7b1e40) at decode.c:332
> > #6 0x0000000000b3c363 in LogicalDecodingProcessRecord (ctx=0x33424b0,
> > record=0x3342770) at decode.c:121
> > #7 0x0000000000b704b2 in XLogSendLogical () at walsender.c:2845
> > #8 0x0000000000b6e9f8 in WalSndLoop (send_data=0xb7038b
> > <XLogSendLogical>) at walsender.c:2199
> > #9 0x0000000000b6bbf5 in StartLogicalReplication (cmd=0x33167a8) at
> > walsender.c:1128
> > #10 0x0000000000b6ce83 in exec_replication_command
> > (cmd_string=0x328a0a0 "START_REPLICATION SLOT \"sub1\" LOGICAL 0/0
> > (proto_version '1', publication_names '\"pub1\"')")
> > at walsender.c:1545
> > #11 0x0000000000c39f85 in PostgresMain (argc=1, argv=0x32b51c0,
> > dbname=0x32b50e0 "testdb", username=0x32b50c0 "user1") at
> > postgres.c:4256
> > #12 0x0000000000b10dc7 in BackendRun (port=0x32ad890) at postmaster.c:4498
> > #13 0x0000000000b0ff3e in BackendStartup (port=0x32ad890) at postmaster.c:4189
> > #14 0x0000000000b08505 in ServerLoop () at postmaster.c:1727
> > #15 0x0000000000b0781a in PostmasterMain (argc=3, argv=0x3284cb0) at
> > postmaster.c:1400
> > #16 0x000000000097492d in main (argc=3, argv=0x3284cb0) at main.c:210
> >
> > Issue 2:
> > #0 0x00007f1d7ddc4337 in raise () from /lib64/libc.so.6
> > #1 0x00007f1d7ddc5a28 in abort () from /lib64/libc.so.6
> > #2 0x0000000000ec4e1d in ExceptionalCondition
> > (conditionName=0x10ead30 "txn->final_lsn != InvalidXLogRecPtr",
> > errorType=0x10ea284 "FailedAssertion",
> > fileName=0x10ea2d0 "reorderbuffer.c", lineNumber=3052) at assert.c:54
> > #3 0x0000000000b577e0 in ReorderBufferRestoreCleanup (rb=0x2ae36b0,
> > txn=0x2bafb08) at reorderbuffer.c:3052
> > #4 0x0000000000b52b1c in ReorderBufferCleanupTXN (rb=0x2ae36b0,
> > txn=0x2bafb08) at reorderbuffer.c:1318
> > #5 0x0000000000b5279d in ReorderBufferCleanupTXN (rb=0x2ae36b0,
> > txn=0x2b9d778) at reorderbuffer.c:1257
> > #6 0x0000000000b5475c in ReorderBufferAbortOld (rb=0x2ae36b0,
> > oldestRunningXid=3835) at reorderbuffer.c:1973
> > #7 0x0000000000b3ca03 in DecodeStandbyOp (ctx=0x2b676d0,
> > buf=0x7ffcbc74cc00) at decode.c:332
> > #8 0x0000000000b3c208 in LogicalDecodingProcessRecord (ctx=0x2b676d0,
> > record=0x2b67990) at decode.c:121
> > #9 0x0000000000b70b2b in XLogSendLogical () at walsender.c:2845
> >
> > From initial analysis it looks like:
> > Issue1 it seems like if all the reorderbuffer has been flushed and
> > then the server restarts. This problem occurs.
> > Issue 2 it seems like if there are many subtransactions present and
> > then the server restarts. This problem occurs. The subtransaction's
> > final_lsn is not being set and when ReorderBufferRestoreCleanup is
> > called the assert fails. May be for this we might have to set the
> > subtransaction's final_lsn before cleanup(not sure).
> >
> > I could not reproduce this issue consistently with a test case, But I
> > felt this looks like a problem from review.
> >
> > For issue1, I could reproduce by the following steps:
> > 1) Change ReorderBufferCheckSerializeTXN so that it gets flushed always.
> > 2) Have many open transactions with subtransactions open.
> > 3) Attach one of the transaction from gdb and call abort().
>
> Do you need subtransactions for the issue1? It appears that after the
> restart if the changes list is empty it will hit the assert. Am I
> missing something?
>

When I reported this issue, I could reproduce it with subtransactions.
I have now tried without subtransactions and can still reproduce it.
You are right: Issue 1 appears in both cases, with and without
subtransactions.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
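
The Issue 1 failure mode described above (the tail of the changes list is read while the list is empty) can be illustrated with a small standalone C model. This is only a sketch under assumptions, not PostgreSQL code and not a proposed patch: `ModelTxn`, `ModelLsn`, `model_tail_lsn`, and `model_set_final_lsn` are hypothetical names, the changes list is modeled as a plain array, and the `fallback_lsn` parameter is a placeholder because the thread does not settle which LSN should be used when no in-memory change remains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the PostgreSQL definitions. */
typedef uint64_t ModelLsn;
#define MODEL_INVALID_LSN ((ModelLsn) 0)

typedef struct ModelTxn
{
    const ModelLsn *changes;   /* LSNs of in-memory changes, oldest first */
    size_t          nchanges;  /* 0 when every change was spilled to disk */
    ModelLsn        final_lsn; /* MODEL_INVALID_LSN until it is known */
} ModelTxn;

/*
 * Models the precondition seen in the Issue 1 stack trace: asking for the
 * tail element of an empty list trips an assertion.
 */
static ModelLsn
model_tail_lsn(const ModelTxn *txn)
{
    assert(txn->nchanges > 0);
    return txn->changes[txn->nchanges - 1];
}

/*
 * Guarded variant: consult the tail only when the list is non-empty.
 * fallback_lsn is an assumption made for this sketch; what the right value
 * would be in the real code is exactly what the thread is discussing.
 * Returns true when final_lsn is valid afterwards.
 */
static bool
model_set_final_lsn(ModelTxn *txn, ModelLsn fallback_lsn)
{
    if (txn->final_lsn != MODEL_INVALID_LSN)
        return true;            /* already set, nothing to do */

    if (txn->nchanges > 0)
        txn->final_lsn = model_tail_lsn(txn);
    else
        txn->final_lsn = fallback_lsn;

    return txn->final_lsn != MODEL_INVALID_LSN;
}
```

Calling `model_tail_lsn` directly on a transaction with `nchanges == 0` aborts in the same way as the `!dlist_is_empty(head)` assertion in the trace, while `model_set_final_lsn` takes the fallback branch instead. Leaving `final_lsn` invalid before cleanup corresponds to the condition checked by the Issue 2 assertion.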