Re: Reorderbuffer crash during recovery - Mailing list pgsql-bugs

From vignesh C
Subject Re: Reorderbuffer crash during recovery
Msg-id CALDaNm0HLJe6cE4+GA-vKiF3CbMKjRzH9S1-RtFdYfPqR0opgQ@mail.gmail.com
In response to Re: Reorderbuffer crash during recovery  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-bugs
On Wed, Nov 6, 2019 at 5:41 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 6, 2019 at 5:20 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > Hi,
> >
> > I found a couple of crashes in reorderbuffer while reviewing and testing
> > logical_work_mem and logical streaming of large in-progress
> > transactions. The stack traces are given below:
> > Issue 1:
> > #0  0x00007f985c7d8337 in raise () from /lib64/libc.so.6
> > #1  0x00007f985c7d9a28 in abort () from /lib64/libc.so.6
> > #2  0x0000000000ec514d in ExceptionalCondition
> > (conditionName=0x10eab34 "!dlist_is_empty(head)", errorType=0x10eab24
> > "FailedAssertion",
> >     fileName=0x10eab00 "../../../../src/include/lib/ilist.h",
> > lineNumber=458) at assert.c:54
> > #3  0x0000000000b4fd13 in dlist_tail_element_off (head=0x338fe60,
> > off=48) at ../../../../src/include/lib/ilist.h:458
> > #4  0x0000000000b547b7 in ReorderBufferAbortOld (rb=0x32ae7a0,
> > oldestRunningXid=895) at reorderbuffer.c:1910
> > #5  0x0000000000b3cb5e in DecodeStandbyOp (ctx=0x33424b0,
> > buf=0x7fff7e7b1e40) at decode.c:332
> > #6  0x0000000000b3c363 in LogicalDecodingProcessRecord (ctx=0x33424b0,
> > record=0x3342770) at decode.c:121
> > #7  0x0000000000b704b2 in XLogSendLogical () at walsender.c:2845
> > #8  0x0000000000b6e9f8 in WalSndLoop (send_data=0xb7038b
> > <XLogSendLogical>) at walsender.c:2199
> > #9  0x0000000000b6bbf5 in StartLogicalReplication (cmd=0x33167a8) at
> > walsender.c:1128
> > #10 0x0000000000b6ce83 in exec_replication_command
> > (cmd_string=0x328a0a0 "START_REPLICATION SLOT \"sub1\" LOGICAL 0/0
> > (proto_version '1', publication_names '\"pub1\"')")
> >     at walsender.c:1545
> > #11 0x0000000000c39f85 in PostgresMain (argc=1, argv=0x32b51c0,
> > dbname=0x32b50e0 "testdb", username=0x32b50c0 "user1") at
> > postgres.c:4256
> > #12 0x0000000000b10dc7 in BackendRun (port=0x32ad890) at postmaster.c:4498
> > #13 0x0000000000b0ff3e in BackendStartup (port=0x32ad890) at postmaster.c:4189
> > #14 0x0000000000b08505 in ServerLoop () at postmaster.c:1727
> > #15 0x0000000000b0781a in PostmasterMain (argc=3, argv=0x3284cb0) at
> > postmaster.c:1400
> > #16 0x000000000097492d in main (argc=3, argv=0x3284cb0) at main.c:210
> >
> > Issue 2:
> > #0  0x00007f1d7ddc4337 in raise () from /lib64/libc.so.6
> > #1  0x00007f1d7ddc5a28 in abort () from /lib64/libc.so.6
> > #2  0x0000000000ec4e1d in ExceptionalCondition
> > (conditionName=0x10ead30 "txn->final_lsn != InvalidXLogRecPtr",
> > errorType=0x10ea284 "FailedAssertion",
> >     fileName=0x10ea2d0 "reorderbuffer.c", lineNumber=3052) at assert.c:54
> > #3  0x0000000000b577e0 in ReorderBufferRestoreCleanup (rb=0x2ae36b0,
> > txn=0x2bafb08) at reorderbuffer.c:3052
> > #4  0x0000000000b52b1c in ReorderBufferCleanupTXN (rb=0x2ae36b0,
> > txn=0x2bafb08) at reorderbuffer.c:1318
> > #5  0x0000000000b5279d in ReorderBufferCleanupTXN (rb=0x2ae36b0,
> > txn=0x2b9d778) at reorderbuffer.c:1257
> > #6  0x0000000000b5475c in ReorderBufferAbortOld (rb=0x2ae36b0,
> > oldestRunningXid=3835) at reorderbuffer.c:1973
> > #7  0x0000000000b3ca03 in DecodeStandbyOp (ctx=0x2b676d0,
> > buf=0x7ffcbc74cc00) at decode.c:332
> > #8  0x0000000000b3c208 in LogicalDecodingProcessRecord (ctx=0x2b676d0,
> > record=0x2b67990) at decode.c:121
> > #9  0x0000000000b70b2b in XLogSendLogical () at walsender.c:2845
> >
> > From initial analysis it looks like:
> > Issue 1: this seems to occur if all of the reorderbuffer has been
> > flushed and the server then restarts.
> > Issue 2: this seems to occur if many subtransactions are present and
> > the server then restarts. The subtransaction's final_lsn is not
> > set, so the assert fails when ReorderBufferRestoreCleanup is
> > called. Maybe we have to set the subtransaction's final_lsn
> > before cleanup (not sure).
> >
> > I could not reproduce this issue consistently with a test case, but
> > from review it looks like a real problem.
> >
> > For Issue 1, I could reproduce it with the following steps:
> > 1) Change ReorderBufferCheckSerializeTXN so that it always flushes.
> > 2) Have many open transactions with open subtransactions.
> > 3) Attach to one of the transactions from gdb and call abort().
>
> Do you need subtransactions for Issue 1? It appears that, after the
> restart, it will hit the assert if the changes list is empty.  Am I
> missing something?
>

When I reported this issue, I had reproduced it with subtransactions.
I have now tried without subtransactions and can still reproduce it.
You are right: Issue 1 appears in both cases, with and without
subtransactions.
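
To illustrate why an empty changes list trips the assertion in Issue 1,
here is a minimal sketch in C. It is not the reorderbuffer code and not a
proposed patch: the types and functions (node, list, change,
last_change_or_null) are hypothetical stand-ins for a dlist-style
circular list, written only to show the idea of checking for emptiness
before taking the tail element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dlist-style circular list with a sentinel head. */
typedef struct node
{
    struct node *prev;
    struct node *next;
} node;

typedef struct
{
    node head;
} list;

/* Hypothetical payload; "lsn" is only an illustrative field. */
typedef struct
{
    node link;
    int  lsn;
} change;

static void
list_init(list *l)
{
    l->head.next = l->head.prev = &l->head;
}

static bool
list_is_empty(const list *l)
{
    return l->head.next == &l->head;
}

static void
list_push_tail(list *l, node *n)
{
    n->next = &l->head;
    n->prev = l->head.prev;
    n->prev->next = n;
    l->head.prev = n;
}

/*
 * Return the last change, or NULL when the list holds no entries.
 * Taking the tail element without this check is what an assertion
 * such as "!dlist_is_empty(head)" guards against.
 */
static change *
last_change_or_null(list *l)
{
    if (list_is_empty(l))
        return NULL;
    return (change *) ((char *) l->head.prev - offsetof(change, link));
}
```

A caller in this sketch would test the result for NULL before reading
any field from it, instead of assuming at least one entry is present.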

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


