Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions - Mailing list pgsql-hackers

From vignesh C
Subject Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions
Date
Msg-id CALDaNm1pXV7qpFiX77Li-jyPP+LOKcYuu2EvGeGeNpxGeAZGPA@mail.gmail.com
Whole thread Raw
In response to Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, Nov 4, 2019 at 5:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Oct 30, 2019 at 9:38 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Tue, Oct 22, 2019 at 10:52 PM Tomas Vondra
> > <tomas.vondra@2ndquadrant.com> wrote:
> > >
> > > I think the patch should do the simplest thing possible, i.e. what it
> > > does today. Otherwise we'll never get it committed.
> > >
> > I found a couple of crashes while reviewing and testing flushing of
> > open transaction data:
> >
>
> Thanks for doing these tests.  However, I don't think these issues are
> anyway related to this patch.  It seems to be base code issues
> manifested by this patch.  See my analysis below.
>
> > Issue 1:
> > #0  0x00007f22c5722337 in raise () from /lib64/libc.so.6
> > #1  0x00007f22c5723a28 in abort () from /lib64/libc.so.6
> > #2  0x0000000000ec5390 in ExceptionalCondition
> > (conditionName=0x10ea814 "!dlist_is_empty(head)", errorType=0x10ea804
> > "FailedAssertion",
> >     fileName=0x10ea7e0 "../../../../src/include/lib/ilist.h",
> > lineNumber=458) at assert.c:54
> > #3  0x0000000000b4fb91 in dlist_tail_element_off (head=0x19e4db8,
> > off=64) at ../../../../src/include/lib/ilist.h:458
> > #4  0x0000000000b546d0 in ReorderBufferAbortOld (rb=0x191b6b0,
> > oldestRunningXid=3834) at reorderbuffer.c:1966
> > #5  0x0000000000b3ca03 in DecodeStandbyOp (ctx=0x19af990,
> > buf=0x7ffcbc26dc50) at decode.c:332
> >
>
> This seems to be the problem of base code where we abort immediately
> after serializing the changes because in that case, the changes list
> will be empty.  I think you can try to reproduce it via the debugger
> or by hacking the code such that it serializes after every change and
> then if you abort after one change, it should hit this problem.
>
> >
> > Issue 2:
> > #0  0x00007f1d7ddc4337 in raise () from /lib64/libc.so.6
> > #1  0x00007f1d7ddc5a28 in abort () from /lib64/libc.so.6
> > #2  0x0000000000ec4e1d in ExceptionalCondition
> > (conditionName=0x10ead30 "txn->final_lsn != InvalidXLogRecPtr",
> > errorType=0x10ea284 "FailedAssertion",
> >     fileName=0x10ea2d0 "reorderbuffer.c", lineNumber=3052) at assert.c:54
> > #3  0x0000000000b577e0 in ReorderBufferRestoreCleanup (rb=0x2ae36b0,
> > txn=0x2bafb08) at reorderbuffer.c:3052
> > #4  0x0000000000b52b1c in ReorderBufferCleanupTXN (rb=0y x2ae36b0,
> > txn=0x2bafb08) at reorderbuffer.c:1318
> > #5  0x0000000000b5279d in ReorderBufferCleanupTXN (rb=0x2ae36b0,
> > txn=0x2b9d778) at reorderbuffer.c:1257
> > #6  0x0000000000b5475c in ReorderBufferAbortOld (rb=0x2ae36b0,
> > oldestRunningXid=3835) at reorderbuffer.c:1973
> >
>
> This seems to be again the problem with base code as we don't update
> the final_lsn for subtransactions during ReorderBufferAbortOld.  This
> can also be reproduced with some hacking in code or via debugger in a
> similar way as explained for the previous problem but with a
> difference that there must be subtransaction involved in this case.
>
> > #7  0x0000000000b3ca03 in DecodeStandbyOp (ctx=0x2b676d0,
> > buf=0x7ffcbc74cc00) at decode.c:332
> > #8  0x0000000000b3c208 in LogicalDecodingProcessRecord (ctx=0x2b676d0,
> > record=0x2b67990) at decode.c:121
> > #9  0x0000000000b70b2b in XLogSendLogical () at walsender.c:2845
> >
> > These failures come randomly.
> > I'm not able to reproduce this issue with simple test case.
>
> Yeah, it appears to be difficult to reproduce unless you hack the code
> to serialize every change or use debugger to forcefully flush the
> changes every time.
>

Thanks Amit for your analysis, I was able to reproduce the above issue
consistently by making some code changes and with help of debugger. I
did one change so that it flushes every time instead of flushing after
the buffer size exceeds the logical_decoding_work_mem, attached one of
the transactions and called abort. When the server restarts after
abort, this problem occurs consistently. I could reproduce the issue
with base code also. It seems like this issue is not an issue of
0001-Add-logical_decoding_work_mem-to-limit-ReorderBuffer patch and
exists from base code. I will post the issue in hackers with details.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Grigory Smolkin
Date:
Subject: Re: [proposal] recovery_target "latest"
Next
From: Fabien COELHO
Date:
Subject: Re: pgbench - rework variable management