Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date
Msg-id CAA4eK1KW8KzhnNiLk3ayKUA4CkVNb_fm8USqXDw0nUK_0togJg@mail.gmail.com
In response to Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions  (vignesh C <vignesh21@gmail.com>)
Responses Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions  (Dilip Kumar <dilipbalaut@gmail.com>)
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions  (vignesh C <vignesh21@gmail.com>)
List pgsql-hackers
On Wed, Oct 30, 2019 at 9:38 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, Oct 22, 2019 at 10:52 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> >
> > I think the patch should do the simplest thing possible, i.e. what it
> > does today. Otherwise we'll never get it committed.
> >
> I found a couple of crashes while reviewing and testing flushing of
> open transaction data:
>

Thanks for doing these tests.  However, I don't think these issues are
in any way related to this patch.  They seem to be base code issues
exposed by this patch.  See my analysis below.

> Issue 1:
> #0  0x00007f22c5722337 in raise () from /lib64/libc.so.6
> #1  0x00007f22c5723a28 in abort () from /lib64/libc.so.6
> #2  0x0000000000ec5390 in ExceptionalCondition
> (conditionName=0x10ea814 "!dlist_is_empty(head)", errorType=0x10ea804
> "FailedAssertion",
>     fileName=0x10ea7e0 "../../../../src/include/lib/ilist.h",
> lineNumber=458) at assert.c:54
> #3  0x0000000000b4fb91 in dlist_tail_element_off (head=0x19e4db8,
> off=64) at ../../../../src/include/lib/ilist.h:458
> #4  0x0000000000b546d0 in ReorderBufferAbortOld (rb=0x191b6b0,
> oldestRunningXid=3834) at reorderbuffer.c:1966
> #5  0x0000000000b3ca03 in DecodeStandbyOp (ctx=0x19af990,
> buf=0x7ffcbc26dc50) at decode.c:332
>

This seems to be a problem in the base code: if we abort immediately
after serializing the changes, the in-memory changes list is empty, so
the dlist_tail_element() call in ReorderBufferAbortOld hits the
!dlist_is_empty(head) assertion.  I think you can reproduce it via the
debugger, or by hacking the code so that it serializes after every
change; if you then abort after one change, it should hit this problem.
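
To make the failure mode above concrete, here is a minimal standalone
sketch.  It is not PostgreSQL code: ToyChange, ToyTxn and the toy_*
functions are hypothetical stand-ins, and the assumption is only that
serializing empties the in-memory list while the abort path still wants
the LSN of the list's tail element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration; these are NOT PostgreSQL definitions. */
typedef struct ToyChange
{
    uint64_t          lsn;
    struct ToyChange *prev;
    struct ToyChange *next;
} ToyChange;

typedef struct ToyTxn
{
    ToyChange *head;        /* in-memory changes, oldest first */
    ToyChange *tail;
    bool       serialized;  /* some changes were spilled to disk */
    uint64_t   final_lsn;   /* 0 means "not set" in this sketch */
} ToyTxn;

/* Append a change with the given LSN to the in-memory list. */
static void
toy_add_change(ToyTxn *txn, ToyChange *c, uint64_t lsn)
{
    c->lsn = lsn;
    c->next = NULL;
    c->prev = txn->tail;
    if (txn->tail != NULL)
        txn->tail->next = c;
    else
        txn->head = c;
    txn->tail = c;
}

/* Model of a spill: afterwards the in-memory list is empty. */
static void
toy_serialize(ToyTxn *txn)
{
    txn->head = NULL;
    txn->tail = NULL;
    txn->serialized = true;
}

/*
 * Try to derive final_lsn from the last in-memory change.  Returns false
 * when the list is empty, which is the situation in which an unguarded
 * "give me the tail element" call would trip a non-empty assertion.
 */
static bool
toy_set_final_lsn_from_tail(ToyTxn *txn)
{
    if (txn->final_lsn != 0)
        return true;            /* already set, nothing to do */
    if (txn->tail == NULL)
        return false;           /* aborted right after serializing */
    txn->final_lsn = txn->tail->lsn;
    return true;
}
```

With one change added and then serialized, toy_set_final_lsn_from_tail()
reports failure; once a further change is in memory it succeeds.  That
matches the "serialize after every change, then abort after one change"
recipe described above.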

>
> Issue 2:
> #0  0x00007f1d7ddc4337 in raise () from /lib64/libc.so.6
> #1  0x00007f1d7ddc5a28 in abort () from /lib64/libc.so.6
> #2  0x0000000000ec4e1d in ExceptionalCondition
> (conditionName=0x10ead30 "txn->final_lsn != InvalidXLogRecPtr",
> errorType=0x10ea284 "FailedAssertion",
>     fileName=0x10ea2d0 "reorderbuffer.c", lineNumber=3052) at assert.c:54
> #3  0x0000000000b577e0 in ReorderBufferRestoreCleanup (rb=0x2ae36b0,
> txn=0x2bafb08) at reorderbuffer.c:3052
> #4  0x0000000000b52b1c in ReorderBufferCleanupTXN (rb=0x2ae36b0,
> txn=0x2bafb08) at reorderbuffer.c:1318
> #5  0x0000000000b5279d in ReorderBufferCleanupTXN (rb=0x2ae36b0,
> txn=0x2b9d778) at reorderbuffer.c:1257
> #6  0x0000000000b5475c in ReorderBufferAbortOld (rb=0x2ae36b0,
> oldestRunningXid=3835) at reorderbuffer.c:1973
>

This again seems to be a problem in the base code, as we don't update
final_lsn for subtransactions during ReorderBufferAbortOld.  It can
also be reproduced with some hacking of the code or via the debugger,
in the same way as described for the previous problem, except that a
subtransaction must be involved in this case.
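
As a rough illustration of the subtransaction case, the following
standalone sketch models a top-level transaction with subtransactions.
Everything here is hypothetical (ToyTopTxn, ToySubTxn, the toy_*
functions, and 0 as the "invalid" LSN); it only assumes that cleanup
expects final_lsn to be set on every serialized entry.  The second
variant is just one conceivable way to satisfy that precondition, not a
proposed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration; these are NOT PostgreSQL definitions. */
#define TOY_INVALID_LSN ((uint64_t) 0)
#define TOY_MAX_SUBTXNS 4

typedef struct ToySubTxn
{
    bool     serialized;
    uint64_t last_change_lsn;   /* LSN of this subxact's last change */
    uint64_t final_lsn;
} ToySubTxn;

typedef struct ToyTopTxn
{
    bool      serialized;
    uint64_t  final_lsn;
    size_t    nsubtxns;
    ToySubTxn subtxns[TOY_MAX_SUBTXNS];
} ToyTopTxn;

/* Models the behaviour described above: only the top-level entry is set. */
static void
toy_abort_old_toplevel_only(ToyTopTxn *top, uint64_t last_lsn)
{
    if (top->final_lsn == TOY_INVALID_LSN)
        top->final_lsn = last_lsn;
}

/* Hypothetical variant that also stamps each subtransaction. */
static void
toy_abort_old_with_subtxns(ToyTopTxn *top, uint64_t last_lsn)
{
    toy_abort_old_toplevel_only(top, last_lsn);
    for (size_t i = 0; i < top->nsubtxns; i++)
    {
        ToySubTxn *sub = &top->subtxns[i];

        if (sub->final_lsn == TOY_INVALID_LSN)
            sub->final_lsn = sub->last_change_lsn;
    }
}

/* True if every serialized entry has final_lsn set before cleanup. */
static bool
toy_cleanup_precondition_holds(const ToyTopTxn *top)
{
    if (top->serialized && top->final_lsn == TOY_INVALID_LSN)
        return false;
    for (size_t i = 0; i < top->nsubtxns; i++)
    {
        const ToySubTxn *sub = &top->subtxns[i];

        if (sub->serialized && sub->final_lsn == TOY_INVALID_LSN)
            return false;
    }
    return true;
}
```

With a serialized subtransaction present, the top-level-only variant
leaves the precondition violated, which corresponds to the
"txn->final_lsn != InvalidXLogRecPtr" assertion in the backtrace.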

> #7  0x0000000000b3ca03 in DecodeStandbyOp (ctx=0x2b676d0,
> buf=0x7ffcbc74cc00) at decode.c:332
> #8  0x0000000000b3c208 in LogicalDecodingProcessRecord (ctx=0x2b676d0,
> record=0x2b67990) at decode.c:121
> #9  0x0000000000b70b2b in XLogSendLogical () at walsender.c:2845
>
> These failures come randomly.
> I'm not able to reproduce this issue with simple test case.

Yeah, it appears to be difficult to reproduce unless you hack the code
to serialize every change or use a debugger to forcibly flush the
changes every time.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


