Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639) - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)
Date
Msg-id CAD21AoBmrum63r5anV+Rxo2MyvxehYioRvkKqvENdYzMvE8_7w@mail.gmail.com
Whole thread Raw
In response to Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)
List pgsql-hackers
On Tue, Oct 18, 2022 at 1:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Oct 18, 2022 at 6:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Oct 17, 2022 at 4:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > IIUC, here you are speaking of three different changes. Change-1: Add
> > > a check in AssertTXNLsnOrder() to skip assert checking till we reach
> > > start_decoding_at. Change-2: Set needs_timetravel to true in one of
> > > the else if branches in SnapBuildCommitTxn(). Change-3: Remove the
> > > call to ReorderBufferAssignChild() from SnapBuildXidSetCatalogChanges
> > > in PG-14/15 as that won't be required after Change-1.
> >
> > Yes.
> >
> > >
> > > AFAIU, Change-1 is required till v10; Change-2 and Change-3 are
> > > required in HEAD/v15/v14 to fix the problem.
> >
> > IIUC Change-2 is required in v16 and HEAD
> >
>
> Why are you referring v16 and HEAD separately?

Sorry, my wrong, I was confused.

>
> > but not mandatory in v15 and
> > v14. The reason why we need Change-2 is that there is a case where we
> > mark only subtransactions as containing catalog change while not doing
> > that for its top-level transaction. In v15 and v14, since we mark both
> > subtransactions and top-level transaction in
> > SnapBuildXidSetCatalogChanges() as containing catalog changes, we
> > don't get the assertion failure at "Assert(!needs_snapshot ||
> > needs_timetravel)".
> >
> > Regarding Change-3, it's required in v15 and v14 but not in HEAD and
> > v16. Since we didn't add SnapBuildXidSetCatalogChanges() to v16 and
> > HEAD, Change-3 cannot be applied to the two branches.
> >
> > > Now, the second and third
> > > changes are not required in branches prior to v14 because we don't
> > > record invalidations via XLOG_XACT_INVALIDATIONS record. However, if
> > > we want, we can even back-patch Change-2 and Change-3 to keep the code
> > > consistent or maybe just Change-3.
> >
> > Right. I don't think it's a good idea to back-patch Change-2 in
> > branches prior to v14 as it's not a relevant issue.
> >
>
> Fair enough but then why to even backpatch it to v15 and v14?

Oops, it's a typo. I wanted to say Change-2 should be back-patched only to HEAD.

>
> > Regarding
> > back-patching Change-3 to branches prior 14, I think it may be okay
> > til v11, but I'd be hesitant for v10 as the final release comes in a
> > month.
> >
>
> So to fix the issue in all branches, what we need to do is to
> backpatch change-1: in all branches till v10, change-2: in HEAD, and
> change-3: in V15 and V14. Additionally, we think, it is okay to
> backpatch change-3 till v11 as it is mainly done to avoid the problem
> fixed by change-1 and it makes code consistent in back branches.

Right.

>
> I think because the test case proposed needs all three changes, we can
> push the change-1 without a test case and then as a second patch have
> change-2 for HEAD and change-3 for back branches with the test case.
> Do you have any other ideas to proceed here?

I found another test case that causes the assertion failure at
"Assert(!needs_snapshot || needs_timetravel);" on all branches. I've
attached the patch for the test case. In this test case, I modified a
user-catalog table instead of system-catalog table. That way, we don't
generate invalidation messages while generating NEW_CID records. As a
result, we mark only the subtransactions as containing catalog change
and don't make association between top-level and sub transactions. The
assertion failure happens on all supported branches. If we need to fix
this (I believe so), Change-2 needs to be backpatched to all supported
branches.

There are three changes as Amit mentioned, and regarding the test
case, we have three test cases I've attached: truncate_testcase.patch,
analyze_testcase.patch, uesr_catalog_testcase.patch. The relationship
between assertion failures and test cases are very complex. I could
not find any test case to cause only one assertion failure on all
branches. One idea to proceed is:

Patch-1 includes Change-1 and is applied to all branches.

Patch-2 includes Change-2 and the user_catalog test case, and is
applied to all branches.

Patch-3 includes Change-3 and the truncate test case (or the analyze
test case), and is applied to v14 and v15 (also till v11 if we
prefer).

The patch-1 doesn't include any test case but the user_catalog test
case can test both Change-1 and Change-2 on all branches. In v15 and
v14, the analyze test case causes both the assertions at
"Assert(txn->ninvalidations == 0);" and "Assert(prev_first_lsn <
cur_txn->first_lsn);" whereas the truncate test case causes the
assertion only at "Assert(txn->ninvalidations == 0);". Since the
patch-2 is applied on top of the patch-1, there is no difference in
terms of testing Change-2.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Amul Sul
Date:
Subject: Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()
Next
From: Yugo NAGATA
Date:
Subject: Re: SI-read predicate locks on materialized views