Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639) - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)
Date
Msg-id CAD21AoAE3j=4SgcY3gLZWwpW+OYV=fumhjeEQmb66AvV27-rrQ@mail.gmail.com
Whole thread Raw
In response to Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)  (Amit Kapila <amit.kapila16@gmail.com>)
Responses RE: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)  ("kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com>)
Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, Oct 17, 2022 at 4:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Oct 12, 2022 at 11:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Please note that to pass the new regression tests, the fix proposed in
> > a related thread[1] is required. Particularly, we need:
> >
> > @@ -1099,6 +1099,9 @@ SnapBuildCommitTxn(SnapBuild *builder,
> > XLogRecPtr lsn, TransactionId xid,
> >         else if (sub_needs_timetravel)
> >         {
> >                 /* track toplevel txn as well, subxact alone isn't meaningful */
> > +               elog(DEBUG2, "forced transaction %u to do timetravel
> > due to one of its subtransaction",
> > +                        xid);
> > +               needs_timetravel = true;
> >                 SnapBuildAddCommittedTxn(builder, xid);
> >         }
> >         else if (needs_timetravel)
> >
> > A side benefit of this approach is that we can fix another assertion
> > failure too that happens on REL14 and REL15 and reported here[2]. In
> > the commits 68dcce247f1a(REL14) and 272248a0c1(REL15), the reason why
> > we make the association between sub-txns to top-txn in
> > SnapBuildXidSetCatalogChanges() is just to avoid the assertion failure
> > in AssertTXNLsnOrder(). However, since the invalidation messages are
> > not transported from sub-txn to top-txn during the assignment, another
> > assertion check in ReorderBufferForget() fails when forgetting the
> > subtransaction. If we apply this idea of skipping the assertion
> > checks, we no longer need to make the such association in
> > SnapBuildXidSetCatalogChanges() and resolve this issue as well.
> >
>
> IIUC, here you are speaking of three different changes. Change-1: Add
> a check in AssertTXNLsnOrder() to skip assert checking till we reach
> start_decoding_at. Change-2: Set needs_timetravel to true in one of
> the else if branches in SnapBuildCommitTxn(). Change-3: Remove the
> call to ReorderBufferAssignChild() from SnapBuildXidSetCatalogChanges
> in PG-14/15 as that won't be required after Change-1.

Yes.

>
> AFAIU, Change-1 is required till v10; Change-2 and Change-3 are
> required in HEAD/v15/v14 to fix the problem.

IIUC Change-2 is required in v16 and HEAD but not mandatory in v15 and
v14. The reason why we need Change-2 is that there is a case where we
mark only subtransactions as containing catalog change while not doing
that for its top-level transaction. In v15 and v14, since we mark both
subtransactions and top-level transaction in
SnapBuildXidSetCatalogChanges() as containing catalog changes, we
don't get the assertion failure at "Assert(!needs_snapshot ||
needs_timetravel)".

Regarding Change-3, it's required in v15 and v14 but not in HEAD and
v16. Since we didn't add SnapBuildXidSetCatalogChanges() to v16 and
HEAD, Change-3 cannot be applied to the two branches.

> Now, the second and third
> changes are not required in branches prior to v14 because we don't
> record invalidations via XLOG_XACT_INVALIDATIONS record. However, if
> we want, we can even back-patch Change-2 and Change-3 to keep the code
> consistent or maybe just Change-3.

Right. I don't think it's a good idea to back-patch Change-2 in
branches prior to v14 as it's not a relevant issue. Regarding
back-patching Change-3 to branches prior 14, I think it may be okay
til v11, but I'd be hesitant for v10 as the final release comes in a
month.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Richard Guo
Date:
Subject: Re: havingQual vs hasHavingQual buglets
Next
From: Craig Ringer
Date:
Subject: Re: POC: Better infrastructure for automated testing of concurrency issues