Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)
Date
Msg-id CAA4eK1KNGHzLRcGp9mppwo2ezH_OfaS24Gez8ck3XavFC2DB1g@mail.gmail.com
Whole thread Raw
In response to Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)
List pgsql-hackers
On Sun, Sep 4, 2022 at 7:38 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 9/4/22 14:24, Tomas Vondra wrote:
> >
> >> As per
> >> my understanding, the problem I reported in the email [1] is the same
> >> and we have seen this in BF failures as well. I posted a way to
> >> reproduce it in that email. It seems this is possible if the decoding
> >> gets XLOG_HEAP2_NEW_CID as the first record (belonging to a
> >> subtransaction) after XLOG_RUNNING_XACTS.
> >>
> >
> > Interesting. That's certainly true for WAL in the crashing case:
> >
> > rmgr: Standby     len (rec/tot):     58/    58, tx:          0, lsn:
> > 0/01CDCD70, prev 0/01CDCD10, desc: RUNNING_XACTS nextXid 850
> > latestCompletedXid 847 oldestRunningXid 848; 1 xacts: 848
> > rmgr: Heap2       len (rec/tot):     60/    60, tx:        849, lsn:
> > 0/01CDCDB0, prev 0/01CDCD70, desc: NEW_CID rel 1663/16384/1249; tid
> > 58/38; cmin: 1, cmax: 14, combo: 6
> >
>
> I investigated using the pgdata from the crashed run (can provide, if
> you have rpi4 or some other aarch64 machine), and the reason is pretty
> simple - the restart_lsn for the slot is 0/1CDCD70, which is looong
> after the subxact assignment, so we add both xids as toplevel.
>
> That seems broken - if we skip the assignment like this, doesn't that
> break spill-to-disk and/or streaming? IIRC that's exactly why we had to
> start logging assignments immediately with wal_level=logical.
>

We had started logging assignments immediately in commit 0bead9af48
for streaming transactions in PG14. This issue exists prior to that. I
have tried and reproduced it in PG13 but I think it will be there even
before that. So, I am not sure if the spilling behavior is broken due
to this. I think if we don't get assignment recording before
processing changes during decoding commit then we could miss sending
the changes which won't be the case here. Do you see any other
problem?

> Or maybe we're not dealing with the restart_lsn properly, and we should
> have ignored those records. Both xacts started long before the restart
> LSN, so we're not seeing the whole xact anyway.
>

Right, but is that problematic? The restart LSN will be used as a
point to start reading the WAL and that helps in building a consistent
snapshot. However, for decoding to send the commit, we use
start_decoding_at point which will ensure that we send complete
transactions.

> However, when processing the NEW_CID record:
>
> tx:        849, lsn: 0/01CDCDB0, prev 0/01CDCD70, desc: NEW_CID rel
> 1663/16384/1249; tid 58/38; cmin: 1, cmax: 14, combo: 6
>
> we ultimately do this in SnapBuildProcessNewCid:
>
> #1  0x0000005566cccdb4 in ReorderBufferAddNewTupleCids (rb=0x559dd64218,
> xid=848, lsn=30264752, locator=..., tid=..., cmin=1, cmax=14,
> combocid=6) at reorderbuffer.c:3218
> #2  0x0000005566cd1f7c in SnapBuildProcessNewCid (builder=0x559dd6a248,
> xid=849, lsn=30264752, xlrec=0x559dd6e1e0) at snapbuild.c:818
>
> so in fact we *know* 849 is a subxact of 848, but we don't call
> ReorderBufferAssignChild in this case. In fact we can't even do the
> assignment easily in this case, because we create the subxact first, so
> that the crash happens right when we attempt to create the toplevel one,
> and we never even get a chance to do the assignment:
>
> 1) process the NEW_CID record, logged for 849 (subxact)
> 2) process CIDs in the WAL record, which has topleve_xid 848
>
>
> So IMHO we need to figure out what to do for WAL records that create
> both the toplevel and subxact - either we need to skip them, or rethink
> how we create the ReorderBufferTXN structs.
>

As per my understanding, we can't skip them as they are used to build
the snapshot.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Handle infinite recursion in logical replication setup
Next
From: Thomas Munro
Date:
Subject: Re: pg15b3: recovery fails with wal prefetch enabled