Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)
Date
Msg-id ff248f9a-2bf8-33af-c276-1796725ca8ba@enterprisedb.com
In response to Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers

On 6/6/23 17:42, Tomas Vondra wrote:
> 
> 
> On 6/6/23 14:00, Alexander Lakhin wrote:
>> Hello Tomas,
>>
>> 06.06.2023 12:56, Tomas Vondra wrote:
>>> On 6/6/23 11:00, Alexander Lakhin wrote:
>>>> Hello,
>>>> ...
>>>> With the debug logging added inside AssertTXNLsnOrder() I see:
>>>> ctx->snapshot_builder->start_decoding_at: 209807224,
>>>> ctx->reader->EndRecPtr: 210043072,
>>>> SnapBuildXactNeedsSkip(ctx->snapshot_builder, ctx->reader->EndRecPtr): 0
>>>> and inside the loop:
>>>> cur_txn->first_lsn: 209792872
>>>> cur_txn->first_lsn: 209975744
>>>> cur_txn->first_lsn: 210043008
>>>> cur_txn->first_lsn: 210043008
>>>> and it triggers the Assert.
>>>>
>>> So what's the prev_first_lsn value for these first_lsn values? How does
>>> it change over time? Did you try looking at the pg_waldump for these
>>> positions?
>>
>> With more logging I've got (for another run):
>> ReorderBufferTXNByXid| xid: 3397, lsn: c1fbc80
>>
>> ctx->snapshot_builder->start_decoding_at: c1f2cc0,
>> ctx->reader->EndRecPtr: c1fbcc0,
>> SnapBuildXactNeedsSkip(ctx->snapshot_builder, ctx->reader->EndRecPtr): 0
>> prev_first_lsn: 0, cur_txn->first_lsn: c1fbc80
>> prev_first_lsn: c1fbc80, cur_txn->first_lsn: c1fbc80
>> TRAP: failed Assert("prev_first_lsn < cur_txn->first_lsn") ...
>>
>> waldump for 00000001000000000000000C shows:
>> grep c1fbc80:
>> rmgr: Heap2       len (rec/tot):     60/    60, tx:       3398, lsn:
>> 0/0C1FBC80, prev 0/0C1FBC50, desc: NEW_CID rel: 1663/18763/19987, tid:
>> 0/1, cmin: 1, cmax: 4294967295, combo: 4294967295
>> rmgr: Heap        len (rec/tot):     59/    59, tx:       3398, lsn:
>> 0/0C1FBCC0, prev 0/0C1FBC80, desc: INSERT+INIT off: 1, flags: 0x08,
>> blkref #0: rel 1663/18763/19987 blk 0
>>
>> grep '( 3397| 3398)'
> 
> I've been able to reproduce this, after messing with the script a little
> bit (I had to skip the test_decoding regression tests, because they were
> complaining about slots already existing, etc.).
> 
> Anyway, AssertTXNLsnOrder sees these two transactions (before aborting):
> 
>   26662 0/6462E6F0 (first 0/0)
>   26661 0/6462E6F0 (first 0/6462E6F0)
> 
> 
> where 26661 is the top xact and 26662 is a subxact of 26661. This is
> clearly a problem, because we really should not have a subxact in this
> list once the assignment gets applied.
> 
> And the relevant WAL looks like this:
> 
> ---------------------------------------------------------------------
> 26662, lsn: 0/645EDAA0, prev 0/645EDA60, desc: ASSIGNMENT xtop 26661:
> subxacts: 26662
> 26662, lsn: 0/645EDAD0, prev 0/645EDAA0, desc: INSERT+INIT off: 1,
> flags: 0x08, blkref #0: rel 1663/125447/126835 blk 0
> ...
>     0, lsn: 0/6462E5D8, prev 0/6462E2A0, desc: RUNNING_XACTS nextXid
> 26673 latestCompletedXid 26672 oldestRunningXid 26661; 3 xacts: 26667
> 26661 26664; 3 subxacts: 26668 26662 26665
> ...
> 26662, lsn: 0/6462E6F0, prev 0/6462E678, desc: NEW_CID rel:
> 1663/125447/126841, tid: 0/1, cmin: 1, cmax: 4294967295, combo: 4294967295
> 26662, lsn: 0/6462E730, prev 0/6462E6F0, desc: INSERT+INIT off: 1,
> flags: 0x08, blkref #0: rel 1663/125447/126841 blk 0
> 26661, lsn: 0/6462E770, prev 0/6462E730, desc: COMMIT 2023-06-06
> 16:41:24.442870 CEST; subxacts: 26662
> ---------------------------------------------------------------------
> 
> so the assignment is the *first* thing that happens for these xacts.
> 
> However, we skip the assignment, because the log for this call of
> get_changes says this:
> 
>    LOG:  logical decoding found consistent point at 0/6462E5D8
> 
> so we fail to realize that 26662 is a subxact.
> 
> Then when processing the NEW_CID, SnapBuildProcessNewCid chimes in and
> does this:
> 
>   ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);
> 
>   ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
>                  xlrec->target_locator, xlrec->target_tid,
>                  xlrec->cmin, xlrec->cmax,
>                  xlrec->combocid);
> 
> and ReorderBufferAddNewTupleCids() proceeds to create an entry for the
> passed XID (which is xlrec->top_xid, 26661), but with the LSN of the WAL
> record. But ReorderBufferXidSetCatalogChanges() already did the same
> thing for the subxact 26662, as it has no idea it's a subxact (due to
> the skipped assignment).
> 
> I haven't figured out what exactly is happening / what it should be
> doing instead. But it seems wrong to skip the assignment - I wonder if
> SnapBuildProcessRunningXacts might be doing that too eagerly.
> 
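
For reference, the check that trips here is the ordering loop in
AssertTXNLsnOrder() - roughly the following, simplified and quoted from
memory, so the exact shape may differ a bit:

    /* don't check before reaching start_decoding_at (this matches the
     * SnapBuildXactNeedsSkip value in the debug output above) */
    if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, ctx->reader->EndRecPtr))
        return;

    dlist_foreach(iter, &rb->toplevel_by_lsn)
    {
        ReorderBufferTXN *cur_txn = dlist_container(ReorderBufferTXN, node,
                                                    iter.cur);

        /* start LSN must be set */
        Assert(cur_txn->first_lsn != InvalidXLogRecPtr);

        /* current initial LSN must be strictly higher than previous */
        if (prev_first_lsn != InvalidXLogRecPtr)
            Assert(prev_first_lsn < cur_txn->first_lsn);   /* <- the TRAP */

        /* known-as-subtxn txns must not be listed */
        Assert(!rbtxn_is_known_subxact(cur_txn));

        prev_first_lsn = cur_txn->first_lsn;
    }

With both 26661 and 26662 in toplevel_by_lsn at first_lsn 0/6462E6F0, the
strict "<" can't hold, and the known-subxact check right below can't catch
it either, precisely because the skipped assignment means we never marked
26662 as a subxact.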

I investigated this a bit more, and the problem actually seems to be
more like this:

1) we create a new logical replication slot

2) while building the initial snapshot, we start at the current insert
location and then process records

3) for RUNNING_XACTS we call SnapBuildProcessRunningXacts, which calls
SnapBuildFindSnapshot

4) SnapBuildFindSnapshot does this:

    else if (!builder->building_full_snapshot &&
             SnapBuildRestore(builder, lsn))
    {
        /* there won't be any state to cleanup */
        return false;
    }

5) because create_logical_replication_slot and get_changes both call
CreateInitDecodingContext with need_full_snapshot=false, we end up
calling SnapBuildRestore()

6) once in a while this likely hits a snapshot created by a concurrent
session (for another logical slot) with SNAPBUILD_CONSISTENT state


I don't know what the correct fix for this is. Maybe we should set
need_full_snapshot=true in create_logical_replication_slot when creating
the initial snapshot? Maybe we should use true even in
pg_logical_slot_get_changes_guts? This seems to fix the crashes ...
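
To be concrete, the kind of change I mean is flipping this flag
(paraphrasing create_logical_replication_slot() in slotfuncs.c from
memory, so the surrounding arguments may not be exact):

    ctx = CreateInitDecodingContext(plugin, NIL,
                                    false,    /* need_full_snapshot -
                                               * change this to true */
                                    restart_lsn,
                                    XL_ROUTINE(.page_read = read_local_xlog_page,
                                               .segment_open = wal_segment_open,
                                               .segment_close = wal_segment_close),
                                    NULL, NULL, NULL);

With need_full_snapshot=true, builder->building_full_snapshot gets set and
SnapBuildFindSnapshot should skip the SnapBuildRestore() branch quoted in
4) above, building the snapshot from scratch instead.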

That'll prevent reading the serialized snapshots like this, but how
could that ever work? It seems pretty much guaranteed to ignore any
assignments that happened right before the snapshot?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


