RE: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639) - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)
Date
Msg-id OS0PR01MB5716461EBCEBDEB29D7F54B6945AA@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Wednesday, June 14, 2023 5:05 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> On 6/14/23 05:15, Zhijie Hou (Fujitsu) wrote:
> > On Tuesday, June 13, 2023 12:19 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >>
> >> On Tuesday, June 13, 2023 12:04 PM Amit Kapila
> >> <amit.kapila16@gmail.com>
> >> wrote:
> >>>
> >>> On Wed, Jun 7, 2023 at 6:02 PM Tomas Vondra
> >>> <tomas.vondra@enterprisedb.com> wrote:
> >>>>
> >>>>
> >>>> Well, I think the issue is pretty clear - we end up with an initial
> >>>> snapshot that's in between the ASSIGNMENT and NEW_CID, and because
> >>>> NEW_CID has both xact and subxact XID it fails because we add two
> >>>> TXNs with the same LSN, not realizing one of them is subxact.
> >>>>
> >>>> That's obviously wrong, although somewhat benign in production
> >>>> because it only fails because of hitting an assert.
> >>>>
> >>>
> >>> Doesn't this indicate that we can end up decoding a partial
> >>> transaction when we restore a snapshot? Won't that be a problem even
> >>> for
> >> production?
> >>
> >> Yes, I think it can cause the problem that only partial changes of a
> >> transaction are streamed.
> >> I tried to reproduce this and here are the steps. Note, to make sure
> >> the test won't be affected by other running_xact WALs, I changed
> >> LOG_SNAPSHOT_INTERVAL_MS to a bigger number.
> >>
> >> session 1:
> >> -----
> >> create table test(a int);
> >> SELECT 'init' FROM
> >> pg_create_logical_replication_slot('isolation_slot_1',
> >> 'test_decoding');
> >> -----
> >>
> >> session 2:
> >> -----
> >> - Start a transaction
> >> BEGIN;
> >> INSERT INTO test VALUES(1);
> >> -----
> >>
> >> session 3:
> >> -----
> >> - Create another slot isolation_slot_2, it should choose a
> >> restart_point which is
> >> - after the changes that happened in session 2. Note, to let the
> >> current slot
> >> - restore another snapshot, we need to use gdb to block the current
> >> backend at
> >> - SnapBuildFindSnapshot(), the backend should have logged the
> >> running_xacts WAL
> >> - before reaching SnapBuildFindSnapshot.
> >>
> >> SELECT 'init' FROM
> >> pg_create_logical_replication_slot('isolation_slot_2',
> >> 'test_decoding');
> >> -----
> >>
> >> session 1:
> >> -----
> >> - Since there is a running_xacts which session 3 logged, the current
> >> backend will
> >> - serialize the snapshot when decoding the running_xacts WAL, and the
> >> snapshot
> >> - can be used by other slots (e.g. isolation_slot_2)
> >>
> >> SELECT data FROM pg_logical_slot_get_changes('isolation_slot_1',
> >> NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
> >> -----
> >>
> >> session 2:
> >> -----
> >> - Insert some different data and commit the transaction.
> >>
> >> INSERT INTO test VALUES(2);
> >> INSERT INTO test VALUES(3);
> >> INSERT INTO test VALUES(4);
> >> COMMIT
> >> -----
> >>
> >> session 3:
> >> -----
> >> - Release the process and try to stream the changes, since the
> >> restart point is
> >> - at the middle of the transaction, it will stream partial changes of
> >> the
> >> - transaction which was committed in session 2:
> >>
> >> SELECT data FROM pg_logical_slot_get_changes('isolation_slot_2',
> >> NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
> >> -----
> >>
> >> Results (partial streamed changes):
> >> postgres=# SELECT data FROM
> >> pg_logical_slot_get_changes('isolation_slot_2',
> >> NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
> >>                   data
> >> -----------------------------------------
> >>  BEGIN
> >>  table public.test: INSERT: a[integer]:2  table public.test: INSERT:
> >> a[integer]:3 table public.test: INSERT: a[integer]:4  COMMIT
> >> (5 rows)
> >>
> >
> > One idea to fix the partial change stream problem would be that we
> > record all the running transaction's xid when restoring the snapshot
> > in SnapBuildFindSnapshot(), and in the following decoding, we skip
> > decoding changes for the recorded transaction. Or we can do similar to
> > 7f13ac8(serialize the information of running xacts if any)
> >
> 
> We need to think about how to fix this in backbranches, and the idea with
> serializing running transactions seems rather unbackpatchable (as it changes
> on-disk state).
> 
> > But one point I am not very sure is that we might retore snapshot in
> > SnapBuildSerializationPoint() as well where we don't have running
> > transactions information. Although SnapBuildSerializationPoint() is
> > invoked for XLOG_END_OF_RECOVERY and XLOG_CHECKPOINT_SHUTDOWN
> records
> > which seems no running transaction will be there when logging. But I
> > am not 100% sure if the problem can happen in this case as well.
> >
...
> 
> Also, can you try if we still stream the partial transaction with
> create_logical_replication_slot building a full snapshot?

Yes, It can fix this problem because force create_logical_replication_slot
build a full snapshot can avoid restoring the snapshot. But I am not sure if
this is the best fix for this for the same reason(it's costly) mentioned by
Amit[1].

[1] https://www.postgresql.org/message-id/CAA4eK1Kro0rej%3DZXMhcdjs%2BaYsZvNywu3-cqdRUtyAp4zqpVWw%40mail.gmail.com

Best Regards,
Hou zj

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [DOC] Update ALTER SUBSCRIPTION documentation v3
Next
From: Japin Li
Date:
Subject: Re: Replace (GUC_UNIT_MEMORY | GUC_UNIT_TIME) with GUC_UNIT in guc.c