Re: BUG #18897: Logical replication conflict after using pg_createsubscriber under heavy load - Mailing list pgsql-bugs

From Shlok Kyal
Subject Re: BUG #18897: Logical replication conflict after using pg_createsubscriber under heavy load
Msg-id CANhcyEUs+_fgmd61jWiSvwxYz+-DGgL00q=C5ZdoYaj9D9baWw@mail.gmail.com
In response to RE: BUG #18897: Logical replication conflict after using pg_createsubscriber under heavy load  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-bugs
On Tue, 22 Jul 2025 at 17:51, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Shlok,
>
> > I checked it and here is my analysis:
> >
> > When we create a slot, it returns the confirmed_flush LSN as a
> > consistent_lsn. I noticed that in general when we create a slot, the
> > confirmed_flush is set to the end of a RUNNING_XACT log or we can say
> > start of the next record. And this next record can be anything. It can
> > be a COMMIT record for a transaction in another session.
> > I have attached server logs and waldump logs for one of such case
> > reproduced using test script shared in [1].
> > The snapbuild machinery has four steps: START, BUILDING_SNAPSHOT,
> > FULL_SNAPSHOT and SNAPBUILD_CONSISTENT. Between each step a
> > RUNNING_XACT is logged.
> ...
>
> Thanks for the analysis! It is quite helpful. Based on your points, my
> understanding is as follows. Is it correct?
>
> Facts:
> =====
> 1.
> RUNNING_XACT records can be generated when the snapshot status is advanced while
> creating the slot.
> 2.
> pg_create_logical_replication_slot() returns the end point of a RUNNING_XACT
> record: the one generated when the snapshot becomes SNAPBUILD_CONSISTENT.
> 3.
> Some transactions could be started while the snapshot is in the FULL_SNAPSHOT
> state, and they can be committed after we reach SNAPBUILD_CONSISTENT. Such
> transactions should be output by the upcoming logical decoding.
>
> What happened here:
> =================
> a.
> confirmed_flush_lsn was 0/03CBCCA0, which is the end of the RUNNING_XACT record
> (lsn: 0/03CBCC58). Also, a COMMIT record for txn 1369 was located *just after*
> the RUNNING_XACT [1].
> b.
> pg_createsubscriber set recovery_target_lsn to "0/03CBCCA0", and
> recovery_target_inclusive was true. This meant the record starting at
> "0/03CBCCA0" had to be applied.
> c.
> The startup process applied WAL up to that point. Transaction 1369 was applied,
> and then the standby could be promoted.
> d.
> The logical walsender decoded transaction 1369 and replicated it to the standby.
> However, it had already been applied by the startup process, so a conflict
> could happen.
>
> [1]:
> according to the log:
> ```
> ...
> rmgr: Standby     len (rec/tot):     70/    70, tx:          0, lsn: 0/03CBCC58, prev 0/03CBCC18, desc: RUNNING_XACTS nextXid 1370 latestCompletedXid 1364 oldestRunningXid 1365; 5 xacts: 1366 1365 1369 1368 1367
> rmgr: Transaction len (rec/tot):     46/    46, tx:       1369, lsn: 0/03CBCCA0, prev 0/03CBCC58, desc: COMMIT 2025-07-20 16:50:18.031146 IST
> ...
> ```
>
> Best regards,
> Hayato Kuroda
> FUJITSU LIMITED
>
Hi Kuroda-san,

Thanks for reviewing the thread. Your understanding is correct.
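
To make the overlap concrete, here is a toy Python model of the two WAL
records quoted above. It is only a sketch, not PostgreSQL code: the helper
names (parse_lsn, applied_by_recovery, decoded_by_walsender) are made up for
illustration. It assumes, as described in this thread, that recovery with
recovery_target_inclusive = true applies the record starting at the target
LSN, and that logical decoding outputs transactions whose COMMIT record
starts at or after confirmed_flush_lsn. It also assumes the target falls
exactly on a record boundary, as it did here.

```python
def parse_lsn(text):
    """Convert an LSN such as '0/03CBCCA0' into a single integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


# (record start LSN, record type, xid), taken from the waldump excerpt above.
RECORDS = [
    (parse_lsn("0/03CBCC58"), "RUNNING_XACTS", 0),
    (parse_lsn("0/03CBCCA0"), "COMMIT", 1369),
]


def applied_by_recovery(records, target_lsn, inclusive):
    """Xids whose COMMIT is replayed before recovery stops (toy model)."""
    return {
        xid
        for start, kind, xid in records
        if kind == "COMMIT"
        and (start < target_lsn or (inclusive and start == target_lsn))
    }


def decoded_by_walsender(records, confirmed_flush_lsn):
    """Xids whose COMMIT starts at or after confirmed_flush (toy model)."""
    return {
        xid
        for start, kind, xid in records
        if kind == "COMMIT" and start >= confirmed_flush_lsn
    }


consistent_lsn = parse_lsn("0/03CBCCA0")  # end of RUNNING_XACTS, start of COMMIT

# Transactions both replayed by the startup process and sent again by the
# walsender are the ones that can conflict on the new subscriber.
overlap = (
    applied_by_recovery(RECORDS, consistent_lsn, inclusive=True)
    & decoded_by_walsender(RECORDS, consistent_lsn)
)
print(overlap)
```

With the values from the log, the overlap contains transaction 1369, which
matches steps c. and d. above. In this model, the same target with
inclusive=False gives an empty overlap; that is an observation about the toy
model only.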

Thanks,
Shlok Kyal


