Re: BUG #18897: Logical replication conflict after using pg_createsubscriber under heavy load - Mailing list pgsql-bugs
From | Amit Kapila |
---|---|
Subject | Re: BUG #18897: Logical replication conflict after using pg_createsubscriber under heavy load |
Date | |
Msg-id | CAA4eK1LCOaZBb5XtKqcPi0v6kJrKMZ-P8wpOCQxwy=cBTogurQ@mail.gmail.com |
In response to | Re: BUG #18897: Logical replication conflict after using pg_createsubscriber under heavy load (Shlok Kyal <shlok.kyal.oss@gmail.com>) |
List | pgsql-bugs |
On Tue, Jul 22, 2025 at 4:54 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Thu, 10 Jul 2025 at 12:33, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > We should find out in which case and why the consistent_lsn is a start
> > point LSN of a commit record. We use slot's confirmed_flush LSN location
> > as a consistent_lsn, which normally should be the end point of
> > running_xacts record or commit_end LSN record (in case client sends
> > ack).
> >
> I checked it and here is my analysis:
>
> When we create a slot, it returns the confirmed_flush LSN as a
> consistent_lsn. I noticed that in general when we create a slot, the
> confirmed_flush is set to the end of a RUNNING_XACTS record or, we can
> say, the start of the next record. And this next record can be anything.
> It can be a COMMIT record for a transaction in another session.
>
> ...
> waldump record:
> rmgr: Standby len (rec/tot): 70/ 70, tx: 0, lsn: 0/03CBCC58, prev 0/03CBCC18, desc: RUNNING_XACTS nextXid 1370 latestCompletedXid 1364 oldestRunningXid 1365; 5 xacts: 1366 1365 1369 1368 1367
> The consistent point is found at "0/3CBCC58".
>
> When the slot is created, the confirmed_flush is set inside function
> "DecodingContextFindStartpoint" using:
>     slot->data.confirmed_flush = ctx->reader->EndRecPtr;
> In our case the value of consistent_lsn is "0/03CBCCA0" (I added some
> logs and got the value). Logs:
> 2025-07-20 16:50:18.039 IST [1780326] port=5340 ubuntu@test_db/[unknown] LOG: #### confirmed_flush = 0/03CBCCA0 inside DecodingContextFindStartpoint
> 2025-07-20 16:50:18.039 IST [1780326] port=5340 ubuntu@test_db/[unknown] STATEMENT: SELECT lsn FROM pg_catalog.pg_create_logical_replication_slot('sub', 'pgoutput', false, false, false)
>
> This consistent_lsn "0/03CBCCA0" is nothing but the end of the
> RUNNING_XACTS record (whose start is "0/3CBCC58").
>
> While the slot is being created, a transaction in a concurrent session
> commits (just after the third RUNNING_XACTS) and adds a COMMIT record:
> rmgr: Transaction len (rec/tot): 46/ 46, tx: 1369, lsn: 0/03CBCCA0, prev 0/03CBCC58, desc: COMMIT 2025-07-20 16:50:18.031146 IST
>
> So, in such cases the consistent LSN can be set to a COMMIT record.
>

Your analysis and proposed patch look good to me. I'll push this
patch tomorrow unless Euler or someone thinks otherwise.

> > If we decide to fix in the way proposed here, then we also need to
> > investigate whether we need an additional WAL record added by commit
> > 03b08c8f5f3e30c97e5908f3d3d76872dab8a9dc. The reason why that
> > additional WAL record was added is discussed in email [1].
> >
> > [1] - https://www.postgresql.org/message-id/flat/2377319.1719766794%40sss.pgh.pa.us#bba9f5ee0efc73151cc521a6bd5182ed
>
> I reverted the change added by commit
> 03b08c8f5f3e30c97e5908f3d3d76872dab8a9dc, applied my patch, and
> checked the behaviour. I am able to reproduce the issue the commit
> was resolving. I think this change is still required.
> This change is still required because, while recovery is performed in
> the function 'PerformWalRecovery', when recovery_target_inclusive is
> set to false, function 'recoveryStopsBefore' is responsible for setting
> whether recovery is finished or not. This function will set
> 'reachedRecoveryTarget' to true when it satisfies the condition:
>     /* Check if target LSN has been reached */
>     if (recoveryTarget == RECOVERY_TARGET_LSN &&
>         !recoveryTargetInclusive &&
>         record->ReadRecPtr >= recoveryTargetLSN)
> Here we are checking if "start of the record" >= recoveryTargetLSN.
>
> When a replication slot is created, consistent_lsn is obtained. Since
> this consistent_lsn points to the end of the record (or, we can say, the
> start of the next record), there can be a case where there is no WAL
> record corresponding to the consistent_lsn. So, during recovery, it
> will wait until it reads the record corresponding to the consistent_lsn
> (during my testing this wait was around 20 sec). And this wait can
> create the timeout issue.
> I have manually debugged and checked the above case and I think the
> change in commit 03b08c8f5f3e30c97e5908f3d3d76872dab8a9dc is still
> needed.
>

Agreed. Thanks for the detailed analysis.

-- 
With Regards,
Amit Kapila.
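
The two LSN behaviours analysed in this message can be illustrated with a small toy model. This is only a sketch in Python, not PostgreSQL code: `WalRecord`, `parse_lsn`, `consistent_lsn_after`, `stops_before` and `first_stop_record` are hypothetical names introduced for illustration. The LSNs are taken from the waldump output quoted above, except the end of the COMMIT record, which is an assumption (46 bytes rounded up to 8-byte alignment).

```python
from dataclasses import dataclass
from typing import List, Optional


def parse_lsn(text: str) -> int:
    """Convert the 'hi/lo' hexadecimal LSN notation into one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


@dataclass
class WalRecord:
    start: int  # plays the role of ReadRecPtr (start of the record)
    end: int    # plays the role of EndRecPtr (start of the next record)
    desc: str


def consistent_lsn_after(record: WalRecord) -> int:
    """Mimic 'confirmed_flush = EndRecPtr': the slot's consistent LSN is
    the end of the record at which the consistent point was found."""
    return record.end


def stops_before(record: WalRecord, target_lsn: int,
                 inclusive: bool = False) -> bool:
    """Mimic the quoted condition: with a non-inclusive LSN target,
    stop before a record whose start is at or past the target."""
    return (not inclusive) and record.start >= target_lsn


def first_stop_record(wal: List[WalRecord],
                      target_lsn: int) -> Optional[WalRecord]:
    """Return the first record that satisfies the stop condition, or None
    if no such record has been written yet (recovery would keep waiting)."""
    for record in wal:
        if stops_before(record, target_lsn):
            return record
    return None


running_xacts = WalRecord(parse_lsn("0/03CBCC58"), parse_lsn("0/03CBCCA0"),
                          "RUNNING_XACTS")
# End LSN of the COMMIT record is assumed, not taken from the thread.
commit = WalRecord(parse_lsn("0/03CBCCA0"), parse_lsn("0/03CBCCD0"), "COMMIT")

target = consistent_lsn_after(running_xacts)

# Case 1: a concurrent COMMIT starts exactly at the consistent LSN.
print(first_stop_record([running_xacts, commit], target))

# Case 2: nothing has been written after RUNNING_XACTS yet.
print(first_stop_record([running_xacts], target))
```

In the first case the stop condition is met by the COMMIT record itself, which is how the consistent LSN can coincide with the start of a COMMIT record. In the second case no record starts at or after the target, which corresponds to the wait described above and to the reason the additional WAL record from commit 03b08c8f5f3e30c97e5908f3d3d76872dab8a9dc is still considered necessary.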