Re: BUG #18897: Logical replication conflict after using pg_createsubscriber under heavy load - Mailing list pgsql-bugs
From | vignesh C |
---|---|
Subject | Re: BUG #18897: Logical replication conflict after using pg_createsubscriber under heavy load |
Date | |
Msg-id | CALDaNm1C=q=QCzbgMe2WNJd_JOOpYi4SvMF_PqPFMbRoNH8=eA@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #18897: Logical replication conflict after using pg_createsubscriber under heavy load (Shlok Kyal <shlok.kyal.oss@gmail.com>) |
List | pgsql-bugs |
On Tue, 29 Apr 2025 at 13:17, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote: > > On Mon, 28 Apr 2025 at 10:28, vignesh C <vignesh21@gmail.com> wrote: > > > > On Thu, 24 Apr 2025 at 11:57, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote: > > > > > > On Thu, 24 Apr 2025 at 09:08, vignesh C <vignesh21@gmail.com> wrote: > > > > > > > > On Wed, 23 Apr 2025 at 07:29, Euler Taveira <euler@eulerto.com> wrote: > > > > > > > > > > On Wed, Apr 16, 2025, at 8:14 PM, PG Bug reporting form wrote: > > > > > > > > > > I'm in the process of converting our databases from pglogical logical > > > > > replication to the native logical replication implementation on PostgreSQL > > > > > 17. One of the bugs we encountered and had to work around with pglogical was > > > > > the plugin dropping records while converting to a streaming replica to > > > > > logical via pglogical_create_subscriber (reported > > > > > https://github.com/2ndQuadrant/pglogical/issues/349). I was trying to > > > > > confirm that the native logical replication implementation did not have this > > > > > problem, and I've found that it might have a different problem. > > > > > > > > > > > > > > > pg_createsubscriber uses a different approach than pglogical. While pglogical > > > > > uses a restore point, pg_createsubscriber uses the LSN from the latest > > > > > replication slot as a replication start point. The restore point approach is > > > > > usually suitable to physical replication but might not cover all scenarios for > > > > > logical replication (such as when there are in progress transactions). Since > > > > > creating a logical replication slot does find a consistent decoding start > > > > > point, it is a natural choice to start the logical replication (that also needs > > > > > to find a decoding start point). > > > > > > > > I observed a difference between logical replication and > > > > pg_createsubscriber in how the replication origin is set, which can > > > > lead to different behaviors during replication restarts. > > > > > > > > Consider the following WAL records: > > > > rmgr: Transaction len (rec/tot): 46/ 46, tx: 767, lsn: > > > > 0/01756868, prev 0/01756780, desc: COMMIT 2025-04-23 15:49:24.349942 > > > > IST > > > > rmgr: Standby len (rec/tot): 54/ 54, tx: 0, lsn: > > > > 0/01756898, prev 0/01756868, desc: RUNNING_XACTS nextXid 768 > > > > latestCompletedXid 767 oldestRunningXid 755; 1 xacts: 755 > > > > > > > > Behavior in Logical Replication > > > > In logical replication, after applying the COMMIT record at LSN > > > > 0/01756868, the replication origin is set to the end LSN of the commit > > > > record, i.e., 0/01756898. On restart, replication resumes from the > > > > next LSN after COMMIT records, which is 0/01756898 in this case. > > > > As a result, the same transaction is not sent again, and duplicate > > > > data is avoided. > > > > > > > > Behavior in pg_createsubscriber > > > > However, in the case of pg_createsubscriber, the consistent point used > > > > for creating the replication slot on the publisher may be set exactly > > > > at the commit LSN (0/01756868, xid 767). When promoting the standby, > > > > this same LSN is used as recovery_target_lsn, so the standby recovers > > > > up to and including the commit of transaction 767. > > > > > > > > After promotion, if the replication origin is also set to this same > > > > commit LSN, the subscriber will request changes starting from that > > > > point. Since the origin doesn't reflect the commit as applied, > > > > transaction 767 gets replicated again, leading to duplicate data and > > > > possible replication failure. > > > > > > > > If the issue is not reproducible using Zane's suggested steps, we can > > > > try the following: > > > > 1) Stop the standby server. > > > > 2) Perform an insert transaction and note the commit LSN using pg_waldump. > > > > 3) Set up the publisher, replication slot, etc., simulating the > > > > pg_createsubscriber behavior. > > > > 4) Restart the standby with promotion configuration from > > > > setup_recovery(), setting recovery_target_lsn to the commit LSN from > > > > step 2. > > > > 5) Create the subscription and set the replication origin to the same > > > > commit LSN. > > > > 6) Enable the subscription. > > > > > > > > This setup should reproduce the issue where the transaction gets > > > > applied twice on the subscriber due to the replication origin being > > > > aligned with the commit LSN rather than its end. > > > > > > > > Thoughts? > > > > > > Hi Vignesh, > > > > > > I have verified the behaviour shared by you for Logical Replication > > > and pg_createsubscriber and agree with your analysis. > > > I have also tried the steps shared by you and am able to reproduce the issue. > > > > > > I am thinking of resolving it by introducing a new API, which can give > > > us the next lsn to the lsn provided. > > > During pg_createsusbcriber run where we are advancing the replication > > > origin to 'consistent_lsn'. I think we should advance it to the next > > > lsn of 'consistent_lsn' instead. > > > I think this will resolve the issue. Thoughts? > > > > With this approach, there is a risk of starting from the next WAL > > record after the consistent point. For example, if the slot returns a > > consistent point at 0/1715E10, after the fix we would begin replaying > > from the next WAL record, such as 0/1715E40, which could potentially > > lead to data loss. > > As an alternative, we could set recovery_target_inclusive to false in > > the setup_recovery function. This way, recovery would stop just before > > the recovery target, allowing the publisher to start replicating > > exactly from the consistent point. > > Thoughts? > > This approach looks better to me. > I have prepared the patch for the same. > > With the patch I have run the test script in [1]. It ran for ~7hrs and > it did not reproduce the issue. Can you check and see that the original scenario gets verified, that is the consistent lsn record type is COMMIT. You can use the patch attached which will print the consistent lsn record type. Make sure to create the pg_walinspect extension before running pg_createsubscriber as the pg_walinspect's pg_get_wal_record_info function is used. Regards, Vignesh
Attachment
pgsql-bugs by date: