Re: Fix slot synchronization with two_phase decoding enabled - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Fix slot synchronization with two_phase decoding enabled |
Date | |
Msg-id | CAD21AoCApou-Opjne+T5+1Czw65_Ck18fnGkyyt-9Ao6yeJF5w@mail.gmail.com Whole thread Raw |
In response to | RE: Fix slot synchronization with two_phase decoding enabled ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
Responses |
RE: Fix slot synchronization with two_phase decoding enabled
|
List | pgsql-hackers |
On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Wed, Apr 2, 2025 at 3:45 PM Masahiko Sawada wrote: > > Hi, > > > > > On Mon, Mar 31, 2025 at 4:3 AM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > > > On Thu, Mar 27, 2025 at 2:29 PM Amit Kapila wrote: > > > > > > > > > > > On Tue, Mar 25, 2025 at 12:1 PM Amit Kapila > > > > <amit.kapila16@gmail.com> > > > > wrote: > > > > > > > > > > On Tue, Mar 25, 2025 at 11:05 AM Zhijie Hou (Fujitsu) > > > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > > > > > > > Hi, > > > > > > > > > > > > When testing the slot synchronization with logical replication > > > > > > slots that enabled two_phase decoding, I found that transactions > > > > > > prepared before > > > > two-phase > > > > > > decoding is enabled may fail to replicate to the subscriber > > > > > > after being committed on a promoted standby following a failover. > > > > > > > > > > > > To reproduce this issue, please follow these steps (also > > > > > > detailed in the attached TAP test, v1-0001): > > > > > > > > > > > > 1. sub: create a subscription with (two_phase = false) 2. > > > > > > primary (pub): prepare a txn A. > > > > > > 3. sub: alter subscription set (two_phase = true) and wait for > > > > > > the logical > > > > slot to > > > > > > be synced to standby. > > > > > > 4. primary (pub): stop primary, promote the standby and let the > > > > > > subscriber > > > > use > > > > > > the promoted standby as publisher. > > > > > > 5. promoted standby (pub): COMMIT PREPARED A; 6. sub: the apply > > > > > > worker will report the following ERROR because it didn't > > > > > > receive the PREPARE. > > > > > > ERROR: prepared transaction with identifier "pg_gid_16387_752" > > > > does not exist > > > > > > > > > > > > I think the root cause of this issue is that the two_phase_at > > > > > > field of the slot, which indicates the LSN from which two-phase > > > > > > decoding is enabled > > > > (used to > > > > > > prevent duplicate data transmission for prepared transactions), > > > > > > is not synchronized to the standby server. > > > > > > > > > > > > In step 3, transaction A is not immediately replicated because > > > > > > it occurred before enabling two-phase decoding. Thus, the > > > > > > prepared transaction > > > > should only > > > > > > be replicated after decoding the final COMMIT PREPARED, as > > > > > > referenced > > > > in > > > > > > ReorderBufferFinishPrepared(). However, due to the invalid > > > > > > two_phase_at > > > > on the > > > > > > standby, the prepared transaction fails to send at that time. > > > > > > > > > > > > This problem arises after the support for altering the two-phase > > > > > > option (1462aad). > > > > > > > > > > > > > > I suspect that this can happen in PG17 as well, but I need to think > > > > more about it to make a reproducible test case. > > > > > > After further analysis, I was able to reproduce the same issue [1] in > > > PG 17. > > > > > > [1] > > > - pub: created a slot 'sub' with two_phase=false, then prepared a > > > transaction > > > - pub: after some activity, advanced the confirmed_flush_lsn of 'sub', so it is > > > greater than prepared txn lsn. > > > - sub: create subscription with (slot_name='sub', create_slot=false, > > > failover = true, two_phase=true, copy_data=false); two_phase_at will > > > be set to the same as confirmed_flush_lsn which is greater than the > > prepared transaction. > > > - stop the primary and promote the standby. > > > - commit the prepared transaction on standby, the following error will be > > > reported on subscriber: > > > > It seems to require elaborate steps to reproduce this issue in v17. I wonder if we > > could somehow narrow down the cases that we want to prohibit. The patch for > > v17 disallows CREATE SUBSCRIPTION to enable both two_phase and failover, > > but I guess that it's still safe if it also creates the replication slot (e.g., > > create_slot is true). If my understanding is right, we can allow users to specify > > both fields if CRETE SUBSCRIPTION creates the slot, and we don't need to > > disallow that in ReplicationSlotCreate(). > > Thanks for reviewing the steps. > > The current reproducer aims for simplicity; however, I think it's possible to reproduce > the issue even with create_slot = true, although it requires the help of a > debugger and additional steps. But as long as there are transactions prepared > before the two_phase_at position, and they are skipped due to checks in > ReorderBufferFinishPrepared() (refer to comments[1] for why we skip sending > prepared transaction), the issue can be reproduced. > > For instance, when creating a subscription with (copy_data=true, failover=true, > two_phase=true), the slot's two_phase setting starts as false and shifts to > true after table sync (refer to [2] for related comments). During this period, > if a user prepares a transaction where the prepare LSN is less than the > two_phase_at, the same problem could happen. > > Similarly, when setting up a subscription with (copy_data=false, failover=true, > two_phase=true), although two_phase is initially set to true and we wait for > running transactions to finish when building consistent snapshot, a race > condition may still exist: If the snapshot state reaches FULL_SNAPSHOT, it > won't check running transactions further (see SnapBuildFindSnapshot() for > specifics), if a user prepares a transaction at this point, it's possible for > the prepare LSN to be less than the LSN marking the snapshot's consistent > state, causing the same issue. Thank you for the explanation! I agree that the issue happens in these cases. As another idea, I wonder if we could somehow defer to make the synced slot as 'sync-ready' until we can ensure that the slot doesn't have any transactions that are prepared before the point of enabling two_phase. For example, when the slotsync worker fetches the remote slot, it remembers the confirmed_flush_lsn (say LSN-1) if the local slot's two_phase becomes true or the local slot is newly created with enabling two_phase, and then it makes the slot 'sync-ready' once it confirmed that the slot's restart_lsn passed LSN-1. Does it work? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: