RE: Fix slot synchronization with two_phase decoding enabled - Mailing list pgsql-hackers
From | Zhijie Hou (Fujitsu) |
---|---|
Subject | RE: Fix slot synchronization with two_phase decoding enabled |
Date | |
Msg-id | OS3PR01MB571846125A89761A47389CA594AE2@OS3PR01MB5718.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Fix slot synchronization with two_phase decoding enabled (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
On Thu, Apr 3, 2025 at 1:38 PM Masahiko Sawada wrote: > > On Wed, Apr 2, 2025 at 7:58 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > On Thu, Apr 3, 2025 at 7:50 AM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > > > On Thu, Apr 3, 2025 at 3:30 AM Masahiko Sawada wrote: > > > > > > > > > > > On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu) > > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > > > Thank you for the explanation! I agree that the issue happens in these > cases. Just to share the steps to reproduce the issue when creating the subscription With (create_slot=true, failover = true, two_phase=true, copy_data=false) for reference. - pub: begin a txn A. This is to stop the slot from building a consistent snapshot immediately. - sub: create subscription with (slot_name='sub', create_slot=true, failover = true, two_phase=true, copy_data=false); - pub: Attach to the walsender that will create the slot and add a checkpoint in SnapBuildFindSnapshot() -> (builder->state = SNAPBUILD_FULL_SNAPSHOT;). For now, the state should be BUILDING_SNAPSHOT. - pub: begin another txn B and commit txn A. The snapshot should reach FULL_SNAPSHOT now. - pub: prepared a txn C and then commit txn B. - pub: release the checkpoint in SnapBuildFindSnapshot(), then it should reach the SNAPBUILD_CONSISTENT. Now we have a prepared txn whose prepare_lsn is less than the two_phase_at. - stop the primary and promote the standby. - sub: alter the subscription to use the new primary as the publisher. - commit the prepared transaction on new primary, the following error will be reported on subscriber: LOG: logical replication apply worker for subscription "sub" has started ERROR: prepared transaction with identifier "pg_gid_16398_764" does not exist. > > > > > > > > As another idea, I wonder if we could somehow defer to make the > > > > synced slot as 'sync-ready' until we can ensure that the slot > > > > doesn't have any transactions that are prepared before the point > > > > of enabling two_phase. For example, when the slotsync worker > > > > fetches the remote slot, it remembers the confirmed_flush_lsn (say > > > > LSN-1) if the local slot's two_phase becomes true or the local > > > > slot is newly created with enabling two_phase, and then it makes > > > > the slot 'sync-ready' once it confirmed that the slot's restart_lsn passed > LSN-1. Does it work? > > > > > > Thanks for the idea! > > > > > > We considered a similar approach in [1] to confirm there is no > > > prepared transactions before two_phase_at, but the issue is that > > > when the two_phase flag is switched from 'false' to 'true' (as in > > > the case with (copy_data=true, failover=true, two_phase=true)). In > > > this case, the slot may have already been marked as sync-ready > > > before the two_phase flag is enabled, as slotsync is unaware of potential > future changes to the two_phase flag. > > > > This can happen because when copy_data is true, tablesync can take a > > long time to complete the sync and in the meantime, slot without a > > two_phase flag would have been synced to standby. Such a slot would be > > marked as sync-ready even if we follow the calculation proposed by > > Sawada-san. Note that we enable two_phase once all the tables are in > > ready state (See run_apply_worker() and comments atop worker.c > > (TWO_PHASE TRANSACTIONS)). > > Right. It doesn't make sense to make the slot not-sync-ready and then back to > sync-ready. > > While I agree with the approach for HEAD and it seems difficult to find a > solution, I'm concerned that disallowing to use both failover and two_phase in > a minor release would affect users much. Users who are already using that > combination might end up needing to re-think their system architecture. So I'm > trying to narrow down use cases where we're going to prohibit or to find > workarounds. > > If we agree with the fix for HEAD, we can push the fix for HEAD first, which > would be better to be done sooner as it needs to bump the catversion. We can > discuss the ideas and workarounds for v17 later. Agreed. I will think more on it. One workaround could be skipping the prepared transaction when such an issue arises, followed by manually replicating the skipped changes. Best Regards, Hou zj
pgsql-hackers by date: