On Thu, Apr 3, 2025 at 7:50 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Thu, Apr 3, 2025 at 3:30 AM Masahiko Sawada wrote:
>
> >
> > On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu)
> > <houzj.fnst@fujitsu.com> wrote:
> >
> > Thank you for the explanation! I agree that the issue happens in these cases.
> >
> > As another idea, I wonder if we could somehow defer marking the synced
> > slot as 'sync-ready' until we can ensure that the slot doesn't have
> > any transactions that were prepared before the point of enabling
> > two_phase. For example, when the slotsync worker fetches the remote
> > slot, it remembers the confirmed_flush_lsn (say LSN-1) if the local
> > slot's two_phase becomes true or the local slot is newly created with
> > two_phase enabled, and then it makes the slot 'sync-ready' once it
> > has confirmed that the slot's restart_lsn has passed LSN-1. Would that work?
>
> Thanks for the idea!
>
> We considered a similar approach in [1] to confirm there are no prepared
> transactions before two_phase_at, but the issue arises when the two_phase flag
> is switched from 'false' to 'true' (as in the case with (copy_data=true,
> failover=true, two_phase=true)). In this case, the slot may have already been
> marked as sync-ready before the two_phase flag is enabled, as slotsync is
> unaware of potential future changes to the two_phase flag.
>
This can happen because when copy_data is true, tablesync can take a
long time to complete the sync, and in the meantime, a slot without the
two_phase flag would have been synced to the standby. Such a slot would be
marked as sync-ready even if we follow the calculation proposed by
Sawada-san. Note that we enable two_phase once all the tables are in
the ready state (see run_apply_worker() and the comments atop worker.c
(TWO_PHASE TRANSACTIONS)).
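
To make the sequence concrete, here is a toy model in C. It is only a
sketch and not PostgreSQL code: ToySyncedSlot, toy_sync_cycle(),
toy_copy_data_scenario() and the LSN values are all made up for
illustration, and "restart_lsn passed LSN-1" is assumed to mean
restart_lsn >= LSN-1.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* hypothetical stand-in for XLogRecPtr */

/* Hypothetical, simplified view of a synced slot on the standby. */
typedef struct ToySyncedSlot
{
    bool two_phase;             /* two_phase flag copied from the remote slot */
    bool sync_ready;            /* slot is considered usable after failover */
    Lsn  restart_lsn;           /* restart_lsn copied from the remote slot */
    Lsn  wait_lsn;              /* remembered LSN-1; 0 means nothing to wait for */
} ToySyncedSlot;

/*
 * One slotsync cycle under the proposed rule: when two_phase turns on for
 * the local slot (including a newly created slot), remember the remote
 * confirmed_flush_lsn as LSN-1 and mark the slot sync-ready only once
 * restart_lsn has reached it.
 */
static void
toy_sync_cycle(ToySyncedSlot *local, bool remote_two_phase,
               Lsn remote_restart_lsn, Lsn remote_confirmed_flush_lsn)
{
    if (remote_two_phase && !local->two_phase)
        local->wait_lsn = remote_confirmed_flush_lsn;

    local->two_phase = remote_two_phase;
    local->restart_lsn = remote_restart_lsn;

    /* The rule can delay the first transition, but it never revokes it. */
    if (!local->sync_ready && local->restart_lsn >= local->wait_lsn)
        local->sync_ready = true;
}

/* Replays the copy_data=true timeline and returns the final slot state. */
static ToySyncedSlot
toy_copy_data_scenario(void)
{
    ToySyncedSlot slot = {0};

    /* Tablesync is still running, so the remote slot has two_phase=false. */
    toy_sync_cycle(&slot, false, 100, 150);     /* becomes sync-ready here */

    /* All tables are ready; two_phase is enabled on the remote slot. */
    toy_sync_cycle(&slot, true, 200, 300);      /* LSN-1 = 300 */

    return slot;
}
```

In this model, toy_copy_data_scenario() ends with sync_ready and
two_phase both true while restart_lsn (200) is still behind LSN-1 (300),
which is the case the deferral was meant to prevent. A slot created with
two_phase already enabled does wait for LSN-1, so the rule only helps
when the flag is known up front.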
--
With Regards,
Amit Kapila.