Re: Fix slot synchronization with two_phase decoding enabled - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Fix slot synchronization with two_phase decoding enabled
Date
Msg-id CAA4eK1LvMwXxvAzHpK+Egjc7vu1NmGxxKcaK_06pE7GKk7JtJQ@mail.gmail.com
In response to RE: Fix slot synchronization with two_phase decoding enabled  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses Re: Fix slot synchronization with two_phase decoding enabled
List pgsql-hackers
On Mon, Apr 21, 2025 at 8:44 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Sat, Apr 19, 2025 at 2:19 AM Masahiko Sawada wrote:
> >
> > On Tue, Apr 8, 2025 at 10:14 PM Zhijie Hou (Fujitsu)
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > >
> > > ----------
> > > Approach 2
> > > ----------
> > >
> > > > Instead of disallowing the use of two-phase and failover together, a more
> > > > flexible strategy could be to only restrict failover for slots with two-phase
> > > > enabled when there's a possibility of existing prepared transactions before
> > > > the two_phase_at that are not yet replicated. During slot creation with
> > > > two-phase and failover, we could check for any decoded prepared transactions
> > > > when determining the decoding start point (DecodingContextFindStartpoint).
> > > > For subsequent attempts to alter failover to true, we ensure that
> > > > two_phase_at is less than restart_lsn, indicating that all prepared
> > > > transactions have been committed and replicated, thus the bug would not
> > > > happen.
> > >
> > > pros:
> > >
> > > > This method minimizes restrictions for users. Especially during slot creation
> > > > with (two_phase=on, failover=on), as it’s uncommon for transactions to
> > > > prepare during consistent snapshot creation, the restriction becomes almost
> > > > unnoticeable.
> >
> > > I think this approach can work for the transactions that are prepared
> > > while the slot is created. But if I understand the problem correctly,
> > > while the initial table sync is being performed, the slot's two_phase is
> > > still false, so we need to deal with the transactions that are
> > > prepared during the initial table sync too. What do you think?
> >
>
> Yes, I agree that we need to restrict this case too. Given that we haven't
> started decoding when setting two_phase=true during CreateDecodingContext()
> after tablesync, we could check prepared transactions afterwards during
> decoding. This could involve reporting an ERROR when skipping a prepared
> transaction during decoding if its prepare LSN is less than two_phase_at.
>

That would make the problem difficult for users to detect, because the
error would be raised only at a later point in time.
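
To make the decode-time check from the quoted text concrete, here is a
minimal sketch in Python rather than PostgreSQL C. The names parse_lsn,
check_prepare, and SkippedPrepareError are hypothetical, and limiting the
error to failover-enabled slots is an assumption of this sketch, not
something the proposal spells out.

```python
class SkippedPrepareError(Exception):
    """Hypothetical error for a PREPARE that would be skipped."""


def parse_lsn(text: str) -> int:
    # An LSN is printed as two hexadecimal numbers, "high/low",
    # holding the upper and lower 32 bits of a 64-bit position.
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def check_prepare(prepare_lsn: str, two_phase_at: str, failover: bool) -> bool:
    """Return True if the PREPARE is decoded at prepare time.

    A PREPARE located before two_phase_at is skipped. Assumption of this
    sketch: the skip is reported as an error only for a failover slot.
    """
    if parse_lsn(prepare_lsn) >= parse_lsn(two_phase_at):
        return True
    if failover:
        raise SkippedPrepareError(
            f"prepare at {prepare_lsn} precedes two_phase_at {two_phase_at}"
        )
    return False
```

Note that the error can only fire once decoding reaches the PREPARE record,
which may be long after the subscription was set up.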

> Alternatively, a simpler method would be to prevent this situation entirely
> during the CREATE SUBSCRIPTION command. For example, we could disallow slots
> that are created with failover set to true and whose two_phase is later
> modified to true after tablesync. Although the simpler check is more
> user-visible, it may offer less flexibility.
>

I agree with your point, but OTOH, I am also afraid of adding too many
smart checks in the back-branch. If we follow what you say here, then
users have the following ways in PG17 to enable both failover and
two_phase: (a) during CREATE SUBSCRIPTION, users can set both
'failover' and 'two_phase' if 'copy_data' is false; or (b) if
'copy_data' is true, users can enable 'two_phase' during CREATE
SUBSCRIPTION, wait for it to be enabled, and then use ALTER
SUBSCRIPTION to set 'failover'.
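
As a rough illustration of (a) and (b), the Python sketch below returns the
statements a user would run in each case. The helper steps_to_enable_both and
the <conninfo>/<pub> placeholders are hypothetical. The subtwophasestate
check and the DISABLE/ENABLE pair around the ALTER reflect my understanding
of PG17 (failover can be changed only while the subscription is disabled)
and are not part of the proposal itself.

```python
def steps_to_enable_both(sub: str, copy_data: bool) -> list[str]:
    """List the statements needed to end up with two_phase and failover."""
    create = (
        "CREATE SUBSCRIPTION {sub} CONNECTION '<conninfo>' "
        "PUBLICATION <pub> WITH ({opts})"
    )
    if not copy_data:
        # (a) No table sync, so both options can be set at creation time.
        opts = "two_phase = true, failover = true, copy_data = false"
        return [create.format(sub=sub, opts=opts)]
    # (b) With table sync, request two_phase first and add failover later.
    return [
        create.format(sub=sub, opts="two_phase = true, copy_data = true"),
        # Repeat until this returns true ('e' means two_phase is enabled).
        "SELECT subtwophasestate = 'e' FROM pg_subscription "
        f"WHERE subname = '{sub}'",
        f"ALTER SUBSCRIPTION {sub} DISABLE",
        f"ALTER SUBSCRIPTION {sub} SET (failover = true)",
        f"ALTER SUBSCRIPTION {sub} ENABLE",
    ]
```

For example, steps_to_enable_both("mysub", copy_data=True) yields five
statements, and failover is only mentioned after the two_phase state check.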


--
With Regards,
Amit Kapila.


