Re: Fix slot synchronization with two_phase decoding enabled - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Fix slot synchronization with two_phase decoding enabled
Date
Msg-id CAD21AoAOkfLCiSamfBYVtFiXXLr3LVCdpBjvm6ssYA2O_ZVF=w@mail.gmail.com
Whole thread Raw
In response to Re: Fix slot synchronization with two_phase decoding enabled  (Nisha Moond <nisha.moond412@gmail.com>)
List pgsql-hackers
On Fri, Jun 6, 2025 at 12:07 AM Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> On Thu, Jun 5, 2025 at 3:26 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Jun 5, 2025 at 2:53 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Jun 3, 2025 at 11:05 AM Nisha Moond <nisha.moond412@gmail.com> wrote:
> > > >
> > > >
> > > > Attached v17 patches. Added a top-up patch 0002 implementing the idea
> > > > suggested by Amit above.
> > >
> > > I have started reviewing this, although I haven't done a complete
> > > review yet, but I have a question on the fix we are trying to do, IIUC
> > > we are disallowing to use 'two phase' and 'failover' options together
> > > at the create slot time and now users has to create slot with one of
> > > the option and later enable other option right (if user want to use
> > > both options)? But don't you think it will affect usability? because
> > > if a user wants to use both the options together then after creating
> > > the slot they need to track when is the right time to enable the other
> > > option?  Not sure if anyone else has this concern or it's just me?
> > >
> > Some additional comments while quickly glancing at the patch
> >
>
> Thank you for the review.
>
> > 1.
> > + if (two_phase && !IsSyncingReplicationSlots() && !IsBinaryUpgrade)
> > + ereport(ERROR,
> > + errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > + errmsg("cannot enable both \"%s\" and \"%s\" options during
> > replication slot creation",
> > +    "failover", "two_phase"));
> >
> > I think we should also give hints to retry later when a certain
> > constraint is met?
> >
>
> Since it’s not possible to create a slot with both options enabled,
> this command will always fail.
>
> As there are no specific constraints to fulfill that would allow
> creating a slot with both options enabled, the possible hints are:
>  - Suggesting to use ALTER_REPLICATION_SLOT command to alter failover
> later. But it may not be appropriate here from the user's perspective.
>  - Recommending to use ALTER SUBSCRIPTION to enable failover later.
> But not all slots are created for subscriptions, so this may not apply
> in every case.
>
> Please let me know if you have any suggestions for a suitable hint we
> could provide here.
>
> > Also this is hardcoded options "failover" and "two_phase" so why do we
> > need to use %s for contruncting this error message?
> >
>
> I’ve followed the error message guidelines to keep the messages
> translator-friendly. The use of %s helps isolate keywords like GUCs or
> sub-options so they remain untouched during translation.
>
> That said, it’s not a strict rule, usage often depends on developer or
> committer judgment. After revisiting the docs and re-evaluating this
> patch related files, I’ve adjusted the messages with the following in
> mind:
>  - do not use "%s" when terms "failover" and "two-phase" can be used
> as concepts and not subscription options.
>  - use "%s" when referring explicitly to options (or nearby code is so).
>  - wrap SQL commands in double quotes for clarity.
>  - since slot.c doesn’t use "%s" in similar contexts, I’ve updated the
> message style there for consistency.
>
> Please let me know if you feel any of the messages could still be
> improved further.
>
> > 2.
> > +    "failover", "two_phase"),
> > + errhint("Use ALTER SUBSCRIPTION ... SET (failover = true) to enable
> > \"%s\" after two_phase state is ready",
> > + "failover"));
> >
> > Here we are using a mix of hardcoded string and formatted string, like
> > for (failover = true) we hardcoded the "failover" whereas to enable
> > \"%s\", we have
> > used %s.  Better to just directly use failover as we are not depending
> > on any variable.  Please look at other places as well, I see a few
> > more places whereas
> > we have used like this.
> >
>
> Modified this message following the reasoning outlined above.
>
> > 3. + if (slot->data.failover)
> > + ereport(ERROR,
> > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > + errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
> > + NameStr(slot->data.name))));
> >
> > So for a failover slot we can never enable two_phase, whereas for
> > two_phase enabled slot we can enable failover? This seems confusing,
> > no?
> >
>
> This restriction is not introduced by this patch.
> As Amit pointed out [1], this patch is specific to PG17, where
> altering the two_phase option for a subscription is not permitted.
> This capability has been added in PG18 onwards.
> ~~~
>
> Attached v18 patch.
>  - patch-001: modified error messages as suggested above.
>  - patch-002: improved pg_dump docs as per Shveta's off-list suggestions.

Thank you for updating the patch.

BTW have we addressed the point Amit mentioned before[1]?

> The one more combination to consider is when someone takes a dump of
> an older version and loads it into a newer version. For example, where
> users dump from 17.5 and then restore in a newer version, say 17.6
> (which has our fix), the restore will fail due to newer restrictions
> added by this patch. Do we need to do anything about it?

I think it could be a significant side-effect and we need to do
something about that.

Regards,

[1] https://www.postgresql.org/message-id/CAA4eK1KiGU5Zr_S2g8w-8dUm-PRiZc72BgMnLdS83u7sgnctcg%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Removing BTScanPosUnpinIfPinned idiom from nbtree, simplifying mark/restore support
Next
From: Melanie Plageman
Date:
Subject: Re: PG 18 release notes draft committed