Re: Fix slot synchronization with two_phase decoding enabled - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Fix slot synchronization with two_phase decoding enabled
Date
Msg-id CAA4eK1LhrjqfFFQY4zc=UVd3Hazv4OMPg9F10NaWpko5hFanPw@mail.gmail.com
In response to Re: Fix slot synchronization with two_phase decoding enabled  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Fri, Apr 25, 2025 at 9:57 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Apr 25, 2025 at 3:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Apr 25, 2025 at 6:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > I realized that users who create a logical slot using
> > > pg_create_logical_replication_slot() would not be able to enable both
> > > options at slot creation, and there is no easy way to enable the
> > > failover after two_phase-enabled-slot creation. Users would need to
> > > use the ALTER_REPLICATION_SLOT replication command, which seems
> > > unrealistic for users to use. On the other hand, if we allow creating
> > > a logical slot with enabling failover and two_phase using SQL API,
> > > there is still a chance for this bug to occur. Would it be worth
> > > considering that if a logical slot is created with enabling failover
> > > and two_phase using SQL API, we create the slot with only
> > > two_phase=true, then advance the slot until the slot satisfies
> > > restart_lsn >= two_phase_at, and then enable the failover?
> > >
> >
> > This means we either need to remember somewhere that the user has
> > provided the failover flag until restart_lsn >= two_phase_at and then
> > set the failover flag in the slot
>
> I was thinking of this idea.
>
> > or initially mark it but enable the
> > functionality of failover when we reach the condition restart_lsn >=
> > two_phase_at.
>
> IIUC the slot could be synchronized to the standby as soon as we
> complete DecodingContextFindStartpoint() for a failover-enabled slot.
> So we would need some mechanism to make sure that the slot is not
> synchronized while we're waiting to reach the condition restart_lsn >=
> two_phase_at even if the failover is enabled.
>

So we would then need some state or a persistent flag for this.

> > Both seem to have different kinds of problems. The first
> > idea seems to have an issue with persistence, which means we can lose
> > track of the flag after the restart.
>
> I think we can do this series of operations while the slot is not
> persistent, that is, while the slot is still RS_EPHEMERAL.
>

But we would still need a persistent flag to indicate that such slots
shouldn't be synced to the standby until we reach the condition
restart_lsn >= two_phase_at.

> > The second can mislead the user
> > for a long period in cases where prepare and commit have a large time
> > gap. I feel this will introduce complexity either in the form of code
> > or in giving the information to the user.
>
> Agreed. Both ways introduce complexity so we need to consider the
> user-unfriendliness (by not having a proper way to enable failover for
> the two_phase-enabled-slot using SQL API) vs. risk (of introducing
> complexity).
>

Right. To me, it sounds risky to provide such functionality for the SQL
API in the back branch.

--
With Regards,
Amit Kapila.


