On Fri, Apr 25, 2025 at 3:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Apr 25, 2025 at 6:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I realized that users who create a logical slot using
> > pg_create_logical_replication_slot() would not be able to enable both
> > options at slot creation, and there is no easy way to enable the
> > failover after two_phase-enabled-slot creation. Users would need to
> > use the ALTER_REPLICATION_SLOT replication command, which seems
> > unrealistic for users. On the other hand, if we allow creating
> > a logical slot with both failover and two_phase enabled using the SQL
> > API, there is still a chance for this bug to occur. Would it be worth
> > considering that, if a logical slot is created with both failover
> > and two_phase enabled using the SQL API, we create the slot with only
> > two_phase=true, then advance the slot until it satisfies
> > restart_lsn >= two_phase_at, and then enable failover?
> >
>
> This means we either need to maintain somewhere that the user has
> provided the failover flag until restart_lsn >= two_phase_at and then
> set the failover flag in the slot

I was thinking of this idea.

> or initially mark it but enable the
> functionality of failover when we reach the condition restart_lsn >=
> two_phase_at.

IIUC the slot could be synchronized to the standby as soon as we
complete DecodingContextFindStartpoint() for a failover-enabled slot.
So we would need some mechanism to make sure that the slot is not
synchronized while we're waiting to reach the condition restart_lsn >=
two_phase_at, even if failover is enabled.
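
As a rough illustration of the kind of guard this would need, here is a
toy model in Python (not backend code). The Slot class and
may_sync_to_standby() are hypothetical names made up for this sketch;
only the condition restart_lsn >= two_phase_at comes from the
discussion above.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    # Toy stand-in for the relevant slot fields; LSNs are plain integers here.
    failover: bool
    two_phase: bool
    restart_lsn: int
    two_phase_at: int


def may_sync_to_standby(slot: Slot) -> bool:
    """Hypothetical guard: hold back a failover-enabled slot until
    restart_lsn has caught up with two_phase_at."""
    if not slot.failover:
        return False
    if slot.two_phase and slot.restart_lsn < slot.two_phase_at:
        # failover is marked, but the slot is still waiting for the condition
        return False
    return True
```

With such a guard, the slot would stay unsynchronized for as long as
restart_lsn < two_phase_at, which is also the period during which the
user could be misled about the slot's failover state.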

> Both seem to have different kinds of problems. The first
> idea seems to have an issue with persistence, which means we can lose
> track of the flag after the restart.

I think we can do this series of operations while the slot is not
persistent, that is, while the slot is still RS_EPHEMERAL.
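
To make the intended ordering concrete, here is a toy model in Python
(not backend code). All names in it are hypothetical and exist only for
this sketch, and it assumes that an ephemeral slot is simply dropped if
creation does not complete, so there would be no pending failover
request to remember across a restart.

```python
from dataclasses import dataclass
from enum import Enum


class Persistency(Enum):
    EPHEMERAL = "RS_EPHEMERAL"
    PERSISTENT = "RS_PERSISTENT"


@dataclass
class Slot:
    restart_lsn: int
    two_phase_at: int
    two_phase: bool = False
    failover: bool = False
    persistency: Persistency = Persistency.EPHEMERAL


def create_failover_two_phase_slot(restart_lsn, two_phase_at, advance_lsns):
    """Hypothetical creation sequence; advance_lsns models how far each
    advance step moves restart_lsn."""
    # 1. Create the slot with only two_phase=true, still ephemeral.
    slot = Slot(restart_lsn, two_phase_at, two_phase=True)

    # 2. Advance until restart_lsn >= two_phase_at.
    for lsn in advance_lsns:
        if slot.restart_lsn >= slot.two_phase_at:
            break
        slot.restart_lsn = max(slot.restart_lsn, lsn)

    if slot.restart_lsn < slot.two_phase_at:
        # Models an interrupted creation: the ephemeral slot is discarded.
        raise RuntimeError("slot dropped before restart_lsn >= two_phase_at")

    # 3. Only now enable failover and make the slot persistent.
    slot.failover = True
    slot.persistency = Persistency.PERSISTENT
    return slot
```

In this model the failover flag is never visible on a persistent slot
before the condition holds, at the cost of slot creation blocking until
the slot has advanced far enough.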

> The second can mislead the user
> for a long period in cases where prepare and commit have a large time
> gap. I feel this will introduce complexity either in the form of code
> or in giving the information to the user.

Agreed. Both ways introduce complexity, so we need to weigh the
user-unfriendliness (of not having a proper way to enable failover for
a two_phase-enabled slot using the SQL API) against the risk (of
introducing complexity).

Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com