Thread: Re: Using failover slots for PG-non_PG logical replication

Re: Using failover slots for PG-non_PG logical replication

From
shveta malik
Date:
On Wed, Jul 2, 2025 at 10:50 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Hi All,
>
> The failover slots documentation [1] is good for PG - PG logical
> replication, but the first two queries require pg_subscription which
> may not be present in non-PG downstream. Somebody looking to setup
> failover slots for non-PG subscriber may not find the page useful.

Okay.  It appears to me that the entire document at [1] is
specifically intended for a built-in replication setup, and the
corresponding page was written with that context in mind.

> However, the third query, when modified to mention the replication
> slots relevant to the downstream is useful to them. How to find the
> replication slots to be synchronized is a problem specific to the type
> of downstream. Such a setup should add those slots to
> sync_replication_slots. I think the chapter should mention that the
> 3rd query should also include the slots mentioned in
> sync_replication_slots for PG-non_PG logical replication setup.
>

sync_replication_slots is a boolean which enables a physical standby
to synchronize logical failover slots. Did you mean something else?

We could include a general note in the documentation something like:
'For non-native replication, it is the responsibility of the plugin
author to define the required slots'. Then, we can directly use the
third query. Thoughts?

[1]: https://www.postgresql.org/docs/current/logical-replication.html

thanks
Shveta



Re: Using failover slots for PG-non_PG logical replication

From
Amit Kapila
Date:
On Wed, Jul 2, 2025 at 5:50 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Wed, Jul 2, 2025 at 12:36 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Wed, Jul 2, 2025 at 10:50 AM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > Hi All,
> > >
> > > The failover slots documentation [1] is good for PG - PG logical
> > > replication, but the first two queries require pg_subscription which
> > > may not be present in non-PG downstream. Somebody looking to setup
> > > failover slots for non-PG subscriber may not find the page useful.
> >
> > Okay.  It appears to me that the entire document at [1] is
> > specifically intended for a built-in replication setup, and the
> > corresponding page was written with that context in mind.
> >
> > > However, the third query, when modified to mention the replication
> > > slots relevant to the downstream is useful to them. How to find the
> > > replication slots to be synchronized is a problem specific to the type
> > > of downstream. Such a setup should add those slots to
> > > sync_replication_slots. I think the chapter should mention that the
> > > 3rd query should also include the slots mentioned in
> > > sync_replication_slots for PG-non_PG logical replication setup.
> > >
> >
> > sync_replication_slots is a boolean which enables a physical standby
> > to synchronize logical failover slots. Did you mean something else?
>
> I confused this with the actual list of slots to be synchronized.
> Sorry for that. The slots to be synchronized can be obtained from the
> primary by querying pg_replication_slots with failover = true.
>

Note that primary may have slots corresponding to multiple subscriber
nodes, so querying all slots on primary will give a correct answer may
depend on the use case. For example, say a user wants to do some sort
of load balancing such that some of the subscriber/downstream nodes
are served by a standby, then directly querying all slots from
pg_replication_slots from the primary won't give the correct answer.
In a typical failover case as well, if slots corresponding to a
particular downstream are ready, then that should be sufficient to
continue replication from the standby. Then, also, there is a case
when the primary node is down, then such a query won't work; it can
only work when there is a planned switchover. Considering all these
points, I am not sure if it is a good idea to mention querying the
primary for all slots marked with failover=true. However, I agree that
we should mention something for non-native logical replication
solutions, something on the lines of what Shveta is proposing. OTOH,
if you or Shveta have some clear guidelines for how a downstream can
find the required slots which can work in all or most cases, then it
is okay to mention that as well.

--
With Regards,
Amit Kapila.



Re: Using failover slots for PG-non_PG logical replication

From
shveta malik
Date:
On Thu, Jul 3, 2025 at 9:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jul 2, 2025 at 5:50 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Wed, Jul 2, 2025 at 12:36 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Wed, Jul 2, 2025 at 10:50 AM Ashutosh Bapat
> > > <ashutosh.bapat.oss@gmail.com> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > The failover slots documentation [1] is good for PG - PG logical
> > > > replication, but the first two queries require pg_subscription which
> > > > may not be present in non-PG downstream. Somebody looking to setup
> > > > failover slots for non-PG subscriber may not find the page useful.
> > >
> > > Okay.  It appears to me that the entire document at [1] is
> > > specifically intended for a built-in replication setup, and the
> > > corresponding page was written with that context in mind.
> > >
> > > > However, the third query, when modified to mention the replication
> > > > slots relevant to the downstream is useful to them. How to find the
> > > > replication slots to be synchronized is a problem specific to the type
> > > > of downstream. Such a setup should add those slots to
> > > > sync_replication_slots. I think the chapter should mention that the
> > > > 3rd query should also include the slots mentioned in
> > > > sync_replication_slots for PG-non_PG logical replication setup.
> > > >
> > >
> > > sync_replication_slots is a boolean which enables a physical standby
> > > to synchronize logical failover slots. Did you mean something else?
> >
> > I confused this with the actual list of slots to be synchronized.
> > Sorry for that. The slots to be synchronized can be obtained from the
> > primary by querying pg_replication_slots with failover = true.
> >
>
> Note that primary may have slots corresponding to multiple subscriber
> nodes, so querying all slots on primary will give a correct answer may
> depend on the use case. For example, say a user wants to do some sort
> of load balancing such that some of the subscriber/downstream nodes
> are served by a standby, then directly querying all slots from
> pg_replication_slots from the primary won't give the correct answer.
> In a typical failover case as well, if slots corresponding to a
> particular downstream are ready, then that should be sufficient to
> continue replication from the standby. Then, also, there is a case
> when the primary node is down, then such a query won't work; it can
> only work when there is a planned switchover.

Agreed.

> Considering all these
> points, I am not sure if it is a good idea to mention querying the
> primary for all slots marked with failover=true. However, I agree that
> we should mention something for non-native logical replication
> solutions, something on the lines of what Shveta is proposing. OTOH,
> if you or Shveta have some clear guidelines for how a downstream can
> find the required slots which can work in all or most cases, then it
> is okay to mention that as well.
>

I do not have any better ideas here other than the one I proposed earlier.

thanks
Shveta



Re: Using failover slots for PG-non_PG logical replication

From
Amit Kapila
Date:
On Thu, Jul 3, 2025 at 7:07 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Jul 3, 2025 at 9:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Jul 2, 2025 at 5:50 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > On Wed, Jul 2, 2025 at 12:36 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > On Wed, Jul 2, 2025 at 10:50 AM Ashutosh Bapat
> > > > <ashutosh.bapat.oss@gmail.com> wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > The failover slots documentation [1] is good for PG - PG logical
> > > > > replication, but the first two queries require pg_subscription which
> > > > > may not be present in non-PG downstream. Somebody looking to setup
> > > > > failover slots for non-PG subscriber may not find the page useful.
> > > >
> > > > Okay.  It appears to me that the entire document at [1] is
> > > > specifically intended for a built-in replication setup, and the
> > > > corresponding page was written with that context in mind.
> > > >
> > > > > However, the third query, when modified to mention the replication
> > > > > slots relevant to the downstream is useful to them. How to find the
> > > > > replication slots to be synchronized is a problem specific to the type
> > > > > of downstream. Such a setup should add those slots to
> > > > > sync_replication_slots. I think the chapter should mention that the
> > > > > 3rd query should also include the slots mentioned in
> > > > > sync_replication_slots for PG-non_PG logical replication setup.
> > > > >
> > > >
> > > > sync_replication_slots is a boolean which enables a physical standby
> > > > to synchronize logical failover slots. Did you mean something else?
> > >
> > > I confused this with the actual list of slots to be synchronized.
> > > Sorry for that. The slots to be synchronized can be obtained from the
> > > primary by querying pg_replication_slots with failover = true.
> > >
> >
> > Note that primary may have slots corresponding to multiple subscriber
> > nodes, so querying all slots on primary will give a correct answer may
> > depend on the use case. For example, say a user wants to do some sort
> > of load balancing such that some of the subscriber/downstream nodes
> > are served by a standby, then directly querying all slots from
> > pg_replication_slots from the primary won't give the correct answer.
> > In a typical failover case as well, if slots corresponding to a
> > particular downstream are ready, then that should be sufficient to
> > continue replication from the standby. Then, also, there is a case
> > when the primary node is down, then such a query won't work; it can
> > only work when there is a planned switchover.
>
> I think there are two different points of views. Section 29.3 is
> written from a single subscriber's point of view i.e. whether a given
> subscriber can continue logical replication from the new primary after
> failover? The other view is from primary's point of view i.e. if
> primary fails over will all the subscribers be able to continue
> replication? For example, in case of a planned failover, the failover
> orchestrator can check whether all the replication slots have been
> synchronized or not. If so then it goes ahead with the failover. I
> think the section is the right place to guide in this case as well.
>
> > Considering all these
> > points, I am not sure if it is a good idea to mention querying the
> > primary for all slots marked with failover=true. However, I agree that
> > we should mention something for non-native logical replication
> > solutions, something on the lines of what Shveta is proposing. OTOH,
> > if you or Shveta have some clear guidelines for how a downstream can
> > find the required slots which can work in all or most cases, then it
> > is okay to mention that as well.
> >
>
> How about this:
> We change the following sentence in the third paragraph
> To confirm that the standby server is indeed ready for failover <new
> addition> so that a given PostgreSQL subscriber can continue logical
> replication </new addition>, follow ... . <new addition> A
> non-PostgreSQL downstream may need to device a different way to find
> the slots corresponding to its subscriptions or use the next section.
>
> Then add a separate paragraph at the end or a separate section like below.
>
> In order to check whether a standby server is ready for failover so
> that all the subscribers, PostgreSQL as well as non-PostgreSQL, can
> continue logical replication, follow these steps make sure that all
> the replication slots, on the primary server, that have property
> failover = true are synchronized to the standby server.
> 1. On the primary server run following query
> select slot_name from pg_replication_slots where failover and NOT temporary
>
> 2. Check that the logical replication slots identified above exist on
> the standby server and are ready for failover.
> SELECT slot_name, (synced AND NOT temporary AND NOT conflicting) AS
> failover_ready
>                FROM pg_replication_slots
>                WHERE slot_name IN
>
> Does that look good?
>

Yes, something on these lines sounds like an improvement. Would you
like to propose a patch or want Shveta or me to do the same?

--
With Regards,
Amit Kapila.