Re: Using failover slots for PG-non_PG logical replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Using failover slots for PG-non_PG logical replication
Date
Msg-id CAA4eK1+1+Ah2Gr7a1QoK1f+cDqvhd3ZbNeio1Z2qaC5YjumeQQ@mail.gmail.com
Whole thread Raw
In response to Re: Using failover slots for PG-non_PG logical replication  (shveta malik <shveta.malik@gmail.com>)
List pgsql-hackers
On Thu, Jul 3, 2025 at 7:07 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Jul 3, 2025 at 9:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Jul 2, 2025 at 5:50 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > On Wed, Jul 2, 2025 at 12:36 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > On Wed, Jul 2, 2025 at 10:50 AM Ashutosh Bapat
> > > > <ashutosh.bapat.oss@gmail.com> wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > The failover slots documentation [1] is good for PG - PG logical
> > > > > replication, but the first two queries require pg_subscription which
> > > > > may not be present in non-PG downstream. Somebody looking to setup
> > > > > failover slots for non-PG subscriber may not find the page useful.
> > > >
> > > > Okay.  It appears to me that the entire document at [1] is
> > > > specifically intended for a built-in replication setup, and the
> > > > corresponding page was written with that context in mind.
> > > >
> > > > > However, the third query, when modified to mention the replication
> > > > > slots relevant to the downstream is useful to them. How to find the
> > > > > replication slots to be synchronized is a problem specific to the type
> > > > > of downstream. Such a setup should add those slots to
> > > > > sync_replication_slots. I think the chapter should mention that the
> > > > > 3rd query should also include the slots mentioned in
> > > > > sync_replication_slots for PG-non_PG logical replication setup.
> > > > >
> > > >
> > > > sync_replication_slots is a boolean which enables a physical standby
> > > > to synchronize logical failover slots. Did you mean something else?
> > >
> > > I confused this with the actual list of slots to be synchronized.
> > > Sorry for that. The slots to be synchronized can be obtained from the
> > > primary by querying pg_replication_slots with failover = true.
> > >
> >
> > Note that primary may have slots corresponding to multiple subscriber
> > nodes, so querying all slots on primary will give a correct answer may
> > depend on the use case. For example, say a user wants to do some sort
> > of load balancing such that some of the subscriber/downstream nodes
> > are served by a standby, then directly querying all slots from
> > pg_replication_slots from the primary won't give the correct answer.
> > In a typical failover case as well, if slots corresponding to a
> > particular downstream are ready, then that should be sufficient to
> > continue replication from the standby. Then, also, there is a case
> > when the primary node is down, then such a query won't work; it can
> > only work when there is a planned switchover.
>
> I think there are two different points of views. Section 29.3 is
> written from a single subscriber's point of view i.e. whether a given
> subscriber can continue logical replication from the new primary after
> failover? The other view is from primary's point of view i.e. if
> primary fails over will all the subscribers be able to continue
> replication? For example, in case of a planned failover, the failover
> orchestrator can check whether all the replication slots have been
> synchronized or not. If so then it goes ahead with the failover. I
> think the section is the right place to guide in this case as well.
>
> > Considering all these
> > points, I am not sure if it is a good idea to mention querying the
> > primary for all slots marked with failover=true. However, I agree that
> > we should mention something for non-native logical replication
> > solutions, something on the lines of what Shveta is proposing. OTOH,
> > if you or Shveta have some clear guidelines for how a downstream can
> > find the required slots which can work in all or most cases, then it
> > is okay to mention that as well.
> >
>
> How about this:
> We change the following sentence in the third paragraph
> To confirm that the standby server is indeed ready for failover <new
> addition> so that a given PostgreSQL subscriber can continue logical
> replication </new addition>, follow ... . <new addition> A
> non-PostgreSQL downstream may need to device a different way to find
> the slots corresponding to its subscriptions or use the next section.
>
> Then add a separate paragraph at the end or a separate section like below.
>
> In order to check whether a standby server is ready for failover so
> that all the subscribers, PostgreSQL as well as non-PostgreSQL, can
> continue logical replication, follow these steps make sure that all
> the replication slots, on the primary server, that have property
> failover = true are synchronized to the standby server.
> 1. On the primary server run following query
> select slot_name from pg_replication_slots where failover and NOT temporary
>
> 2. Check that the logical replication slots identified above exist on
> the standby server and are ready for failover.
> SELECT slot_name, (synced AND NOT temporary AND NOT conflicting) AS
> failover_ready
>                FROM pg_replication_slots
>                WHERE slot_name IN
>
> Does that look good?
>

Yes, something on these lines sounds like an improvement. Would you
like to propose a patch or want Shveta or me to do the same?

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Daniil Davydov
Date:
Subject: Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION
Next
From: Michael Paquier
Date:
Subject: Fix deprecation warning with libxml2 2.14 in contrib/xml2/