Thread: Re: Using failover slots for PG-non_PG logical replication
On Wed, Jul 2, 2025 at 10:50 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > Hi All, > > The failover slots documentation [1] is good for PG - PG logical > replication, but the first two queries require pg_subscription which > may not be present in non-PG downstream. Somebody looking to setup > failover slots for non-PG subscriber may not find the page useful. Okay. It appears to me that the entire document at [1] is specifically intended for a built-in replication setup, and the corresponding page was written with that context in mind. > However, the third query, when modified to mention the replication > slots relevant to the downstream is useful to them. How to find the > replication slots to be synchronized is a problem specific to the type > of downstream. Such a setup should add those slots to > sync_replication_slots. I think the chapter should mention that the > 3rd query should also include the slots mentioned in > sync_replication_slots for PG-non_PG logical replication setup. > sync_replication_slots is a boolean which enables a physical standby to synchronize logical failover slots. Did you mean something else? We could include a general note in the documentation something like: 'For non-native replication, it is the responsibility of the plugin author to define the required slots'. Then, we can directly use the third query. Thoughts? [1]: https://www.postgresql.org/docs/current/logical-replication.html thanks Shveta
On Wed, Jul 2, 2025 at 5:50 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Wed, Jul 2, 2025 at 12:36 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Wed, Jul 2, 2025 at 10:50 AM Ashutosh Bapat > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > Hi All, > > > > > > The failover slots documentation [1] is good for PG - PG logical > > > replication, but the first two queries require pg_subscription which > > > may not be present in non-PG downstream. Somebody looking to setup > > > failover slots for non-PG subscriber may not find the page useful. > > > > Okay. It appears to me that the entire document at [1] is > > specifically intended for a built-in replication setup, and the > > corresponding page was written with that context in mind. > > > > > However, the third query, when modified to mention the replication > > > slots relevant to the downstream is useful to them. How to find the > > > replication slots to be synchronized is a problem specific to the type > > > of downstream. Such a setup should add those slots to > > > sync_replication_slots. I think the chapter should mention that the > > > 3rd query should also include the slots mentioned in > > > sync_replication_slots for PG-non_PG logical replication setup. > > > > > > > sync_replication_slots is a boolean which enables a physical standby > > to synchronize logical failover slots. Did you mean something else? > > I confused this with the actual list of slots to be synchronized. > Sorry for that. The slots to be synchronized can be obtained from the > primary by querying pg_replication_slots with failover = true. > Note that primary may have slots corresponding to multiple subscriber nodes, so querying all slots on primary will give a correct answer may depend on the use case. For example, say a user wants to do some sort of load balancing such that some of the subscriber/downstream nodes are served by a standby, then directly querying all slots from pg_replication_slots from the primary won't give the correct answer. In a typical failover case as well, if slots corresponding to a particular downstream are ready, then that should be sufficient to continue replication from the standby. Then, also, there is a case when the primary node is down, then such a query won't work; it can only work when there is a planned switchover. Considering all these points, I am not sure if it is a good idea to mention querying the primary for all slots marked with failover=true. However, I agree that we should mention something for non-native logical replication solutions, something on the lines of what Shveta is proposing. OTOH, if you or Shveta have some clear guidelines for how a downstream can find the required slots which can work in all or most cases, then it is okay to mention that as well. -- With Regards, Amit Kapila.
On Thu, Jul 3, 2025 at 9:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jul 2, 2025 at 5:50 PM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > > On Wed, Jul 2, 2025 at 12:36 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > On Wed, Jul 2, 2025 at 10:50 AM Ashutosh Bapat > > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > > > Hi All, > > > > > > > > The failover slots documentation [1] is good for PG - PG logical > > > > replication, but the first two queries require pg_subscription which > > > > may not be present in non-PG downstream. Somebody looking to setup > > > > failover slots for non-PG subscriber may not find the page useful. > > > > > > Okay. It appears to me that the entire document at [1] is > > > specifically intended for a built-in replication setup, and the > > > corresponding page was written with that context in mind. > > > > > > > However, the third query, when modified to mention the replication > > > > slots relevant to the downstream is useful to them. How to find the > > > > replication slots to be synchronized is a problem specific to the type > > > > of downstream. Such a setup should add those slots to > > > > sync_replication_slots. I think the chapter should mention that the > > > > 3rd query should also include the slots mentioned in > > > > sync_replication_slots for PG-non_PG logical replication setup. > > > > > > > > > > sync_replication_slots is a boolean which enables a physical standby > > > to synchronize logical failover slots. Did you mean something else? > > > > I confused this with the actual list of slots to be synchronized. > > Sorry for that. The slots to be synchronized can be obtained from the > > primary by querying pg_replication_slots with failover = true. > > > > Note that primary may have slots corresponding to multiple subscriber > nodes, so querying all slots on primary will give a correct answer may > depend on the use case. For example, say a user wants to do some sort > of load balancing such that some of the subscriber/downstream nodes > are served by a standby, then directly querying all slots from > pg_replication_slots from the primary won't give the correct answer. > In a typical failover case as well, if slots corresponding to a > particular downstream are ready, then that should be sufficient to > continue replication from the standby. Then, also, there is a case > when the primary node is down, then such a query won't work; it can > only work when there is a planned switchover. Agreed. > Considering all these > points, I am not sure if it is a good idea to mention querying the > primary for all slots marked with failover=true. However, I agree that > we should mention something for non-native logical replication > solutions, something on the lines of what Shveta is proposing. OTOH, > if you or Shveta have some clear guidelines for how a downstream can > find the required slots which can work in all or most cases, then it > is okay to mention that as well. > I do not have any better ideas here other than the one I proposed earlier. thanks Shveta
On Thu, Jul 3, 2025 at 7:07 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Thu, Jul 3, 2025 at 9:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jul 2, 2025 at 5:50 PM Ashutosh Bapat > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > On Wed, Jul 2, 2025 at 12:36 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > On Wed, Jul 2, 2025 at 10:50 AM Ashutosh Bapat > > > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > > > > > Hi All, > > > > > > > > > > The failover slots documentation [1] is good for PG - PG logical > > > > > replication, but the first two queries require pg_subscription which > > > > > may not be present in non-PG downstream. Somebody looking to setup > > > > > failover slots for non-PG subscriber may not find the page useful. > > > > > > > > Okay. It appears to me that the entire document at [1] is > > > > specifically intended for a built-in replication setup, and the > > > > corresponding page was written with that context in mind. > > > > > > > > > However, the third query, when modified to mention the replication > > > > > slots relevant to the downstream is useful to them. How to find the > > > > > replication slots to be synchronized is a problem specific to the type > > > > > of downstream. Such a setup should add those slots to > > > > > sync_replication_slots. I think the chapter should mention that the > > > > > 3rd query should also include the slots mentioned in > > > > > sync_replication_slots for PG-non_PG logical replication setup. > > > > > > > > > > > > > sync_replication_slots is a boolean which enables a physical standby > > > > to synchronize logical failover slots. Did you mean something else? > > > > > > I confused this with the actual list of slots to be synchronized. > > > Sorry for that. The slots to be synchronized can be obtained from the > > > primary by querying pg_replication_slots with failover = true. > > > > > > > Note that primary may have slots corresponding to multiple subscriber > > nodes, so querying all slots on primary will give a correct answer may > > depend on the use case. For example, say a user wants to do some sort > > of load balancing such that some of the subscriber/downstream nodes > > are served by a standby, then directly querying all slots from > > pg_replication_slots from the primary won't give the correct answer. > > In a typical failover case as well, if slots corresponding to a > > particular downstream are ready, then that should be sufficient to > > continue replication from the standby. Then, also, there is a case > > when the primary node is down, then such a query won't work; it can > > only work when there is a planned switchover. > > I think there are two different points of views. Section 29.3 is > written from a single subscriber's point of view i.e. whether a given > subscriber can continue logical replication from the new primary after > failover? The other view is from primary's point of view i.e. if > primary fails over will all the subscribers be able to continue > replication? For example, in case of a planned failover, the failover > orchestrator can check whether all the replication slots have been > synchronized or not. If so then it goes ahead with the failover. I > think the section is the right place to guide in this case as well. > > > Considering all these > > points, I am not sure if it is a good idea to mention querying the > > primary for all slots marked with failover=true. However, I agree that > > we should mention something for non-native logical replication > > solutions, something on the lines of what Shveta is proposing. OTOH, > > if you or Shveta have some clear guidelines for how a downstream can > > find the required slots which can work in all or most cases, then it > > is okay to mention that as well. > > > > How about this: > We change the following sentence in the third paragraph > To confirm that the standby server is indeed ready for failover <new > addition> so that a given PostgreSQL subscriber can continue logical > replication </new addition>, follow ... . <new addition> A > non-PostgreSQL downstream may need to device a different way to find > the slots corresponding to its subscriptions or use the next section. > > Then add a separate paragraph at the end or a separate section like below. > > In order to check whether a standby server is ready for failover so > that all the subscribers, PostgreSQL as well as non-PostgreSQL, can > continue logical replication, follow these steps make sure that all > the replication slots, on the primary server, that have property > failover = true are synchronized to the standby server. > 1. On the primary server run following query > select slot_name from pg_replication_slots where failover and NOT temporary > > 2. Check that the logical replication slots identified above exist on > the standby server and are ready for failover. > SELECT slot_name, (synced AND NOT temporary AND NOT conflicting) AS > failover_ready > FROM pg_replication_slots > WHERE slot_name IN > > Does that look good? > Yes, something on these lines sounds like an improvement. Would you like to propose a patch or want Shveta or me to do the same? -- With Regards, Amit Kapila.