Re: Using failover slots for PG-non_PG logical replication - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Using failover slots for PG-non_PG logical replication |
Date | |
Msg-id | CAJpy0uBb1RNFFP59Z7hUb=aW6AF+=AQnJep7cNAt9tKu9NC7Jw@mail.gmail.com Whole thread Raw |
In response to | Re: Using failover slots for PG-non_PG logical replication (shveta malik <shveta.malik@gmail.com>) |
Responses |
Re: Using failover slots for PG-non_PG logical replication
|
List | pgsql-hackers |
On Tue, Jul 8, 2025 at 7:29 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Tue, Jul 8, 2025 at 4:03 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Mon, Jul 7, 2025 at 7:00 PM Ashutosh Bapat > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > On Mon, Jul 7, 2025 at 9:53 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > Thanks for the patch. > > > > > > > > > I couldn't figure out whether the query on primary to fetch all the > > > > > slots to be synchronized should filter based on invalidation_reason > > > > > and conflicting or not. According to synchronize_slots(), it seems > > > > > that we retain invalidated slots on standby when failover = true and > > > > > they would remain with synced = true on standby. Is that right? > > > > > > > > > > > > > Yes, that’s correct. We sync the invalidation status of replication > > > > slots from the primary to the standby and then stop synchronizing any > > > > slots that have been marked as invalidated, retaining synced flag as > > > > true. IMO, there's no need to filter out conflicting slots on the > > > > primary, because instead of excluding them there and showing > > > > everything as failover-ready on the standby, the correct approach is > > > > to reflect the actual state on standby.This means conflicting slots > > > > will appear as non-failover-ready on the standby. That’s why Step 3 > > > > also considers conflicting flag in its evaluation. > > > > > > Thanks for the explanation. WFM. > > > > > > If a slot is invalidated because RS_INVAL_WAL_REMOVED, conflicting = > > > false, but the slot is not useful standby. But then it's not useful on > > > primary as well. Is that why we are not including (invalidation_reason > > > IS NOT NULL) condition along in (synced AND NOT temporary AND NOT > > > conflicting)? > > > > Thanks for bringing it up. Even if the slot is not useful on the > > primary node as well, we shall still show failover-ready as false on > > standby. We should modify the query of step3 to check > > 'invalidation_reason IS NULL' instead of 'NOT conflicting'. That will > > cover all the cases where the slot is invalidated and thus not > > failover ready. > > Thanks for the confirmation. > > > > > > > > > > > 1) > > > > +/* primary # */ SELECT array_agg(quote_literal(r.slot_name)) AS slots > > > > + FROM pg_replication_slots r > > > > + WHERE r.failover AND NOT r.temporary; > > > > > > > > On primary, there is no need to check temporary-status. We do not > > > > allow setting failover as true for temporary slots. > > > > > > Why does synchronize_slots() has it in its query? > > > > > > > It is not needed but no harm in maintaining it. > > If we want documents to be in sync with code, we can have 'not > > temporary' check in doc as well. > > > > I think it's better to keep the code and the doc in sync otherwise we > developers would get confused. > > > > > > > > > 2) > > > > Although not directly related to the concerns addressed in the given > > > > patch, I think it would be helpful to add a note in the original doc > > > > stating that Steps 1 and 2 should be executed on each subscriber node > > > > that will be served by the standby after failover. > > > > > > There's a bit of semantic repeatition here since an earlier paragraph > > > mentions a given subscriber. But I think overall it's better to be > > > more clear than being less clear. > > > > > > > > I have attached a top-up patch with the above changes and a few more > > > > trivial changes. Please include it if you find it okay. > > > > > > Thanks. Included. I have made a few edits and included them in the > > > attached patch. > > > > > > > Thanks. The existing changes (originally targeted in this patch) look > > good to me. > > > > I have attached a top-up patch for step-3 correction. Please include > > if you find it okay to be fixed in the same patch, otherwise we can > > handle it separately. > > I have split your top up patch into 2 - one related to the document > change being the subject of this thread and the other for fixing the > query. Committer may squash the patch, if they think so. > The changes look good to me. thanks Shveta
pgsql-hackers by date: