Re: Improve pg_sync_replication_slots() to wait for primary to advance - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Improve pg_sync_replication_slots() to wait for primary to advance |
Date | |
Msg-id | CAA4eK1Jb9=LPzoacou_wh07ZouVpeOA7eFAtGmgVh6yOxrvY1g@mail.gmail.com |
In response to | Re: Improve pg_sync_replication_slots() to wait for primary to advance (shveta malik <shveta.malik@gmail.com>) |
List | pgsql-hackers |
On Mon, Aug 4, 2025 at 12:19 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Aug 4, 2025 at 11:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Aug 1, 2025 at 2:50 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > 5)
> > > I tried a test where there were 4 slots on the publisher, where one
> > > was getting used while the others were not. Initiated
> > > pg_sync_replication_slots on standby. Forced unused slots to be
> > > invalidated by setting idle_replication_slot_timeout=60 on primary,
> > > due to which API finished but gave a warning:
> > >
> > > postgres=# SELECT pg_sync_replication_slots();
> > > WARNING: aborting initial sync for slot "failover_slot"
> > > DETAIL: This slot was invalidated on the primary server.
> > > WARNING: aborting initial sync for slot "failover_slot2"
> > > DETAIL: This slot was invalidated on the primary server.
> > > WARNING: aborting initial sync for slot "failover_slot3"
> > > DETAIL: This slot was invalidated on the primary server.
> > > pg_sync_replication_slots
> > > ---------------------------
> > >
> > > (1 row)
> > >
> > > Do we need these warnings here? I think we can have it as a LOG rather
> > > than having it on console. Thoughts?
> > >
> >
> > What is the behaviour of a slotsync worker in the same case? I don't
> > see any such WARNING messages in the code of slotsync worker, so why
> > do we want a different behaviour here?
> >
>
> We don’t have continuous waiting in the slot-sync worker if the remote
> slot is behind the local slot. But if during the first sync cycle the
> remote slot is behind, we keep the local slot as a temporary slot. In
> the next sync cycle, if we find the remote slot is invalidated, we
> mark the local slot as invalidated too, keeping it in this temporary
> state. There are no LOG or WARNING messages in this case. When the
> slot-sync worker stops or shuts down (like during promotion), it
> cleans up this temporary slot.
>
> Now, for the API behavior: if the remote slot is behind the local
> slot, the API enters a wait loop and logs:
>
> LOG: waiting for remote slot "failover_slot" LSN (0/3000060) and
> catalog xmin (755) to pass local slot LSN (0/3000060) and catalog xmin
> (770)
>
> If it keeps waiting, every 10 seconds it logs:
> LOG: continuing to wait for remote slot "failover_slot" LSN
> (0/3000060) and catalog xmin (755) to pass local slot LSN (0/3000060)
> and catalog xmin (770)
>
> If the remote slot becomes invalidated during this wait, currently it
> logs a WARNING and moves to syncing the next slot:
> WARNING: aborting initial sync for slot "failover_slot" as the slot
> was invalidated on primary
>
> I think this WARNING is too strong. We could change it to a LOG
> message instead, mark the local slot as invalidated, exit the wait
> loop, and move on to syncing the next slot.
>
> Even though this LOG is not there in slotsync worker case, I think it
> makes more sense in API case due to continuous LOGs which suggested
> that API was waiting to sync this slot in wait-loop and thus we shall
> conclude it either by saying wait-over (like we do in successful sync
> case) or we can say 'LOG: aborting wait as the remote slot was
> invalidated' instead of above WARNING message. What do you suggest?
>

I also think LOG is a better choice for this because there is nothing
we can expect users to do even after seeing this. I feel this is more
of an info for users.

--
With Regards,
Amit Kapila.
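
For readers following the thread, the wait-loop behaviour discussed above can be modelled roughly as below. This is only a standalone Python simulation, not PostgreSQL code: the names `RemoteSlot`, `fmt_lsn`, `wait_for_remote_slot`, the 2-second poll interval, and the `max_polls` cap with its "gave_up" outcome are all assumptions made for illustration. Only the message texts and the 10-second reminder come from the mails above. The plain integer comparison of catalog xmin is also a simplification that ignores transaction ID wraparound.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RemoteSlot:
    """Hypothetical snapshot of a slot as seen on the primary."""
    lsn: int
    catalog_xmin: int
    invalidated: bool = False


def fmt_lsn(lsn: int) -> str:
    """Render a 64-bit LSN in the usual high/low hex form."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def wait_for_remote_slot(
    name: str,
    local_lsn: int,
    local_xmin: int,
    poll: Callable[[], RemoteSlot],
    max_polls: int,
    poll_interval_s: int = 2,   # assumed, not taken from the thread
    remind_every_s: int = 10,   # "every 10 seconds" per the thread
) -> Tuple[str, List[Tuple[str, str]]]:
    """Simulate the wait loop; time is counted, not actually slept."""
    messages: List[Tuple[str, str]] = []
    since_last_msg = 0
    for attempt in range(max_polls):
        remote = poll()
        if remote.invalidated:
            # Proposed behaviour: LOG instead of WARNING, then move on.
            messages.append(
                ("LOG", f'aborting wait as the remote slot "{name}" was invalidated')
            )
            return "invalidated", messages
        if remote.lsn >= local_lsn and remote.catalog_xmin >= local_xmin:
            return "synced", messages
        detail = (
            f'remote slot "{name}" LSN ({fmt_lsn(remote.lsn)}) and catalog xmin '
            f"({remote.catalog_xmin}) to pass local slot LSN ({fmt_lsn(local_lsn)}) "
            f"and catalog xmin ({local_xmin})"
        )
        if attempt == 0:
            messages.append(("LOG", "waiting for " + detail))
        elif since_last_msg >= remind_every_s:
            messages.append(("LOG", "continuing to wait for " + detail))
            since_last_msg = 0
        since_last_msg += poll_interval_s  # stands in for the sleep between polls
    # The cap exists only to keep this sketch finite.
    return "gave_up", messages


# Remote slot stays behind on catalog xmin for six polls, then is invalidated.
states = iter([RemoteSlot(0x3000060, 755)] * 6 + [RemoteSlot(0x3000060, 755, True)])
outcome, messages = wait_for_remote_slot(
    "failover_slot", 0x3000060, 770, lambda: next(states), max_polls=20
)
for level, text in messages:
    print(f"{level}: {text}")
```

With these inputs the simulation emits the initial "waiting for" line, one "continuing to wait for" reminder, and then the closing "aborting wait" line, all at LOG level, which is the sequence the proposal is meant to give a user who was watching the wait loop.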