Re: Improve pg_sync_replication_slots() to wait for primary to advance - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Improve pg_sync_replication_slots() to wait for primary to advance |
Date | |
Msg-id | CAJpy0uAFSyORpSs99aTBHJ+kEy+4hsjfQAJYHmGy6i+sCB7Now@mail.gmail.com |
In response to | Re: Improve pg_sync_replication_slots() to wait for primary to advance (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: Improve pg_sync_replication_slots() to wait for primary to advance |
List | pgsql-hackers |
On Mon, Aug 4, 2025 at 11:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Aug 1, 2025 at 2:50 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > 5)
> > I tried a test where there were 4 slots on the publisher, where one
> > was getting used while the others were not. Initiated
> > pg_sync_replication_slots on standby. Forced unused slots to be
> > invalidated by setting idle_replication_slot_timeout=60 on primary,
> > due to which API finished but gave a warning:
> >
> > postgres=# SELECT pg_sync_replication_slots();
> > WARNING: aborting initial sync for slot "failover_slot"
> > DETAIL: This slot was invalidated on the primary server.
> > WARNING: aborting initial sync for slot "failover_slot2"
> > DETAIL: This slot was invalidated on the primary server.
> > WARNING: aborting initial sync for slot "failover_slot3"
> > DETAIL: This slot was invalidated on the primary server.
> > pg_sync_replication_slots
> > ---------------------------
> >
> > (1 row)
> >
> > Do we need these warnings here? I think we can have it as a LOG rather
> > than having it on console. Thoughts?
> >
>
> What is the behaviour of a slotsync worker in the same case? I don't
> see any such WARNING messages in the code of slotsync worker, so why
> do we want a different behaviour here?
>

We don’t have continuous waiting in the slot-sync worker if the remote
slot is behind the local slot. But if during the first sync cycle the
remote slot is behind, we keep the local slot as a temporary slot. In
the next sync cycle, if we find the remote slot is invalidated, we mark
the local slot as invalidated too, keeping it in this temporary state.
There are no LOG or WARNING messages in this case. When the slot-sync
worker stops or shuts down (like during promotion), it cleans up this
temporary slot.

Now, for the API behavior: if the remote slot is behind the local slot,
the API enters a wait loop and logs:

LOG: waiting for remote slot "failover_slot" LSN (0/3000060) and catalog xmin (755) to pass local slot LSN (0/3000060) and catalog xmin (770)

If it keeps waiting, every 10 seconds it logs:

LOG: continuing to wait for remote slot "failover_slot" LSN (0/3000060) and catalog xmin (755) to pass local slot LSN (0/3000060) and catalog xmin (770)

If the remote slot becomes invalidated during this wait, currently it
logs a WARNING and moves to syncing the next slot:

WARNING: aborting initial sync for slot "failover_slot" as the slot was invalidated on primary

I think this WARNING is too strong. We could change it to a LOG message
instead, mark the local slot as invalidated, exit the wait loop, and
move on to syncing the next slot.

Even though this LOG is not there in the slotsync worker case, I think
it makes more sense in the API case: the continuous LOGs have already
told the user that the API was waiting in the wait loop to sync this
slot, so we should conclude the wait explicitly. We can either say
wait-over (like we do in the successful sync case) or say
'LOG: aborting wait as the remote slot was invalidated' instead of the
above WARNING message.

What do you suggest?

thanks
Shveta
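
To make the flow proposed above easier to discuss, here is a minimal Python sketch of the API wait loop. It is only an illustration under stated assumptions, not the patch code: the names RemoteSlot, parse_lsn, wait_for_remote_slot and polls_per_reminder are hypothetical, the "remote is behind" test is my reading of the log message (remote LSN or catalog xmin lower than the local one, ignoring xid wraparound), and one poll is assumed to take about a second so that a reminder every 10 polls approximates the 10-second interval.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/3000060' into a single integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


@dataclass
class RemoteSlot:
    """Hypothetical snapshot of the remote slot as seen in one poll."""
    lsn: str
    catalog_xmin: int
    invalidated: bool = False


def wait_for_remote_slot(
    name: str,
    local_lsn: str,
    local_xmin: int,
    polls: Iterable[RemoteSlot],
    polls_per_reminder: int = 10,  # assumption: roughly one poll per second
) -> Tuple[str, List[str]]:
    """Simulate the wait loop; return (outcome, emitted log lines)."""
    log: List[str] = []
    for i, remote in enumerate(polls):
        if remote.invalidated:
            # Proposed behavior: a LOG instead of a WARNING, then the
            # caller moves on to the next slot.
            log.append("LOG: aborting wait as the remote slot was invalidated")
            return "invalidated", log

        # Assumed condition; xid wraparound is deliberately ignored here.
        behind = (
            parse_lsn(remote.lsn) < parse_lsn(local_lsn)
            or remote.catalog_xmin < local_xmin
        )
        if not behind:
            return "synced", log

        if i % polls_per_reminder == 0:
            prefix = "waiting" if i == 0 else "continuing to wait"
            log.append(
                f'LOG: {prefix} for remote slot "{name}" '
                f"LSN ({remote.lsn}) and catalog xmin ({remote.catalog_xmin}) "
                f"to pass local slot LSN ({local_lsn}) "
                f"and catalog xmin ({local_xmin})"
            )
    return "still_waiting", log


if __name__ == "__main__":
    behind = RemoteSlot("0/3000060", 755)
    gone = RemoteSlot("0/3000060", 755, invalidated=True)
    outcome, lines = wait_for_remote_slot(
        "failover_slot", "0/3000060", 770, [behind] * 11 + [gone]
    )
    print(outcome)
    print("\n".join(lines))
```

With the polls in the usage example, the loop emits the initial "waiting" line, one "continuing to wait" reminder, and then the proposed LOG line before reporting the slot as invalidated, so the series of wait messages ends with an explicit conclusion.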