Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | CALj2ACXbUzp-9tixeUK_dws5k+ARJaXxj8cj9W3adA0XNUS4Hg@mail.gmail.com Whole thread Raw |
In response to | Re: Introduce XID age and inactive timeout based replication slot invalidation (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>) |
Responses |
Re: Introduce XID age and inactive timeout based replication slot invalidation
|
List | pgsql-hackers |
On Wed, Apr 3, 2024 at 9:57 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > > > shouldn't the slot be dropped/recreated instead of updating inactive_since? > > > > The sync slots that are invalidated on the primary aren't dropped and > > recreated on the standby. > > Yeah, right (I was confused with synced slots that are invalidated locally). > > > However, I > > found that the synced slot is acquired and released unnecessarily > > after the invalidation_reason is synced from the primary. I added a > > skip check in synchronize_one_slot to skip acquiring and releasing the > > slot if it's locally found inactive. With this, inactive_since won't > > get updated for invalidated sync slots on the standby as we don't > > acquire and release the slot. > > CR1 === > > Yeah, I can see: > > @@ -575,6 +575,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid) > " name slot \"%s\" already exists on the standby", > remote_slot->name)); > > + /* > + * Skip the sync if the local slot is already invalidated. We do this > + * beforehand to save on slot acquire and release. > + */ > + if (slot->data.invalidated != RS_INVAL_NONE) > + return false; > > Thanks to the drop_local_obsolete_slots() call I think we are not missing the case > where the slot has been invalidated on the primary, invalidation reason has been > synced on the standby and later the slot is dropped/ recreated manually on the > primary (then it should be dropped/recreated on the standby too). > > Also it seems we are not missing the case where a sync slot is invalidated > locally due to wal removal (it should be dropped/recreated). Right. > > > CR5 === > > > > > > + /* > > > + * This function isn't expected to be called for inactive timeout based > > > + * invalidation. A separate function InvalidateInactiveReplicationSlot is > > > + * to be used for that. > > > > > > Do you think it's worth to explain why? > > > > Hm, I just wanted to point out the actual function here. I modified it > > to something like the following, if others feel we don't need that, I > > can remove it. > > Sorry If I was not clear but I meant to say "Do you think it's worth to explain > why we decided to create a dedicated function"? (currently we "just" explain that > we created one). We added a new function (InvalidateInactiveReplicationSlot) to invalidate slot based on inactive timeout because 1) we do the inactive timeout invalidation at slot level as opposed to InvalidateObsoleteReplicationSlots which does loop over all the slots, 2) InvalidatePossiblyObsoleteSlot does release the lock in some cases, has a lot of unneeded code for inactive timeout invalidation check, 3) we want some control over saving the slot to disk because we hook the inactive timeout invalidation into the loop that checkpoints the slot info to the disk in CheckPointReplicationSlots. I've added a comment atop InvalidateInactiveReplicationSlot. Please find the attached v36 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
pgsql-hackers by date: