Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | CAA4eK1K7DdT_5HnOWs5tVPYC=-h+m85wu7k-7RVJaJ7zMxprWQ@mail.gmail.com Whole thread Raw |
In response to | Re: Introduce XID age and inactive timeout based replication slot invalidation (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
On Thu, Aug 29, 2024 at 11:31 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > Thanks for looking into this. > > On Mon, Aug 26, 2024 at 4:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Few comments on 0001: > > 1. > > @@ -651,6 +651,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid > > > > + /* > > + * Skip the sync if the local slot is already invalidated. We do this > > + * beforehand to avoid slot acquire and release. > > + */ > > > > I was wondering why you have added this new check as part of this > > patch. If you see the following comments in the related code, you will > > know why we haven't done this previously. > > Removed. Can deal with optimization separately. > > > 2. > > + */ > > + InvalidateObsoleteReplicationSlots(RS_INVAL_INACTIVE_TIMEOUT, > > + 0, InvalidOid, InvalidTransactionId); > > > > Why do we want to call this for shutdown case (when is_shutdown is > > true)? I understand trying to invalidate slots during regular > > checkpoint but not sure if we need it at the time of shutdown. > > Changed it to invalidate only for non-shutdown checkpoints. inactive_timeout invalidation isn't critical for shutdown unlikewal_removed which can help shutdown by freeing up some disk space. > > > The > > other point is can we try to check the performance impact with 100s of > > slots as mentioned in the code comments? > > I first checked how much does the wal_removed invalidation check add to the checkpoint (see 2nd and 3rd column). I thenchecked how much inactive_timeout invalidation check adds to the checkpoint (see 4th column), it is not more than wal_removeinvalidation check. I then checked how much the wal_removed invalidation check adds for replication slots thathave already been invalidated due to inactive_timeout (see 5th column), looks like not much. > > | # of slots | HEAD (no invalidation) ms | HEAD (wal_removed) ms | PATCHED (inactive_timeout) ms | PATCHED (inactive_timeout+wal_removed)ms | > |------------|----------------------------|-----------------------|-------------------------------|------------------------------------------| > | 100 | 18.591 | 370.586 | 359.299 | 373.882 | > | 1000 | 15.722 | 4834.901 | 5081.751 | 5072.128 | > | 10000 | 19.261 | 59801.062 | 61270.406 | 60270.099 | > > Having said that, I'm okay to implement the optimization specified. Thoughts? > The other possibility is to try invalidating due to timeout along with wal_removed case during checkpoint. The idea is that if the slot can be invalidated due to WAL then fine, otherwise check if it can be invalidated due to timeout. This can avoid looping the slots and doing similar work multiple times during the checkpoint. -- With Regards, Amit Kapila.
pgsql-hackers by date: