Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Introduce XID age and inactive timeout based replication slot invalidation
Date
Msg-id CAA4eK1K7DdT_5HnOWs5tVPYC=-h+m85wu7k-7RVJaJ7zMxprWQ@mail.gmail.com
Whole thread Raw
In response to Re: Introduce XID age and inactive timeout based replication slot invalidation  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Thu, Aug 29, 2024 at 11:31 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Thanks for looking into this.
>
> On Mon, Aug 26, 2024 at 4:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Few comments on 0001:
> > 1.
> > @@ -651,6 +651,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid
> >
> > + /*
> > + * Skip the sync if the local slot is already invalidated. We do this
> > + * beforehand to avoid slot acquire and release.
> > + */
> >
> > I was wondering why you have added this new check as part of this
> > patch. If you see the following comments in the related code, you will
> > know why we haven't done this previously.
>
> Removed. Can deal with optimization separately.
>
> > 2.
> > + */
> > + InvalidateObsoleteReplicationSlots(RS_INVAL_INACTIVE_TIMEOUT,
> > +    0, InvalidOid, InvalidTransactionId);
> >
> > Why do we want to call this for shutdown case (when is_shutdown is
> > true)? I understand trying to invalidate slots during regular
> > checkpoint but not sure if we need it at the time of shutdown.
>
> Changed it to invalidate only for non-shutdown checkpoints. inactive_timeout invalidation isn't critical for shutdown
unlikewal_removed which can help shutdown by freeing up some disk space. 
>
> > The
> > other point is can we try to check the performance impact with 100s of
> > slots as mentioned in the code comments?
>
> I first checked how much does the wal_removed invalidation check add to the checkpoint (see 2nd and 3rd column). I
thenchecked how much inactive_timeout invalidation check adds to the checkpoint (see 4th column), it is not more than
wal_removeinvalidation check. I then checked how much the wal_removed invalidation check adds for replication slots
thathave already been invalidated due to inactive_timeout (see 5th column), looks like not much. 
>
> | # of slots | HEAD (no invalidation) ms | HEAD (wal_removed) ms | PATCHED (inactive_timeout) ms | PATCHED
(inactive_timeout+wal_removed)ms | 
>
|------------|----------------------------|-----------------------|-------------------------------|------------------------------------------|
> | 100        | 18.591                     | 370.586               | 359.299                       | 373.882
                      | 
> | 1000       | 15.722                     | 4834.901              | 5081.751                      | 5072.128
                      | 
> | 10000      | 19.261                     | 59801.062             | 61270.406                     | 60270.099
                      | 
>
> Having said that, I'm okay to implement the optimization specified. Thoughts?
>

The other possibility is to try invalidating due to timeout along with
wal_removed case during checkpoint. The idea is that if the slot can
be invalidated due to WAL then fine, otherwise check if it can be
invalidated due to timeout. This can avoid looping the slots and doing
similar work multiple times during the checkpoint.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Invalid Assert while validating REPLICA IDENTITY?
Next
From: Peter Eisentraut
Date:
Subject: Re: thread-safety: getpwuid_r()