Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Introduce XID age and inactive timeout based replication slot invalidation
Date
Msg-id ZgOzPiP7NLgBuGL7@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: Introduce XID age and inactive timeout based replication slot invalidation  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
Hi,

On Wed, Mar 27, 2024 at 10:08:33AM +0530, Bharath Rupireddy wrote:
> On Tue, Mar 26, 2024 at 11:22 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > -               if (!(RecoveryInProgress() && slot->data.synced))
> > +               if (!(InRecovery && slot->data.synced))
> >                         slot->inactive_since = GetCurrentTimestamp();
> >                 else
> >                         slot->inactive_since = 0;
> >
> > Not related to this change but more the way RestoreSlotFromDisk() behaves here:
> >
> > For a sync slot on standby it will be set to zero and then later will be
> > synchronized with the one coming from the primary. I think that's fine to have
> > it to zero for this window of time.
> 
> Right.
> 
> > Now, if the standby is down and one sets sync_replication_slots to off,
> > then inactive_since will be set to zero on the standby at startup and not
> > synchronized (unless one triggers a manual sync). I also think that's fine but
> > it might be worth to document this behavior (that after a standby startup
> > inactive_since is zero until the next sync...).
> 
> Isn't this behaviour applicable for other slot parameters that the
> slot syncs from the remote slot on the primary?

No they are persisted on disk. If not, we'd not know where to resume the decoding
from on the standby in case primary is down and/or sync is off.

> I've added the following note in the comments when we update
> inactive_since in RestoreSlotFromDisk.
> 
>          * Note that for synced slots after the standby starts up (i.e. after
>          * the slots are loaded from the disk), the inactive_since will remain
>          * zero until the next slot sync cycle.
>          */
>         if (!(InRecovery && slot->data.synced))
>             slot->inactive_since = GetCurrentTimestamp();
>         else
>             slot->inactive_since = 0;

I think we should add some words in the doc too and also about what the meaning
of inactive_since on the standby is (as suggested by Shveta in [1]).

[1]:
https://www.postgresql.org/message-id/CAJpy0uDkTW%2Bt1k3oPkaipFBzZePfFNB5DmiA%3D%3DpxRGcAdpF%3DPg%40mail.gmail.com

> > 7 ===
> >
> > +# Capture and validate inactive_since of a given slot.
> > +sub capture_and_validate_slot_inactive_since
> > +{
> > +       my ($node, $slot_name, $slot_creation_time) = @_;
> > +       my $name = $node->name;
> >
> > We know have capture_and_validate_slot_inactive_since at 2 places:
> > 040_standby_failover_slots_sync.pl and 019_replslot_limit.pl.
> >
> > Worth to create a sub in Cluster.pm?
> 
> I'd second that thought for now. We might have to debate first if it's
> useful for all the nodes even without replication, and if yes, the
> naming stuff and all that. Historically, we've had such duplicated
> functions until recently, for instance advance_wal and log_contains.
> We
> moved them over to a common perl library Cluster.pm very recently. I'm
> sure we can come back later to move it to Cluster.pm.

I thought that would be the right time not to introduce duplicated code.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Next
From: shveta malik
Date:
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation