Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Introduce XID age and inactive timeout based replication slot invalidation
Msg-id ZgMLTBD3xJIP9W93@ip-10-97-1-34.eu-west-3.compute.internal
In response to Re: Introduce XID age and inactive timeout based replication slot invalidation  (shveta malik <shveta.malik@gmail.com>)
List pgsql-hackers
Hi,

On Tue, Mar 26, 2024 at 09:59:23PM +0530, Bharath Rupireddy wrote:
> On Tue, Mar 26, 2024 at 4:35 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > If we just sync inactive_since value for synced slots while in
> > recovery from the primary, so be it. Why do we need to update it to
> > the current time when the slot is being created? We don't expose slot
> > creation time, no? Aren't we fine if we just sync the value from
> > primary and document that fact? After the promotion, we can reset it
> > to the current time so that it gets its own time.
> 
> I'm attaching v24 patches. It implements the above idea proposed
> upthread for synced slots. I've now separated
> s/last_inactive_time/inactive_since and synced slots behaviour. Please
> have a look.

Thanks!

==== v24-0001

It's now pure mechanical changes and it looks good to me.

==== v24-0002

1 ===

    This commit does two things:
    1) Updates inactive_since for sync slots with the value
    received from the primary's slot.

Tested it and it does that.

2 ===

    2) Ensures the value is set to current timestamp during the
    shutdown of slot sync machinery to help correctly interpret the
    time if the standby gets promoted without a restart.

Tested it and it does that.

3 ===

+/*
+ * Reset the synced slots info such as inactive_since after shutting
+ * down the slot sync machinery.
+ */
+static void
+update_synced_slots_inactive_time(void)

The comment says "Reset", which does not match the function's name
(update_synced_slots_inactive_time) or what the function does.

4 ===

+                       /*
+                        * We get the current time beforehand and only once to avoid
+                        * system calls overhead while holding the lock.
+                        */
+                       if (now == 0)
+                               now = GetCurrentTimestamp();

Also +1 for calling GetCurrentTimestamp() only once for the whole loop.
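
To illustrate the pattern in isolation, here is a minimal standalone sketch. It is
not the patch's code: DemoSlot, demo_current_timestamp() and
demo_update_synced_slots() are hypothetical stand-ins for the replication slot
struct, GetCurrentTimestamp() and the function under review, and a C11 atomic_flag
stands in for the slot's spinlock. The sketch also ignores the outer control lock
that the real loop would hold while scanning the slots.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a replication slot; not the PostgreSQL struct. */
typedef struct DemoSlot
{
	atomic_flag lock;			/* stands in for the slot's spinlock */
	bool		in_use;
	bool		synced;
	int64_t		inactive_since; /* microseconds since the Unix epoch, 0 = unset */
} DemoSlot;

/* Stand-in for GetCurrentTimestamp(): wall-clock time in microseconds. */
static int64_t
demo_current_timestamp(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (int64_t) ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/*
 * Stamp every in-use synced slot with one shared timestamp.
 * Returns the number of slots that were updated.
 */
static int
demo_update_synced_slots(DemoSlot *slots, int nslots)
{
	int64_t		now = 0;
	int			updated = 0;

	assert(nslots >= 0);

	for (int i = 0; i < nslots; i++)
	{
		DemoSlot   *s = &slots[i];

		if (!s->in_use || !s->synced)
			continue;

		/* Fetch the time lazily, at most once, and before taking the lock. */
		if (now == 0)
			now = demo_current_timestamp();

		/* Critical section: only a plain store happens under the lock. */
		while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
			;
		s->inactive_since = now;
		atomic_flag_clear_explicit(&s->lock, memory_order_release);

		updated++;
	}

	return updated;
}
```

With this shape, all slots touched in one pass share the same timestamp, and no
clock call is made at all when no slot qualifies.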

5 ===

-               if (!(RecoveryInProgress() && slot->data.synced))
+               if (!(InRecovery && slot->data.synced))
                        slot->inactive_since = GetCurrentTimestamp();
                else
                        slot->inactive_since = 0;

Not related to this change, but rather to the way RestoreSlotFromDisk() behaves here:

For a synced slot on a standby, inactive_since is set to zero and is later
synchronized with the value coming from the primary. I think it's fine for it
to be zero during this window of time.

Now, if the standby is down and one sets sync_replication_slots to off,
then inactive_since will be set to zero on the standby at startup and not
synchronized (unless one triggers a manual sync). I also think that's fine, but
it might be worth documenting this behavior (that after a standby startup
inactive_since is zero until the next sync...).
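
The two cases above can be modelled with a small standalone sketch. This is only
an illustration of the behavior as I understand it, not code from the patch:
DemoTimestamp, demo_restored_inactive_since() and demo_after_sync_cycle() are
hypothetical names, and the sync step is reduced to "copy the primary's value if
a sync happens".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timestamp type: microseconds, with 0 meaning "not set". */
typedef int64_t DemoTimestamp;

/*
 * Value given to inactive_since when a slot is restored from disk at startup,
 * mirroring the condition in the hunk above: synced slots restored while in
 * recovery start at zero, every other slot gets the current time.
 */
static DemoTimestamp
demo_restored_inactive_since(bool in_recovery, bool synced, DemoTimestamp now)
{
	if (!(in_recovery && synced))
		return now;
	return 0;
}

/*
 * Value after one potential sync cycle. If no sync happens (for example
 * sync_replication_slots is off and nobody triggers a manual sync), the
 * restored value is kept, so a synced slot stays at zero.
 */
static DemoTimestamp
demo_after_sync_cycle(DemoTimestamp current, DemoTimestamp primary_value,
					  bool sync_happened)
{
	return sync_happened ? primary_value : current;
}
```

For a synced slot on a standby, demo_restored_inactive_since(true, true, now)
yields zero, and the value only changes once demo_after_sync_cycle() is called
with sync_happened set to true. That is the window the documentation could
mention.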

6 ===

+       print "HI  $slot_name $name $inactive_since $slot_creation_time\n";

Leftover debug output?

7 ===

+# Capture and validate inactive_since of a given slot.
+sub capture_and_validate_slot_inactive_since
+{
+       my ($node, $slot_name, $slot_creation_time) = @_;
+       my $name = $node->name;

We now have capture_and_validate_slot_inactive_since in 2 places:
040_standby_failover_slots_sync.pl and 019_replslot_limit.pl.

Would it be worth creating a sub in Cluster.pm?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com