Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From | Bertrand Drouvot |
---|---|
Subject | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | ZgMLTBD3xJIP9W93@ip-10-97-1-34.eu-west-3.compute.internal |
In response to | Re: Introduce XID age and inactive timeout based replication slot invalidation (shveta malik <shveta.malik@gmail.com>) |
Responses | Re: Introduce XID age and inactive timeout based replication slot invalidation |
List | pgsql-hackers |
Hi,

On Tue, Mar 26, 2024 at 09:59:23PM +0530, Bharath Rupireddy wrote:
> On Tue, Mar 26, 2024 at 4:35 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > If we just sync inactive_since value for synced slots while in
> > recovery from the primary, so be it. Why do we need to update it to
> > the current time when the slot is being created? We don't expose slot
> > creation time, no? Aren't we fine if we just sync the value from
> > primary and document that fact? After the promotion, we can reset it
> > to the current time so that it gets its own time.
>
> I'm attaching v24 patches. It implements the above idea proposed
> upthread for synced slots. I've now separated
> s/last_inactive_time/inactive_since and synced slots behaviour. Please
> have a look.

Thanks!

==== v24-0001

It's now pure mechanical changes and it looks good to me.

==== v24-0002

1 ===

This commit does two things:

1) Updates inactive_since for sync slots with the value
received from the primary's slot.

Tested it and it does that.

2 ===

2) Ensures the value is set to current timestamp during the
shutdown of slot sync machinery to help correctly interpret the
time if the standby gets promoted without a restart.

Tested it and it does that.

3 ===

+/*
+ * Reset the synced slots info such as inactive_since after shutting
+ * down the slot sync machinery.
+ */
+static void
+update_synced_slots_inactive_time(void)

Looks like "reset" in the comment does not match the name of the function or
what it does.

4 ===

+                       /*
+                        * We get the current time beforehand and only once to avoid
+                        * system calls overhead while holding the lock.
+                        */
+                       if (now == 0)
+                               now = GetCurrentTimestamp();

Also +1 for having GetCurrentTimestamp() called just once within the loop.
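
To illustrate the pattern from 4 ===, here is a minimal standalone sketch. It is not the patch's code: SketchSlot, sketch_get_current_timestamp() and sketch_update_synced_slots() are hypothetical stand-ins, and the lock that the real loop runs under is only mentioned in a comment. It just shows the clock being read lazily, at most once, however many slots are updated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL definitions. */
typedef int64_t SketchTimestamp;

typedef struct SketchSlot
{
	bool		in_use;
	bool		synced;
	SketchTimestamp inactive_since;
} SketchSlot;

/* Counts clock reads so the "only once" property can be observed. */
int			sketch_clock_calls = 0;

/* Stand-in for GetCurrentTimestamp(): microseconds derived from time(). */
SketchTimestamp
sketch_get_current_timestamp(void)
{
	sketch_clock_calls++;
	return (SketchTimestamp) time(NULL) * 1000000;
}

/*
 * Set inactive_since on every in-use synced slot and return how many slots
 * were updated. The clock is read on first need and reused afterwards, so
 * the (assumed) lock held around this loop never covers more than one
 * clock read.
 */
int
sketch_update_synced_slots(SketchSlot *slots, size_t nslots)
{
	SketchTimestamp now = 0;	/* 0 means "not fetched yet" */
	int			updated = 0;

	assert(slots != NULL || nslots == 0);

	for (size_t i = 0; i < nslots; i++)
	{
		SketchSlot *s = &slots[i];

		if (!s->in_use || !s->synced)
			continue;

		if (now == 0)
			now = sketch_get_current_timestamp();

		s->inactive_since = now;
		updated++;
	}

	return updated;
}
```

A side effect of this shape is that all slots touched in one pass end up with the same inactive_since value, and no clock read happens at all when there is no synced slot.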

5 ===

-       if (!(RecoveryInProgress() && slot->data.synced))
+       if (!(InRecovery && slot->data.synced))
                slot->inactive_since = GetCurrentTimestamp();
        else
                slot->inactive_since = 0;

Not related to this change but more to the way RestoreSlotFromDisk() behaves here:

For a sync slot on standby it will be set to zero and then later will be
synchronized with the one coming from the primary. I think it's fine for it
to be zero during this window of time.

Now, if the standby is down and one sets sync_replication_slots to off,
then inactive_since will be set to zero on the standby at startup and not
synchronized (unless one triggers a manual sync). I also think that's fine but
it might be worth documenting this behavior (that after a standby startup
inactive_since is zero until the next sync...).

6 ===

+       print "HI  $slot_name $name $inactive_since $slot_creation_time\n";

garbage?

7 ===

+# Capture and validate inactive_since of a given slot.
+sub capture_and_validate_slot_inactive_since
+{
+       my ($node, $slot_name, $slot_creation_time) = @_;
+       my $name = $node->name;

We now have capture_and_validate_slot_inactive_since in 2 places:
040_standby_failover_slots_sync.pl and 019_replslot_limit.pl.

Worth creating a sub in Cluster.pm?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
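
Regarding 5 === above, the restore-time decision can be sketched as a small pure function. This is only an illustration under stated assumptions: sketch_restore_inactive_since() is a hypothetical name, and the in_recovery and synced parameters stand in for InRecovery and slot->data.synced from the quoted hunk. It makes the "zero until the next sync" window explicit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the quoted condition: a synced slot restored
 * while in recovery gets 0 (to be filled in later from the primary by the
 * next sync); every other slot gets the current time.
 *
 * "now" must be a real timestamp, since 0 is used as the "unset" value.
 */
int64_t
sketch_restore_inactive_since(bool in_recovery, bool synced, int64_t now)
{
	assert(now != 0);

	if (!(in_recovery && synced))
		return now;

	return 0;
}
```

With these stand-ins, only the combination (in_recovery = true, synced = true) yields 0. If slot synchronization is off at startup, nothing in this sketch would ever replace that 0, which is the window suggested for documentation.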