Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | CALDaNm2VQW_gpOJ-QWkEA_h18DN31ELEz2_7QmwWCAg9=Zew4A@mail.gmail.com Whole thread Raw |
In response to | Re: Introduce XID age and inactive timeout based replication slot invalidation (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Introduce XID age and inactive timeout based replication slot invalidation
|
List | pgsql-hackers |
On Wed, 13 Nov 2024 at 15:00, Nisha Moond <nisha.moond412@gmail.com> wrote: > > Please find the v48 patch attached. > > On Thu, Sep 19, 2024 at 9:40 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > When we promote hot standby with synced logical slots to become new > > primary, the logical slots are never invalidated with > > 'inactive_timeout' on new primary. It seems the check in > > SlotInactiveTimeoutCheckAllowed() is wrong. We should allow > > invalidation of slots on primary even if they are marked as 'synced'. > > fixed. > > > I have raised 4 issues so far on v46, the first 3 are in [1],[2],[3]. > > Once all these are addressed, I can continue reviewing further. > > > > Fixed issues reported in [1], [2]. Few comments: 1) Since we don't change the value of now in ReplicationSlotSetInactiveSince, the function parameter can be passed by value: +/* + * Set slot's inactive_since property unless it was previously invalidated. + */ +static inline void +ReplicationSlotSetInactiveSince(ReplicationSlot *s, TimestampTz *now, + bool acquire_lock) +{ + if (s->data.invalidated != RS_INVAL_NONE) + return; + + if (acquire_lock) + SpinLockAcquire(&s->mutex); + + s->inactive_since = *now; 2) Currently it allows a minimum value of less than 1 second like in milliseconds, I feel we can have some minimum value at least something like checkpoint_timeout: diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index 8a67f01200..367f510118 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -3028,6 +3028,18 @@ struct config_int ConfigureNamesInt[] = NULL, NULL, NULL }, + { + {"replication_slot_inactive_timeout", PGC_SIGHUP, REPLICATION_SENDING, + gettext_noop("Sets the amount of time a replication slot can remain inactive before " + "it will be invalidated."), + NULL, + GUC_UNIT_S + }, + &replication_slot_inactive_timeout, + 0, 0, INT_MAX, + NULL, NULL, NULL + }, 3) Since SlotInactiveTimeoutCheckAllowed check is just done above and the current time has been retrieved can we used "now" variable instead of SlotInactiveTimeoutCheckAllowed again second time: @@ -1651,6 +1713,26 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, if (SlotIsLogical(s)) invalidation_cause = cause; break; + case RS_INVAL_INACTIVE_TIMEOUT: + + /* + * Check if the slot needs to be invalidated due to + * replication_slot_inactive_timeout GUC. + */ + if (SlotInactiveTimeoutCheckAllowed(s) && + TimestampDifferenceExceeds(s->inactive_since, now, + replication_slot_inactive_timeout * 1000)) + { + invalidation_cause = cause; + inactive_since = s->inactive_since; 4) I'm not sure if this change required by this patch or is it a general optimization, if it is required for this patch we can detail the comments: @@ -2208,6 +2328,7 @@ RestoreSlotFromDisk(const char *name) bool restored = false; int readBytes; pg_crc32c checksum; + TimestampTz now; /* no need to lock here, no concurrent access allowed yet */ @@ -2368,6 +2489,9 @@ RestoreSlotFromDisk(const char *name) NameStr(cp.slotdata.name)), errhint("Change \"wal_level\" to be \"replica\" or higher."))); + /* Use same inactive_since time for all slots */ + now = GetCurrentTimestamp(); + /* nothing can be active yet, don't lock anything */ for (i = 0; i < max_replication_slots; i++) { @@ -2400,7 +2524,7 @@ RestoreSlotFromDisk(const char *name) * slot from the disk into memory. Whoever acquires the slot i.e. * makes the slot active will reset it. */ - slot->inactive_since = GetCurrentTimestamp(); + slot->inactive_since = now; 5) Why should the slot invalidation be updated during shutdown, shouldn't the inactive_since value be intact during shutdown? - <literal>NULL</literal> if the slot is currently being used. - Note that for slots on the standby that are being synced from a + <literal>NULL</literal> if the slot is currently being used. Once the + slot is invalidated, this value will remain unchanged until we shutdown + the server. Note that for slots on the standby that are being synced from a 6) New Style of ereport does not need braces around errcode, it can be changed similarly: + if (error_if_invalid && + s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT) + { + Assert(s->inactive_since > 0); + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("can no longer get changes from replication slot \"%s\"", + NameStr(s->data.name)), + errdetail("This slot has been invalidated because it was inactive for longer than the amount of time specified by \"%s\".", + "replication_slot_inactive_timeout"))); Regards, Vignesh
pgsql-hackers by date: