Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From | Bertrand Drouvot |
---|---|
Subject | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | Zg+rvmqjL3DfbJdx@ip-10-97-1-34.eu-west-3.compute.internal Whole thread Raw |
In response to | Re: Introduce XID age and inactive timeout based replication slot invalidation (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses |
Re: Introduce XID age and inactive timeout based replication slot invalidation
|
List | pgsql-hackers |
Hi, On Fri, Apr 05, 2024 at 11:21:43AM +0530, Bharath Rupireddy wrote: > On Wed, Apr 3, 2024 at 9:57 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > Please find the attached v36 patch. Thanks! A few comments: 1 === + <para> + The timeout is measured from the time since the slot has become + inactive (known from its + <structfield>inactive_since</structfield> value) until it gets + used (i.e., its <structfield>active</structfield> is set to true). + </para> That's right except when it's invalidated during the checkpoint (as the slot is not acquired in CheckPointReplicationSlots()). So, what about adding: "or a checkpoint occurs"? That would also explain that the invalidation could occur during checkpoint. 2 === + /* If the slot has been invalidated, recalculate the resource limits */ + if (invalidated) + { /If the slot/If a slot/? 3 === + * NB - this function also runs as part of checkpoint, so avoid raising errors s/NB - this/NB: This function/? (that looks more consistent with other comments in the code) 4 === + * Note that having a new function for RS_INVAL_INACTIVE_TIMEOUT cause instead I understand it's "the RS_INVAL_INACTIVE_TIMEOUT cause" but reading "cause instead" looks weird to me. Maybe it would make sense to reword this a bit. 5 === + * considered not active as they don't actually perform logical decoding. Not sure that's 100% accurate as we switched in fast forward logical in 2ec005b4e2. "as they perform only fast forward logical decoding (or not at all)", maybe? 6 === + if (RecoveryInProgress() && slot->data.synced) + return false; + + if (replication_slot_inactive_timeout == 0) + return false; What about just using one if? It's more a matter of taste but it also probably reduces the object file size a bit for non optimized build. 7 === + /* + * Do not invalidate the slots which are currently being synced from + * the primary to the standby. + */ + if (RecoveryInProgress() && slot->data.synced) + return false; I think we don't need this check as the exact same one is done just before. 8 === +sub check_for_slot_invalidation_in_server_log +{ + my ($node, $slot_name, $offset) = @_; + my $invalidated = 0; + + for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++) + { + $node->safe_psql('postgres', "CHECKPOINT"); Wouldn't be better to wait for the replication_slot_inactive_timeout time before instead of triggering all those checkpoints? (it could be passed as an extra arg to wait_for_slot_invalidation()). 9 === # Synced slot mustn't get invalidated on the standby, it must sync invalidation # from the primary. So, we must not see the slot's invalidation message in server # log. ok( !$standby1->log_contains( "invalidating obsolete replication slot \"lsub1_sync_slot\"", $standby1_logstart), 'check that syned slot has not been invalidated on the standby'); Would that make sense to trigger a checkpoint on the standby before this test? I mean I think that without a checkpoint on the standby we should not see the invalidation in the log anyway. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: