Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers

From vignesh C
Subject Re: Introduce XID age and inactive timeout based replication slot invalidation
Date
Msg-id CALDaNm2UUTfJczjR-rEQwKgmx=iFnuMnR1cXv7ccB+O9P15mYg@mail.gmail.com
Whole thread Raw
In response to Re: Introduce XID age and inactive timeout based replication slot invalidation  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Tue, 19 Nov 2024 at 12:43, Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> Attached is the v49 patch set:
> - Fixed the bug reported in [1].
> - Addressed comments in [2] and [3].
>
> I've split the patch into two, implementing the suggested idea in
> comment #5 of [2] separately in 001:
>
> Patch-001: Adds additional error reports (for all invalidation types)
> in ReplicationSlotAcquire() for invalid slots when error_if_invalid =
> true.
> Patch-002: The original patch with comments addressed.

Few comments:
1) I felt this check in wait_for_slot_invalidation is not required as
there is a call to trigger_slot_invalidation which sleeps for
inactive_timeout seconds and ensures checkpoint is triggered, also the
test passes without this:
+       # Wait for slot to become inactive
+       $node->poll_query_until(
+               'postgres', qq[
+               SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+                       WHERE slot_name = '$slot' AND active = 'f' AND
+                                 inactive_since IS NOT NULL;
+       ])
+         or die
+         "Timed out while waiting for slot $slot to become inactive
on node $node_name";

2) Instead of calling this in a loop, won't it be enough to call
checkpoint only once explicitly:
+       for (my $i = 0; $i < 10 *
$PostgreSQL::Test::Utils::timeout_default; $i++)
+       {
+               $node->safe_psql('postgres', "CHECKPOINT");
+               if ($node->log_contains(
+                               "invalidating obsolete replication
slot \"$slot\"", $offset))
+               {
+                       $invalidated = 1;
+                       last;
+               }
+               usleep(100_000);
+       }
+       ok($invalidated,
+               "check that slot $slot invalidation has been logged on
node $node_name"
+       );

3) Since pg_sync_replication_slots is a sync call, we can directly use
"is( $standby1->safe_psql('postgres', SELECT COUNT(slot_name) = 1 FROM
pg_replication_slots..." instead of poll_query_until:
+$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+$standby1->poll_query_until(
+       'postgres', qq[
+       SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+               WHERE slot_name = 'sync_slot1' AND
+               invalidation_reason = 'inactive_timeout';
+])
+  or die
+  "Timed out while waiting for sync_slot1 invalidation to be synced
on standby";

4) Since this variable is being referred to at many places, how about
changing it to inactive_timeout_1s so that it is easier while
reviewing across many places:
# Set timeout GUC on the standby to verify that the next checkpoint will not
# invalidate synced slots.
my $inactive_timeout = 1;

5) Since we have already tested invalidation of logical replication
slot 'sync_slot1' above, this test might not be required:
+# =============================================================================
+# Testcase start
+# Invalidate logical subscriber slot due to inactive timeout.
+
+my $publisher = $primary;
+
+# Prepare for test
+$publisher->safe_psql(
+       'postgres', qq[
+    ALTER SYSTEM SET replication_slot_inactive_timeout TO '0';
+]);
+$publisher->reload;

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: Masahiro Ikeda
Date:
Subject: Re: Allow default \watch interval in psql to be configured
Next
From: Peter Eisentraut
Date:
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER