Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Introduce XID age and inactive timeout based replication slot invalidation
Date
Msg-id CAHut+Pvi-g+9+hjmjg44OzTN9L3YGQiCXBDAVaTVWvSn5SSwmw@mail.gmail.com
Whole thread Raw
In response to Re: Introduce XID age and inactive timeout based replication slot invalidation  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
Hi Nisha,

Here are some review comments for the patch v50-0002.

======
src/backend/replication/slot.c

InvalidatePossiblyObsoleteSlot:

1.
+ if (now &&
+ TimestampDifferenceExceeds(s->inactive_since, now,
+    replication_slot_inactive_timeout_sec * 1000))

Previously this was using an additional call to SlotInactiveTimeoutCheckAllowed:

+ if (SlotInactiveTimeoutCheckAllowed(s) &&
+ TimestampDifferenceExceeds(s->inactive_since, now,
+    replication_slot_inactive_timeout * 1000))

Is it OK to skip that call? e.g. can the slot fields possibly change
between assigning the 'now' and acquiring the mutex? If not, then the
current code is fine. The only reason for asking is because it is
slightly suspicious that it was not done this "easy" way in the first
place.

~~~

check_replication_slot_inactive_timeout:

2.
+/*
+ * GUC check_hook for replication_slot_inactive_timeout
+ *
+ * We don't allow the value of replication_slot_inactive_timeout other than 0
+ * during the binary upgrade.
+ */

The "We don't allow..." sentence seems like a backward way of saying:
The value of replication_slot_inactive_timeout must be set to 0 during
the binary upgrade.

======
src/test/recovery/t/050_invalidate_slots.pl

3.
+# Despite inactive timeout being set, the synced slot won't get invalidated on
+# its own on the standby.

What does "on its own" mean here? Do you mean it won't get invalidated
unless the invalidation state is propagated from the primary? Maybe
the comment can be clearer.

~

4.
+# Wait for slot to first become inactive and then get invalidated
+sub wait_for_slot_invalidation
+{
+ my ($node, $slot, $offset, $inactive_timeout_1s) = @_;
+ my $node_name = $node->name;
+

It was OK to change the variable name to 'inactive_timeout_1s' outside
of here, but within the subroutine, I don't think it is appropriate
because this is a parameter that potentially could have any value.

~

5.
+# Trigger slot invalidation and confirm it in the server log
+sub trigger_slot_invalidation
+{
+ my ($node, $slot, $offset, $inactive_timeout_1s) = @_;
+ my $node_name = $node->name;
+ my $invalidated = 0;

It was OK to change the variable name to 'inactive_timeout_1s' outside
of here, but within the subroutine, I don't think it is appropriate
because this is a parameter that potentially could have any value.

~

6.
+ # Give enough time to avoid multiple checkpoints
+ sleep($inactive_timeout_1s + 1);
+
+ # Run a checkpoint
+ $node->safe_psql('postgres', "CHECKPOINT");

Since you are not doing multiple checkpoints anymore, it looks like
that "Give enough time..." comment needs updating.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE
Next
From: Andrey Borodin
Date:
Subject: Re: UUID v7