Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | CALDaNm2UUTfJczjR-rEQwKgmx=iFnuMnR1cXv7ccB+O9P15mYg@mail.gmail.com Whole thread Raw |
In response to | Re: Introduce XID age and inactive timeout based replication slot invalidation (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
On Tue, 19 Nov 2024 at 12:43, Nisha Moond <nisha.moond412@gmail.com> wrote: > > Attached is the v49 patch set: > - Fixed the bug reported in [1]. > - Addressed comments in [2] and [3]. > > I've split the patch into two, implementing the suggested idea in > comment #5 of [2] separately in 001: > > Patch-001: Adds additional error reports (for all invalidation types) > in ReplicationSlotAcquire() for invalid slots when error_if_invalid = > true. > Patch-002: The original patch with comments addressed. Few comments: 1) I felt this check in wait_for_slot_invalidation is not required as there is a call to trigger_slot_invalidation which sleeps for inactive_timeout seconds and ensures checkpoint is triggered, also the test passes without this: + # Wait for slot to become inactive + $node->poll_query_until( + 'postgres', qq[ + SELECT COUNT(slot_name) = 1 FROM pg_replication_slots + WHERE slot_name = '$slot' AND active = 'f' AND + inactive_since IS NOT NULL; + ]) + or die + "Timed out while waiting for slot $slot to become inactive on node $node_name"; 2) Instead of calling this in a loop, won't it be enough to call checkpoint only once explicitly: + for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++) + { + $node->safe_psql('postgres', "CHECKPOINT"); + if ($node->log_contains( + "invalidating obsolete replication slot \"$slot\"", $offset)) + { + $invalidated = 1; + last; + } + usleep(100_000); + } + ok($invalidated, + "check that slot $slot invalidation has been logged on node $node_name" + ); 3) Since pg_sync_replication_slots is a sync call, we can directly use "is( $standby1->safe_psql('postgres', SELECT COUNT(slot_name) = 1 FROM pg_replication_slots..." instead of poll_query_until: +$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();"); +$standby1->poll_query_until( + 'postgres', qq[ + SELECT COUNT(slot_name) = 1 FROM pg_replication_slots + WHERE slot_name = 'sync_slot1' AND + invalidation_reason = 'inactive_timeout'; +]) + or die + "Timed out while waiting for sync_slot1 invalidation to be synced on standby"; 4) Since this variable is being referred to at many places, how about changing it to inactive_timeout_1s so that it is easier while reviewing across many places: # Set timeout GUC on the standby to verify that the next checkpoint will not # invalidate synced slots. my $inactive_timeout = 1; 5) Since we have already tested invalidation of logical replication slot 'sync_slot1' above, this test might not be required: +# ============================================================================= +# Testcase start +# Invalidate logical subscriber slot due to inactive timeout. + +my $publisher = $primary; + +# Prepare for test +$publisher->safe_psql( + 'postgres', qq[ + ALTER SYSTEM SET replication_slot_inactive_timeout TO '0'; +]); +$publisher->reload; Regards, Vignesh
pgsql-hackers by date: