Re: Race conditions in 019_replslot_limit.pl - Mailing list pgsql-hackers

From: Kyotaro Horiguchi
Subject: Re: Race conditions in 019_replslot_limit.pl
Date: Wed, 16 Feb 2022 15:01:19 +0900 (JST)
Msg-id: 20220216.150119.226485024172638507.horikyota.ntt@gmail.com
In response to: Re: Race conditions in 019_replslot_limit.pl (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List: pgsql-hackers
At Wed, 16 Feb 2022 14:26:37 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> Agreed.  Doing this at all slot creations seems fine.

Done in the attached patch. The first slot is deliberately created as
unreserved, so I changed the code to re-create the slot as "reserved"
before taking the backup.

> > Even though the node has log_disconnect = true, and other processes indeed log
> > their disconnection, there's no disconnect for the above session until the
> > server is shut down.  Even though pg_basebackup clearly finished? Uh, huh?
> 
> It seems to me so, too.
> 
> > I guess it's conceivable that the backend was still working through process
> > shutdown? But it doesn't seem too likely, given that several other connections
> > manage to get through entire connect / disconnect cycles?
> 
> Yes, but since postmaster seems to think that process is gone.

s/ since//;

Whatever is happening at that time, I can at least make sure that the
walsender is gone before making a new replication connection, even
though that doesn't "fix" any of the observed issues.
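For the record, if the mystery session shows up again, it might help to
dump whatever pg_stat_activity still reports before the poll times out.
A hypothetical debugging aid, not included in the attached patch (note()
is from Test::More):

    # Hypothetical: log any walsender still visible to the stats
    # machinery, so a lingering one can be matched against the
    # server log.
    my $leftover = $node_primary3->safe_psql('postgres',
        "SELECT pid, state, application_name FROM pg_stat_activity" .
        " WHERE backend_type = 'walsender'");
    note "remaining walsenders: $leftover" if $leftover ne '';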

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 4257bd4d35..059003d63a 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -35,6 +35,11 @@ my $result = $node_primary->safe_psql('postgres',
 );
 is($result, "t|t|t", 'check the state of non-reserved slot is "unknown"');
 
+# Re-create the replication slot as reserved before taking the backup.
+$node_primary->safe_psql('postgres', q[
+    SELECT pg_drop_replication_slot('rep1');
+    SELECT pg_create_physical_replication_slot('rep1', true);
+]);
 
 # Take backup
 my $backup_name = 'my_backup';
@@ -265,7 +270,7 @@ log_checkpoints = yes
 ));
 $node_primary2->start;
 $node_primary2->safe_psql('postgres',
-    "SELECT pg_create_physical_replication_slot('rep1')");
+    "SELECT pg_create_physical_replication_slot('rep1', true)");
 $backup_name = 'my_backup2';
 $node_primary2->backup($backup_name);
 
@@ -319,7 +324,7 @@ $node_primary3->append_conf(
     ));
 $node_primary3->start;
 $node_primary3->safe_psql('postgres',
-    "SELECT pg_create_physical_replication_slot('rep3')");
+    "SELECT pg_create_physical_replication_slot('rep3', true)");
 # Take backup
 $backup_name = 'my_backup';
 $node_primary3->backup($backup_name);
@@ -327,6 +332,14 @@ $node_primary3->backup($backup_name);
 my $node_standby3 = PostgreSQL::Test::Cluster->new('standby_3');
 $node_standby3->init_from_backup($node_primary3, $backup_name,
     has_streaming => 1);
+
+# We will check for a walsender process just below.  Make sure that
+# no walsender is still lingering.
+$node_primary3->poll_query_until('postgres',
+    "SELECT count(*) = 0 FROM pg_stat_activity WHERE backend_type = 'walsender'",
+    "t")
+  or die "timed out waiting for walsender to exit";
+
 $node_standby3->append_conf('postgresql.conf', "primary_slot_name = 'rep3'");
 $node_standby3->start;
 $node_primary3->wait_for_catchup($node_standby3);
