Re: Recent 027_streaming_regress.pl hangs - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Recent 027_streaming_regress.pl hangs
Date	August 12, 2024 00:32:05
Msg-id	352068.1723422725@sss.pgh.pa.us
In response to	Re: Recent 027_streaming_regress.pl hangs (Andrew Dunstan <andrew@dunslane.net>)
Responses	Re: Recent 027_streaming_regress.pl hangs
List	pgsql-hackers

Andrew Dunstan <andrew@dunslane.net> writes:
> We'll see. I have switched crake from --run-parallel mode to --run-all 
> mode i.e. the runs are serialized. Maybe that will be enough to stop the 
> errors. I'm still annoyed that this test is susceptible to load, if that 
> is indeed what is the issue.

crake is still timing out intermittently on 027_streaming_regress.pl,
so that wasn't it.  I think we need more data.  We know that the
wait_for_catchup query never returns true:

    SELECT '$target_lsn' <= ${mode}_lsn AND state = 'streaming'

but we don't know whether it is the LSN condition or the state
condition that is failing.  If it is the LSN condition, it'd be good
to see the actual last LSN, so we can look for patterns such as
whether a WAL page boundary crossing is involved.  So I suggest
adding something like the attached.
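To make the two failure modes concrete, here is a minimal sketch of how the extra output could be interpreted once we have it.  It is not part of the attached patch: lsn_to_int and diagnose_catchup are hypothetical helpers, and the sketch assumes the default 8192-byte WAL page size (XLOG_BLCKSZ) and a 64-bit Perl.  It converts a textual pg_lsn into a byte position, reports whether the state check or the LSN check is the one failing, and flags a shortfall that spans a WAL page boundary.

```perl
use strict;
use warnings;

# Assumption: WAL pages use the default XLOG_BLCKSZ of 8192 bytes.
use constant WAL_PAGE_SIZE => 8192;

# Hypothetical helper: convert a textual pg_lsn such as '0/3000060'
# into an integer byte position.  Assumes a 64-bit Perl, so shifting
# the high word left by 32 bits does not overflow.
sub lsn_to_int
{
    my ($lsn) = @_;
    my ($hi, $lo) = $lsn =~ m{^([0-9A-Fa-f]{1,8})/([0-9A-Fa-f]{1,8})$}
      or die "malformed LSN: $lsn";
    return (hex($hi) << 32) + hex($lo);
}

# Hypothetical helper: say which half of the wait_for_catchup condition
#   '$target_lsn' <= ${mode}_lsn AND state = 'streaming'
# is false for one pg_stat_replication row.
sub diagnose_catchup
{
    my ($target_lsn, $last_lsn, $state) = @_;
    my @problems;

    my $target = lsn_to_int($target_lsn);
    my $last   = lsn_to_int($last_lsn);

    # The state half of the condition.
    push @problems, 'state is ' . $state . ', not streaming'
      if $state ne 'streaming';

    # The LSN half of the condition.
    if ($last < $target)
    {
        my $gap = $target - $last;

        # True when the last reported LSN and the target LSN fall on
        # different WAL pages.
        my $crosses =
          int($last / WAL_PAGE_SIZE) != int($target / WAL_PAGE_SIZE);

        push @problems,
          sprintf('LSN %s is %d bytes short of %s%s',
            $last_lsn, $gap, $target_lsn,
            $crosses ? ' (a WAL page boundary lies between them)' : '');
    }

    return @problems ? join('; ', @problems) : 'caught up';
}
```

For example, a target of 0/3002010 against a last LSN of 0/3001FF8 in state streaming would be reported as 24 bytes short, with a WAL page boundary between the two positions.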

If we do this, I'd be inclined to instrument wait_for_slot_catchup
and wait_for_subscription_sync similarly, but I thought I'd check
for contrary opinions first.

            regards, tom lane

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 32ee98aebc..3403626f92 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2948,6 +2948,13 @@ sub wait_for_catchup
         }
         else
         {
+            # Fetch additional detail for debugging purposes
+            $query = qq[SELECT application_name, ${mode}_lsn, state
+                        FROM pg_catalog.pg_stat_replication
+                        WHERE application_name IN ('$standby_name', 'walreceiver')];
+            my $details = $self->safe_psql('postgres', $query);
+            diag qq(Last application_name|${mode}_lsn|state:
+${details});
             croak "timed out waiting for catchup";
         }
     }
