Re: Recent 027_streaming_regress.pl hangs - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: Recent 027_streaming_regress.pl hangs
Msg-id 4230d49c-898d-43e0-8ca9-a64b5b702b1d@dunslane.net
In response to Re: Recent 027_streaming_regress.pl hangs  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Recent 027_streaming_regress.pl hangs
List pgsql-hackers


On 2024-08-11 Su 8:32 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> We'll see. I have switched crake from --run-parallel mode to --run-all
>> mode i.e. the runs are serialized. Maybe that will be enough to stop the
>> errors. I'm still annoyed that this test is susceptible to load, if that
>> is indeed what is the issue.
>
> crake is still timing out intermittently on 027_streaming_regress.pl,
> so that wasn't it.  I think we need more data.  We know that the
> wait_for_catchup query is never getting to true:
>
> 	SELECT '$target_lsn' <= ${mode}_lsn AND state = 'streaming'
>
> but we don't know if the LSN condition or the state condition is
> what is failing.  And if it is the LSN condition, it'd be good
> to see the actual last LSN, so we can look for patterns like
> whether there is a page boundary crossing involved.  So I suggest
> adding something like the attached.
>
> If we do this, I'd be inclined to instrument wait_for_slot_catchup
> and wait_for_subscription_sync similarly, but I thought I'd check
> for contrary opinions first.
			


Seems reasonable.
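
Once the extra logging is in place, the "page boundary crossing" question can be checked mechanically against the reported LSNs. The following is only a rough sketch in Python, not part of the attached patch: the function names are made up for illustration, and it assumes the default 8 kB WAL block size (XLOG_BLCKSZ = 8192), which a custom build may change.

```python
# Hypothetical helpers for eyeballing LSNs pulled from the test log.
# Assumption: default WAL block size (XLOG_BLCKSZ) of 8192 bytes.
WAL_BLOCK_SIZE = 8192


def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '0/16B3748' to a 64-bit byte position.

    The text form is two hexadecimal numbers: high 32 bits / low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_gap(reached_lsn: str, target_lsn: str) -> int:
    """Bytes by which the standby's last reported LSN trails the target."""
    return parse_lsn(target_lsn) - parse_lsn(reached_lsn)


def crosses_page_boundary(reached_lsn: str, target_lsn: str,
                          block_size: int = WAL_BLOCK_SIZE) -> bool:
    """True if the two LSNs fall on different WAL pages."""
    return (parse_lsn(reached_lsn) // block_size
            != parse_lsn(target_lsn) // block_size)
```

If the stuck runs tend to show a small gap together with crosses_page_boundary() returning true, that would point at the LSN condition rather than the state condition.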


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
