Re: Adding locks statistics - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Adding locks statistics
Date
Msg-id acMiT69EEGKHWGM5@paquier.xyz
Whole thread Raw
In response to Re: Adding locks statistics  (Andres Freund <andres@anarazel.de>)
Responses Re: Adding locks statistics
List pgsql-hackers
On Tue, Mar 24, 2026 at 04:09:37PM -0400, Andres Freund wrote:
> The test is extremely unstable on windows.  On CI 10/16 runs since the test in
> failed due to it, afaict.

I am not surprised by that.  Windows is good in catching race
conditions.

> I don't see how a test with a timeout setting that's anywhere remotely close
> to 10ms could be expected to be stable.

Well, the low value of deadlock_timeout is not the problem.  The
shorter the better to make the test go faster with more deadlock
checks.  What the buildfarm is telling is that we do not have the time
to process the deadlock_timeout request, and that we would need to pay
the cost of longer sleep if coded this way.  This is going to cost in
runtime on fast machines where the CheckDeadLock() call would happen
quickly. And on slow machines, we don't have the guarantee that the
sleep would be long enough to process the deadlock request.

This test would be better if rewritten as a TAP test, I guess, with a
NOTICE injection point before the CheckDeadLock() call in ProcSleep()
to make sure that the second session processes the deadlock timeout
request while it is waiting on its lock to be acquired.  One trickier
part is that we only care about the deadlock_timeout in s2, because we
want to measure the wait it has waited until the lock could be
acquired, meaning that we should make s1 use a large deadlock_timeout
to avoid interferences with a global injpoint.

I don't have the credits to test that in the CI for this month,
unfortunately, and this creates noise in the CI for the work of other
folks in this release cycle, so I am going to remove this test for
now.

> Also, anything that requires short sleeps (like pg_sleep(0.05);) is extremely
> likely to be a long time test stability hazard. It's a huge "test smell" to
> me, to the point that I think every single sleep in a test needs a comment
> explaining why that one use of sleep is correct, and that comment better be
> signed in blood.

Yep, agreed.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Next
From: Jeff Davis
Date:
Subject: Re: Use correct collation in pg_trgm