Re: Missed condition-variable wakeups on FreeBSD - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Missed condition-variable wakeups on FreeBSD
Msg-id 2398828.1646000688@sss.pgh.pa.us
In response to Re: Missed condition-variable wakeups on FreeBSD  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> On 2022-02-26 14:07:05 -0500, Tom Lane wrote:
>> I have observed this three times in the REL_11 branch, once
>> in REL_12, and a couple of times last summer before it occurred
>> to me to start keeping notes.  Over that time the machine has
>> been running various patchlevels of FreeBSD 13.0.

> It's certainly interesting that it appears to happen only in the branches
> using poll rather than kqueue to implement latches. That changed between 12
> and 13.

Yeah, and there was no PHJ in v10, so that's a pretty good theory as
to why I've only seen it in those two branches.

> Have you tried running the core regression tests with force_parallel_mode =
> on, or with the parallel costs lowered, to see if that makes the problem
> appear more often?
> The next time this happens / if you still have this open, perhaps it could be
> worth checking if there's a byte in the self pipe?
> Besides trying to make the issue more likely as suggested above, it might be
> worth checking if signalling the stuck processes with SIGUSR1 gets them
> unstuck.

I've now wasted a bunch of kilowatt-hours fruitlessly trying to
reproduce this outside the confines of the buildfarm script.
I'm at a loss to figure out what the buildfarm is doing differently,
but apparently there's something.  I'm going to re-enable the
machine's buildfarm job and just wait for it to hang up again.
More info eventually ...

            regards, tom lane


