Re: conchuela timeouts since 2021-10-09 system upgrade - Mailing list pgsql-bugs

From Noah Misch
Subject Re: conchuela timeouts since 2021-10-09 system upgrade
Msg-id 20211029115725.GA309057@rfd.leadboat.com
In response to Re: conchuela timeouts since 2021-10-09 system upgrade  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-bugs
On Fri, Oct 29, 2021 at 04:42:31PM +1300, Thomas Munro wrote:
> On Fri, Oct 29, 2021 at 4:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > It indeed is looking like 7f580aa made the problem go away on conchuela,
> > but do we understand why?

I don't.

> > The only theory I can think of is "kernel bug",
> > but while that's plausible for prairiedog it seems hard to credit for a
> > late-model BSD kernel.

DragonFly BSD is a niche OS, so I'm more willing than usual to conclude that.
Could be a bug in IPC::Run or in the port of Perl to DragonFly, but those feel
less likely than the kernel.  The upgrade from DragonFly v4.4.3 to DragonFly
v6.0.0, which introduced this form of PostgreSQL test breakage, also updated
Perl from v5.20.3 to v5.32.1.

> I have yet to even log into a DBSD system (my attempt to install the
> 6.0.1 ISO on bhyve failed for lack of a driver, or something), but I
> do intend to get it working at some point.  But I can offer a poorly
> researched wildly speculative hypothesis: DBSD forked from FBSD in
> 2003.  macOS 10.3 took FBSD's kqueue code in... 2003.  So maybe a bug
> was fixed later that they both inherited?  Or perhaps that makes no
> sense, I dunno.  It'd be nice to try to write a repro and send them a
> report, if we can.

The conchuela bug and the prairiedog bug both present with a timeout in
IPC::Run::finish, but the similarity ends there.  On prairiedog, the
postmaster was stuck when it should have been reading a query from pgbench.
On conchuela, pgbench ran to completion and became a zombie, and IPC::Run got
stuck when it should have been reaping that zombie.  Good thought, however.
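
For anyone attempting the repro mentioned upthread, here is a rough sketch of
the "child exits, parent must reap the zombie" pattern described above.  It is
written in Python rather than Perl and does not go through IPC::Run, so it only
illustrates the shape of the conchuela symptom, not the actual code path.  The
function names, timeout, and polling interval are all hypothetical.

```python
import os
import time


def run_short_lived_child(exit_code=0):
    """Fork a child that exits immediately; it stays a zombie until reaped."""
    pid = os.fork()
    if pid == 0:
        # Child: exit right away without running any parent cleanup handlers.
        os._exit(exit_code)
    return pid


def reap_with_timeout(pid, timeout_s=5.0, poll_s=0.05):
    """Poll waitpid(WNOHANG) until the child is reaped or the timeout expires.

    Returns the exit code, the negated signal number if the child was killed
    by a signal, or None if the child could not be reaped in time.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        reaped_pid, status = os.waitpid(pid, os.WNOHANG)
        if reaped_pid == pid:
            if os.WIFEXITED(status):
                return os.WEXITSTATUS(status)
            if os.WIFSIGNALED(status):
                return -os.WTERMSIG(status)
        time.sleep(poll_s)
    return None


if __name__ == "__main__":
    child = run_short_lived_child(exit_code=0)
    result = reap_with_timeout(child)
    print("reaped" if result is not None else "timed out waiting to reap")
```

On a healthy system the parent should reap the child almost immediately; a
timeout here would point at the waiting side rather than at the child.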


