Re: BF animal malleefowl reported an failure in 001_password.pl - Mailing list pgsql-hackers

From Tom Lane
Subject Re: BF animal malleefowl reported an failure in 001_password.pl
Msg-id 934208.1673682937@sss.pgh.pa.us
In response to BF animal malleefowl reported an failure in 001_password.pl  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
"houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com> writes:
> I noticed one BF failure[1] when monitoring the BF for some other commit.
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=malleefowl&dt=2023-01-13%2009%3A54%3A51
> ...
> So it seems the connection happens before pg_ident.conf is actually reloaded ?
> Not sure if we need to do something make sure the reload happen, because it's
> looks like very rare failure which hasn't happen in last 90 days.

That does look like a race condition between config reloading and
new-backend launching.  However, I can't help being suspicious about
the fact that we haven't seen this symptom before and now here it is
barely a day after 7389aad63 (Use WaitEventSet API for postmaster's
event loop).  It seems fairly plausible that that commit changed something
so that the postmaster now preferentially processes connection-accept ahead
of SIGHUP.  I took a quick look through the code and did not see a
smoking gun, but I'm way too tired to be sure I didn't miss something.
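
To make the suspected race concrete, here is a toy model in Python. It is
only a sketch under stated assumptions, not PostgreSQL code: the function
name, the "ident_map" key, and the event names are all hypothetical, and the
model assumes a new backend simply inherits whatever configuration the
postmaster has loaded at the moment the connection is accepted.

```python
def first_backend_sees(order):
    """Return the ident map seen by the first backend for a given event order.

    Toy model: `order` is the sequence in which a hypothetical postmaster
    services two pending events, "sighup" (reload config) and "accept"
    (launch a backend for a new connection).
    """
    on_disk = {"ident_map": "new"}   # the test has already rewritten the file
    loaded = {"ident_map": "old"}    # the postmaster has not reloaded yet
    backends = []
    for event in order:
        if event == "sighup":
            loaded = dict(on_disk)          # reload picks up the new file
        elif event == "accept":
            backends.append(dict(loaded))   # backend inherits a snapshot
    return backends[0]["ident_map"]


reload_first = first_backend_sees(["sighup", "accept"])
accept_first = first_backend_sees(["accept", "sighup"])
```

In this model the backend gets the new map only when the reload is serviced
first. If the accept is serviced first, the connection is checked against
the old map, which is the symptom described in the report.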

In general, use of WaitEventSet instead of signals will tend to slot
the postmaster into non-temporally-ordered event responses in two
ways: (1) the latch.c code will report events happening at more-or-less
the same time in a specific order, and (2) the postmaster.c code will
react to signal-handler-set flags in a specific order.  AFAICS, both
of those code layers will prioritize latch events ahead of
connection-accept events, but did I misread it?
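
The two layers of ordering described above can be sketched as follows. This
is a hedged illustration in Python, not the logic of latch.c or postmaster.c:
the event kinds ("latch", "socket"), the flag names, and both functions are
assumptions made up for the example.

```python
def report_events(ready_kinds, report_order):
    """Layer 1: stand-in for the wait primitive.

    Events that became ready at more or less the same time are reported
    in a fixed order by kind, not in the order they actually occurred.
    """
    return [kind for kind in report_order if kind in ready_kinds]


def react(reported, pending_flags, flag_order):
    """Layer 2: stand-in for the event loop.

    A latch event means "some signal handler set a flag"; the flags are
    then checked in a fixed order, again regardless of arrival time.
    """
    actions = []
    for kind in reported:
        if kind == "latch":
            actions.extend(f for f in flag_order if f in pending_flags)
        elif kind == "socket":
            actions.append("accept_connection")
    return actions


ready = {"socket", "latch"}
pending = {"child_exit", "reload_request"}
flag_order = ["reload_request", "child_exit"]

latch_first = react(report_events(ready, ["latch", "socket"]), pending, flag_order)
socket_first = react(report_events(ready, ["socket", "latch"]), pending, flag_order)
```

With the latch reported first, the reload is handled before the connection is
accepted. Flipping the report order in layer 1 alone is enough to put the
accept ahead of the reload, whatever order layer 2 uses for its flags.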

Also, the various platform-specific code paths in latch.c could diverge
in the priority order they give to events, which could cause annoying
platform-specific behavior.  Not sure there's much to be done there
other than to take care that such divergence doesn't creep in.

            regards, tom lane


