Thread: BF animal malleefowl reported a failure in 001_password.pl
Hi,

I noticed one BF failure[1] when monitoring the BF for some other commit.

#   Failed test 'authentication success for method password, connstr user=scram_role: log matches'
#   at t/001_password.pl line 120.
#                   '2023-01-13 07:33:46.741 EST [243628:5] LOG:  received SIGHUP, reloading configuration files
# 2023-01-13 07:33:46.742 EST [243662:1] [unknown] LOG:  connection received: host=[local]
# 2023-01-13 07:33:46.744 EST [243662:2] [unknown] LOG:  connection authorized: user=scram_role database=postgres application_name=001_password.pl
# 2023-01-13 07:33:46.748 EST [243662:3] 001_password.pl LOG:  statement: SELECT $$connected with user=scram_role$$
# '
#     doesn't match '(?^:connection authenticated: identity="scram_role" method=password)'
# Looks like you failed 1 test of 79.
[08:33:47] t/001_password.pl ........

After checking the test and log, I can see the test failed at the following code:

----
# For plain "password" method, all users should also be able to connect.
reset_pg_hba($node, 'all', 'all', 'password');
test_conn($node, 'user=scram_role', 'password', 0,
	log_like => [qr/connection authenticated: identity="scram_role" method=password/]);
----

From the log, the expected "xxx method=password" log was not output; a simple
"connection authorized: user=scram_role database=postgres" was output instead.

So it seems the connection happens before pg_ident.conf is actually reloaded?
Not sure if we need to do something to make sure the reload happens, because it
looks like a very rare failure which hasn't happened in the last 90 days.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=malleefowl&dt=2023-01-13%2009%3A54%3A51

Best regards,
Hou zhijie
"houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com> writes:
> I noticed one BF failure[1] when monitoring the BF for some other commit.
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=malleefowl&dt=2023-01-13%2009%3A54%3A51
> ...
> So it seems the connection happens before pg_ident.conf is actually reloaded?
> Not sure if we need to do something to make sure the reload happens, because it
> looks like a very rare failure which hasn't happened in the last 90 days.

That does look like a race condition between config reloading and
new-backend launching.  However, I can't help being suspicious about
the fact that we haven't seen this symptom before and now here it is
barely a day after 7389aad63 (Use WaitEventSet API for postmaster's
event loop).  It seems fairly plausible that that did something that
causes the postmaster to preferentially process connection-accept ahead
of SIGHUP.  I took a quick look through the code and did not see a
smoking gun, but I'm way too tired to be sure I didn't miss something.

In general, use of WaitEventSet instead of signals will tend to slot
the postmaster into non-temporally-ordered event responses in two ways:
(1) the latch.c code will report events happening at more-or-less the
same time in a specific order, and (2) the postmaster.c code will react
to signal-handler-set flags in a specific order.  AFAICS, both of those
code layers will prioritize latch events ahead of connection-accept
events, but did I misread it?

Also it seems like the various platform-specific code paths in latch.c
could diverge as to the priority order of events, which could cause
annoying platform-specific behavior.  Not sure there's much to be done
there other than to be sensitive to not letting such divergence happen.

			regards, tom lane
On Sat, Jan 14, 2023 at 8:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com> writes:
> > I noticed one BF failure[1] when monitoring the BF for some other commit.
> > [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=malleefowl&dt=2023-01-13%2009%3A54%3A51
> > ...
> > So it seems the connection happens before pg_ident.conf is actually reloaded?
> > Not sure if we need to do something to make sure the reload happens, because it
> > looks like a very rare failure which hasn't happened in the last 90 days.
>
> That does look like a race condition between config reloading and
> new-backend launching.  However, I can't help being suspicious about
> the fact that we haven't seen this symptom before and now here it is
> barely a day after 7389aad63 (Use WaitEventSet API for postmaster's
> event loop).  It seems fairly plausible that that did something that
> causes the postmaster to preferentially process connection-accept ahead
> of SIGHUP.  I took a quick look through the code and did not see a
> smoking gun, but I'm way too tired to be sure I didn't miss something.

Yeah, I guess the scheduling might go something like this:

1.  kill() runs and sets SIGHUP as pending in the postmaster process;
the postmaster is now runnable but not yet running.
2.  Meanwhile connect() starts.
3.  postmaster starts running, sees the pending signal and immediately
runs the handler, which previously did the actual reload (before doing
anything else) but now just sets our reload-pending flag and does
kill(self, SIGURG), and then returns, so epoll_wait() is unblocked.
4.  epoll_wait() returns, reporting two events: signalfd ready to read
(or self-pipe, or EVFILT_SIGNAL), AND connection ready to accept.
5.  Connection happens to be reported first, so we accept/fork the
connection and reload later.
I think epoll will report fds in the order they became ready
(undocumented, but apparently well known that it's a kind of FIFO
linked list), but that itself is indeterminate, as 2 and 3 race.  It
looks like malleefowl is slow/overloaded (often ~3 hours to run the
tests, sometimes ~half an hour and sometimes ~4 hours).  Now that I
think about it, it's surprising I haven't seen this before, though,
implying that 3 always beats 2, so maybe I'm missing something else...

But if that's the general idea, I suppose there would be two ways to
give higher priority to signals/latches that arrive in the same set of
events: (1) scan the events array twice (for latches, then
connections), or (2) check our pending flags every time through the
output events loop, at the top, even for connection events (ie just
move some lines up a bit).  Probably 2 is the way to go (see also
discussion about whether we should do that anyway, to give priority to
a shutdown request if it arrives while the server is looping over 64
server sockets that are all ready to accept).
On Sat, Jan 14, 2023 at 10:29 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> But if that's the general idea, I suppose there would be two ways to
> give higher priority to signals/latches that arrive in the same set of
> events: (1) scan the events array twice (for latches, then
> connections), or (2) check our pending flags every time through the
> output events loop, at the top, even for connection events (ie just
> move some lines up a bit).

Here's a sketch of the first idea.

I also coded up the second idea (basically: when nevents > 1, behave as
though the latch has been set every time through the loop, and then also
check for WL_SOCKET_ACCEPT), but I'm not sure I like it: it's less clear
to read, harder to explain, and I'm also interested in exploring
alternative ways to receive signals other than with handlers that set
these flags, so I'm not sure I like baking in the assumption that we can
test the flags without having received a corresponding event.

I'm going to be AFK for a day or so, and I'd like to see if we can
collect some more evidence about this and maybe repro first, so I'll
wait for a bit.
Attachment
On Sun, Jan 15, 2023 at 12:35 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Here's a sketch of the first idea.

To hit the problem case, the signal needs to arrive in between the
latch->is_set check and the epoll_wait() call, and the handler needs to
take a while to get started.  (If it arrives before the latch->is_set
check, we report WL_LATCH_SET immediately, and if it arrives after the
epoll_wait() call begins, we get EINTR and go back around to the
latch->is_set check.)

With some carefully placed sleeps to simulate a CPU-starved system (see
attached), I managed to get a kill-then-connect sequence to produce:

2023-01-17 10:48:32.508 NZDT [555849] LOG:  nevents = 2
2023-01-17 10:48:32.508 NZDT [555849] LOG:  events[0] = WL_SOCKET_ACCEPT
2023-01-17 10:48:32.508 NZDT [555849] LOG:  events[1] = WL_LATCH_SET
2023-01-17 10:48:32.508 NZDT [555849] LOG:  received SIGHUP, reloading configuration files

With the patch I posted, we process that in the order we want:

2023-01-17 11:06:31.340 NZDT [562262] LOG:  nevents = 2
2023-01-17 11:06:31.340 NZDT [562262] LOG:  events[1] = WL_LATCH_SET
2023-01-17 11:06:31.340 NZDT [562262] LOG:  received SIGHUP, reloading configuration files
2023-01-17 11:06:31.344 NZDT [562262] LOG:  events[0] = WL_SOCKET_ACCEPT

Other thoughts:

Another idea would be to teach the latch infrastructure itself to
magically swap latch events to position 0.  Latches are usually
prioritised; it's only in this rare race case that they are not.

Or going the other way, I realise that we're lacking a "wait for reload"
mechanism as discussed in other threads (usually people want this if
they care about its effects on backends other than the postmaster, where
all bets are off, and Andres once suggested the ProcSignalBarrier
hammer), and if we ever got something like that it might be another
solution to this particular problem.
Attachment
On Tue, Jan 17, 2023 at 11:24 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Another idea would be to teach the latch infrastructure itself to
> magically swap latch events to position 0.  Latches are usually
> prioritised; it's only in this rare race case that they are not.

I liked that idea for a while, but I suspect it is not really possible
to solve the problem completely this way, because it won't work on
Windows (see below) and the race I described earlier is probably not the
only one.  I think it must also be possible for poll() to ignore a
signal that becomes pending just as the system call begins and return a
socket fd that has also just become ready, without waiting (thus not
causing EINTR).  Then the handler would run after we return to
userspace; we'd see only the socket event, and a later call would see
the latch event.  So I think we probably need something like the
attached, which I was originally trying to avoid.

Looking into all that made me notice a related problem on Windows.
There's an interesting difference between the implementation of select()
in src/backend/port/win32/socket.c and the Windows implementation of
WaitEventSetBlock() in latch.c.  The latch.c code only reports one event
at a time, in event array order, because that's WaitForMultipleObjects()'s
contract and we expose that fairly directly.  The older socket.c code
uses that only for wakeup, and then it polls *all* sockets to be able to
report more than one at a time.

I was careful to use a large array of output events to preserve the
existing round-robin servicing of multiple server sockets, but I see now
that that only works on Unix.  On Windows, I suspect that one socket
receiving a fast enough stream of new connections could prevent a second
socket from ever being serviced.  I think we might want to do something
about that.
Attachment
Thomas Munro <thomas.munro@gmail.com> writes:
> So I think we probably need something like the attached, which I was
> originally trying to avoid.

Yeah, something like that.  I also wonder if you don't need to think
a bit harder about the ordering of the flag checks; in particular,
it seems like servicing reload_request before child_exit might be
a good idea (remembering that child_exit might cause launching of
new children, so we want to be up to speed on relevant settings).

			regards, tom lane
On Fri, Jan 20, 2023 at 9:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > So I think we probably need something like the attached, which I was
> > originally trying to avoid.
>
> Yeah, something like that.  I also wonder if you don't need to think
> a bit harder about the ordering of the flag checks; in particular,
> it seems like servicing reload_request before child_exit might be
> a good idea (remembering that child_exit might cause launching of
> new children, so we want to be up to speed on relevant settings).

Agreed, and done.