Thread: BF animal malleefowl reported a failure in 001_password.pl
Hi,

I noticed one BF failure[1] when monitoring the BF for some other commit.

#   Failed test 'authentication success for method password, connstr user=scram_role: log matches'
#   at t/001_password.pl line 120.
#                   '2023-01-13 07:33:46.741 EST [243628:5] LOG:  received SIGHUP, reloading configuration files
# 2023-01-13 07:33:46.742 EST [243662:1] [unknown] LOG:  connection received: host=[local]
# 2023-01-13 07:33:46.744 EST [243662:2] [unknown] LOG:  connection authorized: user=scram_role database=postgres application_name=001_password.pl
# 2023-01-13 07:33:46.748 EST [243662:3] 001_password.pl LOG:  statement: SELECT $$connected with user=scram_role$$
# '
#     doesn't match '(?^:connection authenticated: identity="scram_role" method=password)'
# Looks like you failed 1 test of 79.
[08:33:47] t/001_password.pl ........

After checking the test and log, I can see the test failed at the following code:

----
# For plain "password" method, all users should also be able to connect.
reset_pg_hba($node, 'all', 'all', 'password');
test_conn($node, 'user=scram_role', 'password', 0,
	log_like => [qr/connection authenticated: identity="scram_role" method=password/]);
----

From the log, the expected "xxx method=password" log was not output; a simple
"connection authorized: user=scram_role database=postgres" was output instead.

So it seems the connection happens before pg_ident.conf is actually reloaded?
Not sure if we need to do something to make sure the reload happens, because it
looks like a very rare failure which hasn't happened in the last 90 days.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=malleefowl&dt=2023-01-13%2009%3A54%3A51

Best regards,
Hou zhijie
"houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com> writes:
> I noticed one BF failure[1] when monitoring the BF for some other commit.
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=malleefowl&dt=2023-01-13%2009%3A54%3A51
> ...
> So it seems the connection happens before pg_ident.conf is actually reloaded?
> Not sure if we need to do something to make sure the reload happens, because it
> looks like a very rare failure which hasn't happened in the last 90 days.

That does look like a race condition between config reloading and
new-backend launching.  However, I can't help being suspicious about
the fact that we haven't seen this symptom before and now here it is
barely a day after 7389aad63 (Use WaitEventSet API for postmaster's
event loop).  It seems fairly plausible that that did something that
causes the postmaster to preferentially process connection-accept ahead
of SIGHUP.  I took a quick look through the code and did not see a
smoking gun, but I'm way too tired to be sure I didn't miss something.

In general, use of WaitEventSet instead of signals will tend to slot
the postmaster into non-temporally-ordered event responses in two ways:
(1) the latch.c code will report events happening at more-or-less the
same time in a specific order, and (2) the postmaster.c code will react
to signal-handler-set flags in a specific order.  AFAICS, both of those
code layers will prioritize latch events ahead of connection-accept
events, but did I misread it?

Also it seems like the various platform-specific code paths in latch.c
could diverge as to the priority order of events, which could cause
annoying platform-specific behavior.  Not sure there's much to be done
there other than to be sensitive to not letting such divergence happen.

			regards, tom lane
On Sat, Jan 14, 2023 at 8:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com> writes:
> > I noticed one BF failure[1] when monitoring the BF for some other commit.
> > [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=malleefowl&dt=2023-01-13%2009%3A54%3A51
> > ...
> > So it seems the connection happens before pg_ident.conf is actually reloaded?
> > Not sure if we need to do something to make sure the reload happens, because it
> > looks like a very rare failure which hasn't happened in the last 90 days.
>
> That does look like a race condition between config reloading and
> new-backend launching.  However, I can't help being suspicious about
> the fact that we haven't seen this symptom before and now here it is
> barely a day after 7389aad63 (Use WaitEventSet API for postmaster's
> event loop).  It seems fairly plausible that that did something that
> causes the postmaster to preferentially process connection-accept ahead
> of SIGHUP.  I took a quick look through the code and did not see a
> smoking gun, but I'm way too tired to be sure I didn't miss something.

Yeah, I guess the scheduling might go something like this:

1.  kill() runs and sets SIGHUP as pending in the postmaster process;
the postmaster is now runnable but not yet running.
2.  Meanwhile connect() starts.
3.  postmaster starts running, sees the pending signal and immediately
runs the handler, which previously did the actual reload (before doing
anything else) but now just sets our reload-pending flag and does
kill(self, SIGURG), and then returns, so epoll_wait() is unblocked.
4.  epoll_wait() returns, reporting two events: signalfd ready to read
(or self-pipe, or EVFILT_SIGNAL), AND connection ready to accept.
5.  Connection happens to be reported first, so we accept/fork the
connection and reload later.
I think epoll will report fds in the order they became ready
(undocumented, but apparently well known that it's a kind of FIFO
linked list), but that itself is indeterminate, as 2 and 3 race.  It
looks like malleefowl is slow/overloaded (often ~3 hours to run the
tests, sometimes ~half an hour and sometimes ~4 hours).  Now that I
think about it, it's surprising I haven't seen this before, though,
implying that 3 always beats 2, so maybe I'm missing something else...

But if that's the general idea, I suppose there would be two ways to
give higher priority to signals/latches that arrive in the same set of
events: (1) scan the events array twice (for latches, then
connections), or (2) check our pending flags every time through the
output events loop, at the top, even for connection events (ie just
move some lines up a bit).  Probably 2 is the way to go (see also
discussion about whether we should do that anyway, to give priority to
a shutdown request if it arrives while the server is looping over 64
server sockets that are all ready to accept).
On Sat, Jan 14, 2023 at 10:29 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> But if that's the general idea, I suppose there would be two ways to
> give higher priority to signals/latches that arrive in the same set of
> events: (1) scan the events array twice (for latches, then
> connections), or (2) check our pending flags every time through the
> output events loop, at the top, even for connection events (ie just
> move some lines up a bit).

Here's a sketch of the first idea.

I also coded up the second idea (basically: when nevents > 1, behave as
though the latch has been set every time through the loop, and then also
check for WL_SOCKET_ACCEPT), but I'm not sure I like it: it's less clear
to read, harder to explain, and I'm also interested in exploring
alternative ways to receive signals other than with handlers that set
these flags, so I'm not sure I like baking in the assumption that we can
test the flags without having received a corresponding event.

I'm going to be AFK for a day or so, and I'd like to see if we can
collect some more evidence about this and maybe repro first, so I'll
wait for a bit.
Attachment
On Sun, Jan 15, 2023 at 12:35 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Here's a sketch of the first idea.

To hit the problem case, the signal needs to arrive in between the
latch->is_set check and the epoll_wait() call, and the handler needs to
take a while to get started.  (If it arrives before the latch->is_set
check, we report WL_LATCH_SET immediately, and if it arrives after the
epoll_wait() call begins, we get EINTR and go back around to the
latch->is_set check.)

With some carefully placed sleeps to simulate a CPU-starved system (see
attached), I managed to get a kill-then-connect sequence to produce:

2023-01-17 10:48:32.508 NZDT [555849] LOG:  nevents = 2
2023-01-17 10:48:32.508 NZDT [555849] LOG:  events[0] = WL_SOCKET_ACCEPT
2023-01-17 10:48:32.508 NZDT [555849] LOG:  events[1] = WL_LATCH_SET
2023-01-17 10:48:32.508 NZDT [555849] LOG:  received SIGHUP, reloading configuration files

With the patch I posted, we process that in the order we want:

2023-01-17 11:06:31.340 NZDT [562262] LOG:  nevents = 2
2023-01-17 11:06:31.340 NZDT [562262] LOG:  events[1] = WL_LATCH_SET
2023-01-17 11:06:31.340 NZDT [562262] LOG:  received SIGHUP, reloading configuration files
2023-01-17 11:06:31.344 NZDT [562262] LOG:  events[0] = WL_SOCKET_ACCEPT

Other thoughts:

Another idea would be to teach the latch infrastructure itself to
magically swap latch events to position 0.  Latches are usually
prioritised; it's only in this rare race case that they are not.

Or going the other way, I realise that we're lacking a "wait for reload"
mechanism as discussed in other threads (usually people want this if
they care about its effects on backends other than the postmaster, where
all bets are off, and Andres once suggested the ProcSignalBarrier
hammer), and if we ever got something like that it might be another
solution to this particular problem.
Attachment
On Tue, Jan 17, 2023 at 11:24 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Another idea would be to teach the latch infrastructure itself to
> magically swap latch events to position 0.  Latches are usually
> prioritised; it's only in this rare race case that they are not.

I liked that idea for a while, but I suspect it is not really possible
to solve the problem completely this way, because it won't work on
Windows (see below) and the race I described earlier is probably not the
only one.  I think it must also be possible for poll() to ignore a
signal that becomes pending just as the system call begins and return a
socket fd that has also just become ready, without waiting (thus not
causing EINTR).  Then the handler would run after we return to
userspace; we'd see only the socket event, and a later call would see
the latch event.  So I think we probably need something like the
attached, which I was originally trying to avoid.

Looking into all that made me notice a related problem on Windows.
There's an interesting difference between the implementation of select()
in src/backend/port/win32/socket.c and the Windows implementation of
WaitEventSetBlock() in latch.c.  The latch.c code only reports one event
at a time, in event array order, because that's WaitForMultipleObjects()'s
contract and we expose that fairly directly.  The older socket.c code
uses that only for wakeup, and then it polls *all* sockets to be able to
report more than one at a time.

I was careful to use a large array of output events to preserve the
existing round-robin servicing of multiple server sockets, but I see now
that that only works on Unix.  On Windows, I suspect that one socket
receiving a fast enough stream of new connections could prevent a second
socket from ever being serviced.  I think we might want to do something
about that.
Attachment
Thomas Munro <thomas.munro@gmail.com> writes:
> So I think we probably need something like the attached, which I was
> originally trying to avoid.

Yeah, something like that.  I also wonder if you don't need to think
a bit harder about the ordering of the flag checks; in particular,
it seems like servicing reload_request before child_exit might be
a good idea (remembering that child_exit might cause launching of
new children, so we want to be up to speed on relevant settings).

			regards, tom lane
On Fri, Jan 20, 2023 at 9:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > So I think we probably need something like the attached, which I was
> > originally trying to avoid.
>
> Yeah, something like that.  I also wonder if you don't need to think
> a bit harder about the ordering of the flag checks; in particular,
> it seems like servicing reload_request before child_exit might be
> a good idea (remembering that child_exit might cause launching of
> new children, so we want to be up to speed on relevant settings).

Agreed, and done.