[HACKERS] Unportable implementation of background worker start - Mailing list pgsql-hackers

From: Tom Lane
Subject: [HACKERS] Unportable implementation of background worker start
Msg-id: 1034.1492639447@sss.pgh.pa.us
Responses: Re: [HACKERS] Unportable implementation of background worker start  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List: pgsql-hackers
I chanced to notice that on gaur/pademelon, the "select_parallel"
regression test sometimes takes a great deal longer than normal,
for no obvious reason.  It does eventually terminate, but sometimes
only after 10 to 20 minutes rather than the couple dozen seconds
that are typical on that slow machine.

After a fair amount of hair-pulling, I traced the problem to the
fact that maybe_start_bgworker() will only start at most one worker
per call; after that, it sets StartWorkerNeeded = true and returns,
opining in a comment that ServerLoop will make another call shortly.

Unfortunately, that's hogwash.  It happens to work reliably on Linux,
because according to signal(7):

      The following interfaces are never restarted after being interrupted
      by a signal handler, regardless of the use of SA_RESTART; they always
      fail with the error EINTR when interrupted by a signal handler:

          ... select(2) ...

However, that's a Linux-ism.  What POSIX 2008 says about it, in the
select(2) reference page, is that

      If SA_RESTART has been set for the interrupting signal, it is
      implementation-defined whether the function restarts or returns
      with [EINTR].

HPUX apparently adopts the "restart" definition, and as we've previously
found out, not only does select(2) not return immediately but it actually
seems to reset the timeout timer to its original value.  (Some googling
suggests that restarting occurs on SVR4-derived kernels but not
BSD-derived ones.)
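
For anyone who wants to see what their own platform does, here's a small
standalone probe (my own test scaffolding, nothing from the tree) that
installs a handler with SA_RESTART, arranges for the signal to arrive after
about a second, and then sits in select() with a much longer timeout:

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

static void
handler(int signo)
{
    (void) signo;
    got_signal = 1;
}

int
main(void)
{
    struct sigaction sa;
    struct timeval tv;
    time_t      start;
    int         rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sa.sa_flags = SA_RESTART;   /* the way postmaster handlers are normally set up */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    alarm(1);                   /* signal arrives after ~1 second */

    tv.tv_sec = 10;             /* select() timeout much longer than that */
    tv.tv_usec = 0;

    start = time(NULL);
    rc = select(0, NULL, NULL, NULL, &tv);

    if (rc < 0 && errno == EINTR)
        printf("select() returned EINTR after %ld s (Linux/BSD-style)\n",
               (long) (time(NULL) - start));
    else
        printf("select() was restarted; returned %d after %ld s (SVR4/HP-UX-style)\n",
               rc, (long) (time(NULL) - start));
    return 0;
}

On a Linux or BSD box this should report EINTR after roughly one second; on
a kernel that restarts the call (as gaur's HPUX apparently does) it will
sleep out the full timeout before returning.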

So what happens, if several RegisterDynamicBackgroundWorker requests
arrive in a single iteration of the postmaster's sigusr1_handler,
is that the first one is serviced thanks to the maybe_start_bgworker
call appearing in sigusr1_handler, and then we return to the select()
call and sleep.  The next start request is serviced only when the
typically-60-second select timeout expires, or more usually when some
other interrupt arrives at the postmaster.  In the regression tests,
what seems to happen is that we get a PMSIGNAL_START_AUTOVAC_WORKER
from the autovac launcher every thirty seconds, allowing one more bgworker
to get launched each time we go through sigusr1_handler.

Here's an actual trace of one select_parallel.sql query trying to
launch four parallel workers; I added a bunch of elog(LOG) calls
to help diagnose what was going on:

2017-04-19 17:25:33.448 EDT [8827] LOG:  signaling postmaster for startup of slot 1
2017-04-19 17:25:33.448 EDT [8827] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:25:33.448 EDT [8827] LOG:  signaling postmaster for startup of slot 2
2017-04-19 17:25:33.448 EDT [8827] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:25:33.448 EDT [8827] LOG:  signaling postmaster for startup of slot 3
2017-04-19 17:25:33.448 EDT [8827] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:25:33.448 EDT [8827] LOG:  signaling postmaster for startup of slot 4
2017-04-19 17:25:33.448 EDT [8827] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:25:33.563 EDT [6456] LOG:  entering sigusr1_handler
2017-04-19 17:25:33.563 EDT [6456] LOG:  registering background worker "parallel worker for PID 8827"
2017-04-19 17:25:33.563 EDT [6456] LOG:  registering background worker "parallel worker for PID 8827"
2017-04-19 17:25:33.563 EDT [6456] LOG:  registering background worker "parallel worker for PID 8827"
2017-04-19 17:25:33.563 EDT [6456] LOG:  registering background worker "parallel worker for PID 8827"
2017-04-19 17:25:33.563 EDT [6456] LOG:  entered maybe_start_bgworker, StartWorkerNeeded=1, HaveCrashedWorker=1
2017-04-19 17:25:33.563 EDT [6456] LOG:  starting background worker process "parallel worker for PID 8827"
2017-04-19 17:25:33.566 EDT [8871] LOG:  starting parallel worker 3
2017-04-19 17:25:33.642 EDT [8871] LOG:  exiting parallel worker 3
2017-04-19 17:25:33.642 EDT [8871] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:25:33.647 EDT [6456] LOG:  leaving sigusr1_handler
2017-04-19 17:25:33.647 EDT [6456] LOG:  entering reaper
2017-04-19 17:25:33.647 EDT [6456] LOG:  unregistering background worker "parallel worker for PID 8827"
2017-04-19 17:25:33.647 EDT [6456] LOG:  reaped bgworker pid 8871 status 0
2017-04-19 17:25:33.647 EDT [6456] LOG:  leaving reaper
2017-04-19 17:26:03.114 EDT [6456] LOG:  entering sigusr1_handler
2017-04-19 17:26:03.115 EDT [6456] LOG:  entered maybe_start_bgworker, StartWorkerNeeded=1, HaveCrashedWorker=1
2017-04-19 17:26:03.115 EDT [6456] LOG:  starting background worker process "parallel worker for PID 8827"
2017-04-19 17:26:03.118 EDT [8874] LOG:  starting parallel worker 2
2017-04-19 17:26:03.164 EDT [8874] LOG:  exiting parallel worker 2
2017-04-19 17:26:03.164 EDT [8874] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:26:03.185 EDT [6456] LOG:  leaving sigusr1_handler
2017-04-19 17:26:03.185 EDT [6456] LOG:  entering reaper
2017-04-19 17:26:03.186 EDT [6456] LOG:  unregistering background worker "parallel worker for PID 8827"
2017-04-19 17:26:03.186 EDT [6456] LOG:  reaped bgworker pid 8874 status 0
2017-04-19 17:26:03.186 EDT [6456] LOG:  leaving reaper
2017-04-19 17:26:03.284 EDT [6456] LOG:  entering reaper
2017-04-19 17:26:03.284 EDT [6456] LOG:  leaving reaper
2017-04-19 17:26:33.378 EDT [6456] LOG:  entering sigusr1_handler
2017-04-19 17:26:33.378 EDT [6456] LOG:  entered maybe_start_bgworker, StartWorkerNeeded=1, HaveCrashedWorker=1
2017-04-19 17:26:33.378 EDT [6456] LOG:  starting background worker process "parallel worker for PID 8827"
2017-04-19 17:26:33.382 EDT [8876] LOG:  starting parallel worker 1
2017-04-19 17:26:33.428 EDT [8876] LOG:  exiting parallel worker 1
2017-04-19 17:26:33.428 EDT [8876] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:26:33.452 EDT [6456] LOG:  leaving sigusr1_handler
2017-04-19 17:26:33.453 EDT [6456] LOG:  entering reaper
2017-04-19 17:26:33.453 EDT [6456] LOG:  unregistering background worker "parallel worker for PID 8827"
2017-04-19 17:26:33.453 EDT [6456] LOG:  reaped bgworker pid 8876 status 0
2017-04-19 17:26:33.453 EDT [6456] LOG:  leaving reaper
2017-04-19 17:26:33.560 EDT [6456] LOG:  entering reaper
2017-04-19 17:26:33.560 EDT [6456] LOG:  leaving reaper
2017-04-19 17:27:03.114 EDT [6456] LOG:  entering sigusr1_handler
2017-04-19 17:27:03.115 EDT [6456] LOG:  entered maybe_start_bgworker, StartWorkerNeeded=1, HaveCrashedWorker=1
2017-04-19 17:27:03.115 EDT [6456] LOG:  starting background worker process "parallel worker for PID 8827"
2017-04-19 17:27:03.118 EDT [8879] LOG:  starting parallel worker 0
2017-04-19 17:27:03.167 EDT [8879] LOG:  exiting parallel worker 0
2017-04-19 17:27:03.167 EDT [8879] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:27:03.174 EDT [6456] LOG:  leaving sigusr1_handler
2017-04-19 17:27:03.174 EDT [6456] LOG:  entering reaper
2017-04-19 17:27:03.175 EDT [6456] LOG:  unregistering background worker "parallel worker for PID 8827"
2017-04-19 17:27:03.184 EDT [6456] LOG:  reaped bgworker pid 8879 status 0
2017-04-19 17:27:03.184 EDT [6456] LOG:  leaving reaper

While I haven't yet tested it, it seems like a fix might be as simple
as deleting these lines in maybe_start_bgworker:
            /*
             * Have ServerLoop call us again.  Note that there might not
             * actually *be* another runnable worker, but we don't care all
             * that much; we will find out the next time we run.
             */
            StartWorkerNeeded = true;
            return;

So I'm wondering what the design rationale was for only starting one
bgworker per invocation.
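
If it helps to see the difference in isolation, here's a toy model of the
two dispatch strategies (plain C, not postmaster code; pending[], start_one(),
and the slot count are all made up for illustration):

#include <stdbool.h>
#include <stdio.h>

#define NSLOTS 4                /* pretend four bgworker requests arrived at once */

static bool pending[NSLOTS] = {true, true, true, true};
static bool StartWorkerNeeded;

/* hypothetical stand-in for do_start_bgworker() */
static void
start_one(int slot)
{
    printf("starting worker for slot %d\n", slot);
    pending[slot] = false;
}

/* current scheme: start at most one worker, then bet on being called again soon */
static void
start_at_most_one(void)
{
    for (int i = 0; i < NSLOTS; i++)
    {
        if (pending[i])
        {
            start_one(i);
            StartWorkerNeeded = true;   /* "ServerLoop will call us again shortly" */
            return;                     /* ... which only happens if select() returns EINTR */
        }
    }
    StartWorkerNeeded = false;
}

/* proposed scheme: keep scanning until nothing is left to start */
static void
start_all_pending(void)
{
    for (int i = 0; i < NSLOTS; i++)
    {
        if (pending[i])
            start_one(i);
    }
    StartWorkerNeeded = false;
}

int
main(void)
{
    start_at_most_one();        /* launches slot 0 only; slots 1-3 wait for the next wakeup */
    start_all_pending();        /* drains the rest in one pass */
    return 0;
}

In the toy, the first function launches only one slot per call; without the
early return, one pass drains everything, which is presumably what deleting
the lines above would give us -- one trip through sigusr1_handler would then
service all four requests in the trace, instead of one every thirty seconds.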

It also appears to me that do_start_bgworker's treatment of fork
failure is completely brain dead.  Did anyone really think about
that case?
        regards, tom lane


