Re: [HACKERS] Unportable implementation of background worker start - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Unportable implementation of background worker start
Date
Msg-id 9205.1492833041@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] Unportable implementation of background worker start  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Unportable implementation of background worker start  (Andres Freund <andres@anarazel.de>)
Re: [HACKERS] Unportable implementation of background worker start  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
I wrote:
> Attached is a lightly-tested draft patch that converts the postmaster to
> use a WaitEventSet for waiting in ServerLoop.  I've got mixed emotions
> about whether this is the direction to proceed, though.

Attached are a couple of patches that represent a plausible Plan B.
The first one changes the postmaster to run its signal handlers without
specifying SA_RESTART.  I've confirmed that that seems to fix the
select_parallel-test-takes-a-long-time problem on gaur/pademelon.
The second one uses pselect, if available, to replace the unblock-signals/
select()/block-signals dance in ServerLoop.  On platforms where pselect
exists and works properly, that should fix the race condition I described
previously.  On platforms where it doesn't, we're no worse off than
before.

As mentioned in the comments for the second patch, even if we don't
have working pselect(), the only problem is that ServerLoop's response
to an interrupt might be delayed by as much as the up-to-1-minute timeout.
The only existing case where that's really bad is launching multiple
bgworkers.  I would therefore advocate also changing maybe_start_bgworker
to start up to N bgworkers per call, where N is large enough to pretty
much always satisfy simultaneously-arriving requests.  I'd pick 100 or
so, but am willing to negotiate.

I think that these patches represent something we could back-patch
without a lot of trepidation, unlike the WaitEventSet-based approach.
Therefore, my proposal is to apply and backpatch these changes, and
call it good for v10.  For v11, we could work on changing the postmaster
to not do work in signal handlers, as discussed upthread.  That would
supersede these two patches completely, though I'd still advocate for
keeping the change in maybe_start_bgworker.

Note: for testing purposes, these patches are quite independent; just
ignore the hunk in the second patch that changes a comment added by
the first one.

            regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index c382345..52b5e2c 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*************** PostmasterMain(int argc, char *argv[])
*** 610,615 ****
--- 610,624 ----
      /*
       * Set up signal handlers for the postmaster process.
       *
+      * In the postmaster, we want to install non-ignored handlers *without*
+      * SA_RESTART.  This is because they'll be blocked at all times except
+      * when ServerLoop is waiting for something to happen, and during that
+      * window, we want signals to exit the select(2) wait so that ServerLoop
+      * can respond if anything interesting happened.  On some platforms,
+      * signals marked SA_RESTART would not cause the select() wait to end.
+      * Child processes will generally want SA_RESTART, but we expect them to
+      * set up their own handlers before unblocking signals.
+      *
       * CAUTION: when changing this list, check for side-effects on the signal
       * handling setup of child processes.  See tcop/postgres.c,
       * bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/walwriter.c,
*************** PostmasterMain(int argc, char *argv[])
*** 620,635 ****
      pqinitmask();
      PG_SETMASK(&BlockSig);

!     pqsignal(SIGHUP, SIGHUP_handler);    /* reread config file and have
!                                          * children do same */
!     pqsignal(SIGINT, pmdie);    /* send SIGTERM and shut down */
!     pqsignal(SIGQUIT, pmdie);    /* send SIGQUIT and die */
!     pqsignal(SIGTERM, pmdie);    /* wait for children and shut down */
      pqsignal(SIGALRM, SIG_IGN); /* ignored */
      pqsignal(SIGPIPE, SIG_IGN); /* ignored */
!     pqsignal(SIGUSR1, sigusr1_handler); /* message from child process */
!     pqsignal(SIGUSR2, dummy_handler);    /* unused, reserve for children */
!     pqsignal(SIGCHLD, reaper);    /* handle child termination */
      pqsignal(SIGTTIN, SIG_IGN); /* ignored */
      pqsignal(SIGTTOU, SIG_IGN); /* ignored */
      /* ignore SIGXFSZ, so that ulimit violations work like disk full */
--- 629,648 ----
      pqinitmask();
      PG_SETMASK(&BlockSig);

!     pqsignal_no_restart(SIGHUP, SIGHUP_handler);        /* reread config file
!                                                          * and have children do
!                                                          * same */
!     pqsignal_no_restart(SIGINT, pmdie); /* send SIGTERM and shut down */
!     pqsignal_no_restart(SIGQUIT, pmdie);        /* send SIGQUIT and die */
!     pqsignal_no_restart(SIGTERM, pmdie);        /* wait for children and shut
!                                                  * down */
      pqsignal(SIGALRM, SIG_IGN); /* ignored */
      pqsignal(SIGPIPE, SIG_IGN); /* ignored */
!     pqsignal_no_restart(SIGUSR1, sigusr1_handler);        /* message from child
!                                                          * process */
!     pqsignal_no_restart(SIGUSR2, dummy_handler);        /* unused, reserve for
!                                                          * children */
!     pqsignal_no_restart(SIGCHLD, reaper);        /* handle child termination */
      pqsignal(SIGTTIN, SIG_IGN); /* ignored */
      pqsignal(SIGTTOU, SIG_IGN); /* ignored */
      /* ignore SIGXFSZ, so that ulimit violations work like disk full */
diff --git a/src/include/port.h b/src/include/port.h
index c6937e5..52910ed 100644
*** a/src/include/port.h
--- b/src/include/port.h
*************** extern int    pg_mkdir_p(char *path, int om
*** 469,474 ****
--- 469,479 ----
  /* port/pqsignal.c */
  typedef void (*pqsigfunc) (int signo);
  extern pqsigfunc pqsignal(int signo, pqsigfunc func);
+ #ifndef WIN32
+ extern pqsigfunc pqsignal_no_restart(int signo, pqsigfunc func);
+ #else
+ #define pqsignal_no_restart(signo, func) pqsignal(signo, func)
+ #endif

  /* port/quotes.c */
  extern char *escape_single_quotes_ascii(const char *src);
diff --git a/src/port/pqsignal.c b/src/port/pqsignal.c
index 5d366da..e7e4451 100644
*** a/src/port/pqsignal.c
--- b/src/port/pqsignal.c
***************
*** 32,38 ****
  #if !defined(WIN32) || defined(FRONTEND)

  /*
!  * Set up a signal handler for signal "signo"
   *
   * Returns the previous handler.
   */
--- 32,38 ----
  #if !defined(WIN32) || defined(FRONTEND)

  /*
!  * Set up a signal handler, with SA_RESTART, for signal "signo"
   *
   * Returns the previous handler.
   */
*************** pqsignal(int signo, pqsigfunc func)
*** 58,61 ****
--- 58,90 ----
  #endif
  }

+ /*
+  * Set up a signal handler, without SA_RESTART, for signal "signo"
+  *
+  * Returns the previous handler.
+  *
+  * On Windows, this would be identical to pqsignal(), so don't bother.
+  */
+ #ifndef WIN32
+
+ pqsigfunc
+ pqsignal_no_restart(int signo, pqsigfunc func)
+ {
+     struct sigaction act,
+                 oact;
+
+     act.sa_handler = func;
+     sigemptyset(&act.sa_mask);
+     act.sa_flags = 0;
+ #ifdef SA_NOCLDSTOP
+     if (signo == SIGCHLD)
+         act.sa_flags |= SA_NOCLDSTOP;
+ #endif
+     if (sigaction(signo, &act, &oact) < 0)
+         return SIG_ERR;
+     return oact.sa_handler;
+ }
+
+ #endif   /* !WIN32 */
+
  #endif   /* !defined(WIN32) || defined(FRONTEND) */
diff --git a/configure b/configure
index 99d05bf..42ff097 100755
*** a/configure
--- b/configure
*************** fi
*** 12892,12898 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat
pthread_is_threaded_npreadlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs
wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 12892,12898 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pselect
pstatpthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes
wcstombswcstombs_l 
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index c36c503..871edd8 100644
*** a/configure.in
--- b/configure.in
*************** PGAC_FUNC_WCSTOMBS_L
*** 1429,1435 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat
pthread_is_threaded_npreadlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs
wcstombs_l])

  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
--- 1429,1435 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pselect
pstatpthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes
wcstombswcstombs_l]) 

  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 52b5e2c..382e6a8 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*************** PostmasterMain(int argc, char *argv[])
*** 613,621 ****
       * In the postmaster, we want to install non-ignored handlers *without*
       * SA_RESTART.  This is because they'll be blocked at all times except
       * when ServerLoop is waiting for something to happen, and during that
!      * window, we want signals to exit the select(2) wait so that ServerLoop
       * can respond if anything interesting happened.  On some platforms,
!      * signals marked SA_RESTART would not cause the select() wait to end.
       * Child processes will generally want SA_RESTART, but we expect them to
       * set up their own handlers before unblocking signals.
       *
--- 613,621 ----
       * In the postmaster, we want to install non-ignored handlers *without*
       * SA_RESTART.  This is because they'll be blocked at all times except
       * when ServerLoop is waiting for something to happen, and during that
!      * window, we want signals to exit the pselect(2) wait so that ServerLoop
       * can respond if anything interesting happened.  On some platforms,
!      * signals marked SA_RESTART would not cause the pselect() wait to end.
       * Child processes will generally want SA_RESTART, but we expect them to
       * set up their own handlers before unblocking signals.
       *
*************** ServerLoop(void)
*** 1669,1674 ****
--- 1669,1676 ----
      for (;;)
      {
          fd_set        rmask;
+         fd_set       *rmask_p;
+         struct timeval timeout;
          int            selres;
          time_t        now;

*************** ServerLoop(void)
*** 1678,1714 ****
           * We block all signals except while sleeping. That makes it safe for
           * signal handlers, which again block all signals while executing, to
           * do nontrivial work.
-          *
-          * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
-          * any new connections, so we don't call select(), and just sleep.
           */
-         memcpy((char *) &rmask, (char *) &readmask, sizeof(fd_set));
-
          if (pmState == PM_WAIT_DEAD_END)
          {
!             PG_SETMASK(&UnBlockSig);
!
!             pg_usleep(100000L); /* 100 msec seems reasonable */
!             selres = 0;
!
!             PG_SETMASK(&BlockSig);
          }
          else
          {
!             /* must set timeout each time; some OSes change it! */
!             struct timeval timeout;

              /* Needs to run with blocked signals! */
              DetermineSleepTime(&timeout);

              PG_SETMASK(&UnBlockSig);

!             selres = select(nSockets, &rmask, NULL, NULL, &timeout);

              PG_SETMASK(&BlockSig);
          }

!         /* Now check the select() result */
          if (selres < 0)
          {
              if (errno != EINTR && errno != EWOULDBLOCK)
--- 1680,1743 ----
           * We block all signals except while sleeping. That makes it safe for
           * signal handlers, which again block all signals while executing, to
           * do nontrivial work.
           */
          if (pmState == PM_WAIT_DEAD_END)
          {
!             /*
!              * If we are in PM_WAIT_DEAD_END state, then we don't want to
!              * accept any new connections, so pass a null rmask.
!              */
!             rmask_p = NULL;
!             timeout.tv_sec = 0;
!             timeout.tv_usec = 100000;    /* 100 msec seems reasonable */
          }
          else
          {
!             /* Normal case: check sockets, and compute a suitable timeout */
!             memcpy(&rmask, &readmask, sizeof(fd_set));
!             rmask_p = &rmask;

              /* Needs to run with blocked signals! */
              DetermineSleepTime(&timeout);
+         }

+         /*
+          * We prefer to wait with pselect(2) if available, as using that,
+          * together with *not* using SA_RESTART for signals, guarantees that
+          * we will get kicked off the wait if a signal occurs.
+          *
+          * If we lack pselect(2), fake it with select(2).  This has a race
+          * condition: a signal that was already pending will be delivered
+          * before we reach the select(), and therefore the select() will wait,
+          * even though we might wish to do something in response.  Therefore,
+          * beware of putting any time-critical signal response logic into
+          * ServerLoop rather than into the signal handler itself.  It will run
+          * eventually, but maybe not till after a timeout delay.
+          *
+          * It is rumored that some implementations of pselect() are not
+          * atomic, making the first alternative here functionally equivalent
+          * to the second.  Not much we can do about that though.
+          */
+         {
+ #ifdef HAVE_PSELECT
+             /* pselect uses a randomly different timeout API, sigh */
+             struct timespec ptimeout;
+
+             ptimeout.tv_sec = timeout.tv_sec;
+             ptimeout.tv_nsec = timeout.tv_usec * 1000;
+
+             selres = pselect(nSockets, rmask_p, NULL, NULL,
+                              &ptimeout, &UnBlockSig);
+ #else
              PG_SETMASK(&UnBlockSig);

!             selres = select(nSockets, rmask_p, NULL, NULL, &timeout);

              PG_SETMASK(&BlockSig);
+ #endif
          }

!         /* Now check the select()/pselect() result */
          if (selres < 0)
          {
              if (errno != EINTR && errno != EWOULDBLOCK)
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 03e9803..9bbf1b1 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***************
*** 400,405 ****
--- 400,408 ----
  /* Define to 1 if the assembler supports PPC's LWARX mutex hint bit. */
  #undef HAVE_PPC_LWARX_MUTEX_HINT

+ /* Define to 1 if you have the `pselect' function. */
+ #undef HAVE_PSELECT
+
  /* Define to 1 if you have the `pstat' function. */
  #undef HAVE_PSTAT

diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index f9588b0..5d36768 100644
*** a/src/include/pg_config.h.win32
--- b/src/include/pg_config.h.win32
***************
*** 267,272 ****
--- 267,275 ----
  /* Define to 1 if you have the <poll.h> header file. */
  /* #undef HAVE_POLL_H */

+ /* Define to 1 if you have the `pselect' function. */
+ /* #undef HAVE_PSELECT */
+
  /* Define to 1 if you have the `pstat' function. */
  /* #undef HAVE_PSTAT */


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: [HACKERS] Old versions of Test::More
Next
From: Fabien COELHO
Date:
Subject: Re: [HACKERS] pgbench - allow to store select results intovariables