From: Andres Freund
Subject: Re: [HACKERS] Unportable implementation of background worker start
Date:
Msg-id: 20170421172044.zcyslu7kcu7tfrsv@alap3.anarazel.de
In response to: Re: [HACKERS] Unportable implementation of background worker start (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [HACKERS] Unportable implementation of background worker start (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Hi,

On 2017-04-21 12:50:04 -0400, Tom Lane wrote:
> Attached is a lightly-tested draft patch that converts the postmaster to
> use a WaitEventSet for waiting in ServerLoop.  I've got mixed emotions
> about whether this is the direction to proceed, though.  It adds at least
> a couple of kernel calls per postmaster signal delivery, and probably to
> every postmaster connection acceptance (ServerLoop iteration), to fix
> problems that are so corner-casey that we've never even known we had them
> till now.

I'm not much concerned about the signal delivery paths, and I can't
quite imagine that another syscall in the accept path is going to be
measurable - worth verifying, though.

I do agree that it's a bit of a big stick for the back-branches...


> A different line of thought is to try to provide a bulletproof solution,
> but push the implementation problems down into latch.c --- that is, the
> goal would be to provide a pselect-like variant of WaitEventSetWait that
> is guaranteed to return if interrupted, as opposed to the current behavior
> where it's guaranteed not to.  But that seems like quite a bit of work.

Seems like a sane idea to me.  The use of latches has already grown due
to parallelism and it'll likely grow further - some of those users seem
likely to have to deal with such concerns too.  I'd much rather
centralize this in a common place.
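
For context, the classical pselect() idiom illustrates the semantics Tom
is describing - this is just a generic POSIX sketch, not a proposal for
the latch.c interface: the signal stays blocked except during the wait
itself, so it can interrupt the wait but can never be lost between
checking state and going to sleep.

#include <signal.h>
#include <sys/select.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

static void
handler(int signo)
{
	got_signal = 1;
}

int
main(void)
{
	struct sigaction sa;
	sigset_t	blocked;
	sigset_t	waitmask;

	sa.sa_handler = handler;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR1, &sa, NULL);

	/* keep SIGUSR1 blocked except while actually waiting */
	sigemptyset(&blocked);
	sigaddset(&blocked, SIGUSR1);
	sigprocmask(SIG_BLOCK, &blocked, &waitmask);
	sigdelset(&waitmask, SIGUSR1);

	for (;;)
	{
		fd_set		rfds;

		if (got_signal)
		{
			got_signal = 0;
			/* react to the signal here, outside the handler */
		}

		FD_ZERO(&rfds);
		FD_SET(STDIN_FILENO, &rfds);

		/*
		 * SIGUSR1 can only be delivered inside pselect(), which then
		 * fails with EINTR; it cannot arrive unnoticed between the check
		 * above and the start of the wait.
		 */
		if (pselect(STDIN_FILENO + 1, &rfds, NULL, NULL, NULL,
					&waitmask) < 0)
			continue;			/* presumably EINTR: loop and re-check */

		/* the fd is readable: accept a connection, etc. */
	}
}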

On the other hand, most types of our processes do SetLatch() in nearly
all the relevant signal handlers anyway, so they're pretty close to that
behaviour already.



> Whether or not we decide to change over the postmaster.c code, I think
> it'd likely be a good idea to apply most or all of the attached changes
> in latch.c.  Setting CLOEXEC on the relevant FDs is clearly a good thing,
> and the other changes will provide some safety if some preloaded extension
> decides to create a latch in the postmaster process.

In principle, I agree.  Reading through the changes now.


> @@ -1667,76 +1656,64 @@ ServerLoop(void)
>           * do nontrivial work.
>           *
>           * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
> -         * any new connections, so we don't call select(), and just sleep.
> +         * any new connections, so we don't call WaitEventSetWait(), and just
> +         * sleep.  XXX not ideal
>           */

Couldn't we just deactivate the sockets in the set instead?


>  /*
> - * Initialise the masks for select() for the ports we are listening on.
> - * Return the number of sockets to listen on.
> + * Create a WaitEventSet for ServerLoop() to wait on.  This includes the
> + * sockets we are listening on, plus the PostmasterLatch.
>   */
> -static int
> -initMasks(fd_set *rmask)
> +static void
> +initServerWaitSet(void)
>  {
> -    int            maxsock = -1;
>      int            i;
>  
> -    FD_ZERO(rmask);
> +    ServerWaitSet = CreateWaitEventSet(PostmasterContext, MAXLISTEN + 1);

Why are we using MAXLISTEN, rather than the actual number of things to
listen to?  The only benefit of this seems to be that we could
theoretically allow dynamic reconfiguration of the sockets a bit more
easily in the future, but that could just as well be done by recreating the set.

Random note: Do we actually have any code that errors out if too many
sockets are being listened to?
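
Roughly what I have in mind - an untested sketch only, using the
variables the patch already introduces (ServerWaitSet, PostmasterLatch)
plus the existing ListenSocket[] / PGINVALID_SOCKET conventions in
postmaster.c:

static void
initServerWaitSet(void)
{
	int			nsockets = 0;
	int			i;

	/* count the listen sockets that are actually in use */
	for (i = 0; i < MAXLISTEN; i++)
	{
		if (ListenSocket[i] == PGINVALID_SOCKET)
			break;
		nsockets++;
	}

	/* one extra slot for the postmaster latch */
	ServerWaitSet = CreateWaitEventSet(PostmasterContext, nsockets + 1);

	for (i = 0; i < nsockets; i++)
		AddWaitEventToSet(ServerWaitSet, WL_SOCKET_READABLE,
						  ListenSocket[i], NULL, NULL);

	AddWaitEventToSet(ServerWaitSet, WL_LATCH_SET, PGINVALID_SOCKET,
					  &PostmasterLatch, NULL);
}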


> @@ -2553,6 +2543,9 @@ SIGHUP_handler(SIGNAL_ARGS)
>  #endif
>      }
>  
> +    /* Force ServerLoop to iterate */
> +    SetLatch(&PostmasterLatch);
> +
>      PG_SETMASK(&UnBlockSig);
>  
>      errno = save_errno;
> @@ -2724,6 +2717,9 @@ pmdie(SIGNAL_ARGS)
>              break;
>      }
>  
> +    /* Force ServerLoop to iterate */
> +    SetLatch(&PostmasterLatch);
> +
>      PG_SETMASK(&UnBlockSig);
>  
>      errno = save_errno;
> @@ -3037,6 +3033,9 @@ reaper(SIGNAL_ARGS)
>       */
>      PostmasterStateMachine();
>  
> +    /* Force ServerLoop to iterate */
> +    SetLatch(&PostmasterLatch);
> +
>      /* Done with signal handler */
>      PG_SETMASK(&UnBlockSig);
>  
> @@ -5078,6 +5077,9 @@ sigusr1_handler(SIGNAL_ARGS)
>          signal_child(StartupPID, SIGUSR2);
>      }
>  
> +    /* Force ServerLoop to iterate */
> +    SetLatch(&PostmasterLatch);
> +
>      PG_SETMASK(&UnBlockSig);
>  
>      errno = save_errno;

In master, I'd kind of like to take a chance on replacing all the work
done in signal handlers with just a SetLatch(), and doing that work
outside of the signal handlers instead.  Forking from signal handlers is
just plain weird.
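
I.e. something like the following generic pattern (just a sketch, not
postmaster-specific): the handler only records the fact and wakes the
wait, and the loop around the wait does the real work:

#include <signal.h>

static volatile sig_atomic_t pending_sighup = 0;

/* signal handler: record the signal and wake the wait, nothing else */
static void
handle_sighup(int signo)
{
	pending_sighup = 1;
	/* in postmaster terms: SetLatch(&PostmasterLatch) */
}

/* main loop: all the real work happens here, outside the handler */
static void
server_loop(void)
{
	for (;;)
	{
		/* wait on the latch / listen sockets here */

		if (pending_sighup)
		{
			pending_sighup = 0;
			/* reload configuration, signal children, fork, ... */
		}
	}
}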



> diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
> index 4798370..92d2ff0 100644
> --- a/src/backend/storage/ipc/latch.c
> +++ b/src/backend/storage/ipc/latch.c
> @@ -62,6 +62,10 @@
>  #include "storage/pmsignal.h"
>  #include "storage/shmem.h"
>  
> +#ifndef FD_CLOEXEC
> +#define FD_CLOEXEC 1
> +#endif

Hm? Are we sure this is portable?  Are there really cases that have
F_SETFD, but not FD_CLOEXEC?
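
FWIW, F_SETFD and FD_CLOEXEC are supposed to come from <fcntl.h>
together; the usual idiom (for reference only, nothing patch-specific)
is just:

#include <fcntl.h>

/* mark a descriptor close-on-exec, preserving any other fd flags */
static int
set_cloexec(int fd)
{
	int			flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}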

Greetings,

Andres Freund


