Re: Using WaitEventSet in the postmaster - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: Using WaitEventSet in the postmaster
Date:
Msg-id: 20230107225548.ppzg4rpninlthgth@awork3.anarazel.de
In response to: Re: Using WaitEventSet in the postmaster (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-hackers
Hi,

On 2023-01-07 18:08:11 +1300, Thomas Munro wrote:
> On Sat, Jan 7, 2023 at 12:25 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2023-01-07 11:08:36 +1300, Thomas Munro wrote:
> > > 3.  Is it OK to clobber the shared pending flag for SIGQUIT, SIGTERM,
> > > SIGINT?  If you send all of these extremely rapidly, it's
> > > indeterminate which one will be seen by handle_shutdown_request().
> >
> > That doesn't seem optimal. I'm mostly worried that we can end up downgrading a
> > shutdown request.
> 
> I was contemplating whether I needed to do some more push-ups to
> prefer the first delivered signal (instead of the last), but you're
> saying that it would be enough to prefer the fastest shutdown type, in
> cases where more than one signal was handled between server loops.
> WFM.

I don't see any need for such an ordering requirement - in case of receiving a
"less severe" shutdown request first, we'd process the more severe one soon
after. There's nothing to be gained by trying to follow the order of the
incoming signals.

Afaict that's also the behaviour today. pmdie() has blocks like this:
        case SIGTERM:

            /*
             * Smart Shutdown:
             *
             * Wait for children to end their work, then shut down.
             */
            if (Shutdown >= SmartShutdown)
                break;



I briefly wondered about deduplicating that code, now that we know the
requested mode ahead of the switch. So it could be something like:

/* ignore requests that don't escalate an already in-progress shutdown */
if (mode <= Shutdown)
   return;

but that's again probably something for later.
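
Spelled out a bit more (purely a hypothetical sketch -- the NoShutdown <
SmartShutdown < FastShutdown < ImmediateShutdown ordering is today's, but the
process_shutdown_request(mode) shape and the per-mode bodies are abbreviated
for illustration, not the patch's code):

static void
process_shutdown_request(int mode)
{
    /* ignore requests that don't escalate an in-progress shutdown */
    if (mode <= Shutdown)
        return;

    switch (mode)
    {
        case SmartShutdown:
            Shutdown = SmartShutdown;
            ereport(LOG, (errmsg("received smart shutdown request")));
            break;
        case FastShutdown:
            Shutdown = FastShutdown;
            ereport(LOG, (errmsg("received fast shutdown request")));
            break;
        case ImmediateShutdown:
            Shutdown = ImmediateShutdown;
            ereport(LOG, (errmsg("received immediate shutdown request")));
            break;
    }

    /* mode-specific signalling of children would follow here */
}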


> > > 5.  Is the signal mask being correctly handled during forking?  I
> > > think so: I decided to push the masking logic directly into the
> > > routine that forks, to make it easy to verify that all paths set the
> > > mask the way we want.
> >
> > Hm. If I understand correctly, you used sigprocmask() directly (vs
> > PG_SETMASK()) in fork_process() because you want the old mask? But why do we
> > restore the prior mask, instead of just using PG_SETMASK(&UnBlockSig); as we
> > still do in a bunch of places in the postmaster?
> 
> It makes zero difference in practice but I think it's a nicer way to
> code it because it doesn't make an unnecessary assumption about what
> the signal mask was on entry.

Heh, to me doing something slightly different from the other places actually
seems to make it a bit less nice :). There's also some benefit in the
certainty of knowing what the mask will be.  But it doesn't matter much.
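
For reference, the two variants look roughly like this (a simplified sketch
of fork_process()'s tail, using the BlockSig/UnBlockSig masks and
PG_SETMASK() from libpq/pqsignal.h; one variant or the other would be used,
not both):

    sigset_t    save_mask;
    pid_t       pid;

    /*
     * Variant A (the patch): block signals across fork() and restore
     * whatever mask the caller had, assuming nothing about the mask on
     * entry.
     */
    sigprocmask(SIG_SETMASK, &BlockSig, &save_mask);
    pid = fork();
    if (pid != 0)                       /* parent, or failed fork */
        sigprocmask(SIG_SETMASK, &save_mask, NULL);

    /*
     * Variant B (historical postmaster style): simply assert that the
     * resulting mask is "nothing blocked", regardless of the mask on entry.
     */
    PG_SETMASK(&BlockSig);
    pid = fork();
    if (pid != 0)
        PG_SETMASK(&UnBlockSig);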


> > > 6.  Is the naming and style OK?  Better ideas welcome, but basically I
> > > tried to avoid all unnecessary refactoring and changes, so no real
> > > logic moves around, and the changes are pretty close to "mechanical".
> > > One bikeshed decision was what to call the {handle,process}_XXX
> > > functions and associated flags.
> >
> > I wonder if it'd be good to have a _pm_ in the name.
> 
> I dunno about this one, it's all static stuff in a file called
> postmaster.c and one (now) already has pm in it (see below).

I guess stuff like signal handlers and their state somehow seems more global
to me than their C linkage type suggests. Hence the desire to be a bit more
"namespaced" in their naming. I do find it somewhat annoying when reasonably
important global variables aren't uniquely named when using a debugger...

But again, this isn't anything that should hold up the patch.


> > Is there any reason to use MAXLISTEN here? We know how many sockets we're
> > listening on by this point, I think? No idea if the overhead matters anywhere,
> > but ...
> 
> Fixed.

I was thinking of determining the number once, in PostmasterMain(). But that's
perhaps better done as a separate change... WFM.
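
Presumably something like this (hypothetical sketch; NumListenSockets is an
invented name):

    /* in PostmasterMain(), once all listen sockets have been set up */
    NumListenSockets = 0;
    for (int i = 0; i < MAXLISTEN; i++)
    {
        if (ListenSocket[i] == PGINVALID_SOCKET)
            break;
        NumListenSockets++;
    }

ServerLoop() could then size its WaitEvent array as 1 + NumListenSockets
(latch plus sockets) instead of scanning up to MAXLISTEN every time.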


> > Hm. This is preexisting behaviour, but now it seems somewhat odd that we might
> > end up happily forking a backend for each socket without checking signals
> > in between.  Forking might take a while, and if a signal arrived since the
> > WaitEventSetWait() we'll not react to it.
> 
> Right, so if you have 64 server sockets and they all have clients
> waiting on their listen queue, we'll service one connection from each
> before getting back to checking for pmsignals or shutdown, and that's
> unchanged in this patch.  (FWIW I also noticed that problem while
> experimenting with the idea that you could handle multiple clients in
> > one go on OSes that report the listen queue depth along with
> WL_SOCKET_ACCEPT events, but didn't pursue it...)
> 
> I guess we could check every time through the nevents loop.  I may
> look into that in a later patch, but I prefer to keep the same policy
> in this patch.

Makes sense. This was mainly me trying to make sure we're not changing subtle
stuff accidentally (and trying to understand how things work currently, as a
prerequisite).
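
(For the record, the "check every iteration" idea would presumably look
something like this -- a hypothetical sketch, with the pending_pm_* names
invented for illustration:)

    for (int i = 0; i < nevents; i++)
    {
        /* react to signals that arrived while we were busy forking */
        if (pending_pm_shutdown_request)
            process_shutdown_request();

        if (events[i].events & WL_SOCKET_ACCEPT)
        {
            /* accept and fork a backend for this socket, as today */
        }
    }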


> > >  static void
> > >  PostmasterStateMachine(void)
> > > @@ -3796,6 +3819,9 @@ PostmasterStateMachine(void)
> > >
> > >       if (pmState == PM_WAIT_DEAD_END)
> > >       {
> > > +             /* Don't allow any new socket connection events. */
> > > +             ConfigurePostmasterWaitSet(false);
> >
> > Hm. Is anything actually using the wait set until we re-create it with (true)
> > below?
> 
> Yes.  While in PM_WAIT_DEAD_END state, waiting for children to exit,
> there may be clients trying to connect.

Oh, right. I had misread the diff, thinking the
ConfigurePostmasterWaitSet(false) was in the same PM_NO_CHILDREN branch as
the ConfigurePostmasterWaitSet(true).


> On master, we have a special pg_usleep(100000L) instead of select() just for
> that state so we can ignore incoming connections while waiting for the
> SIGCHLD reaper to advance our state, but in this new world that's not
> enough.  We need to wait for the latch to be set by
> handle_child_exit_signal().  So I used the regular WES to wait for the latch
> (that is, no more special case for that state), but I need to ignore socket
> events.  If I didn't, then an incoming connection sitting in the listen
> queue would cause the server loop to burn 100% CPU reporting a
> level-triggered WL_SOCKET_ACCEPT for sockets that we don't want to accept
> (yet).

Yea, this is clearly the better approach.
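
(For readers following along, the shape that implies is roughly the
following -- a simplified sketch, with pm_wait_set and NumListenSockets
standing in for however the patch tracks the wait set and socket count:)

static WaitEventSet *pm_wait_set = NULL;

static void
ConfigurePostmasterWaitSet(bool accept_connections)
{
    if (pm_wait_set)
        FreeWaitEventSet(pm_wait_set);

    /* always wait on the latch; include the sockets only when accepting */
    pm_wait_set = CreateWaitEventSet(CurrentMemoryContext,
                                     1 + (accept_connections ? NumListenSockets : 0));
    AddWaitEventToSet(pm_wait_set, WL_LATCH_SET, PGINVALID_SOCKET,
                      MyLatch, NULL);

    if (accept_connections)
    {
        for (int i = 0; i < NumListenSockets; i++)
            AddWaitEventToSet(pm_wait_set, WL_SOCKET_ACCEPT,
                              ListenSocket[i], NULL, NULL);
    }
}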


A few more code review comments:

DetermineSleepTime() still deals with struct timeval, which we maintain at
some effort, only to convert away from struct timeval again for the
WaitEventSetWait() call. That doesn't seem quite right, and that
back-and-forth is basically introduced by this patch.
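
(The natural endpoint is presumably to compute the timeout in milliseconds
from the start, matching WaitEventSetWait()'s argument -- hypothetical
sketch, with DetermineSleepTimeMs() an invented name:)

    /* returns a timeout in milliseconds, or -1 for "no timeout" */
    static long DetermineSleepTimeMs(void);

    /* in ServerLoop() */
    nevents = WaitEventSetWait(pm_wait_set,
                               DetermineSleepTimeMs(),
                               events,
                               lengthof(events),
                               0 /* wait_event_info */);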


I think ServerLoop still has an outdated comment:

 *
 * NB: Needs to be called with signals blocked

which we aren't doing (nor need to be doing) anymore.


>          /*
> -         * New connection pending on any of our sockets? If so, fork a child
> -         * process to deal with it.
> +         * Latch set by signal handler, or new connection pending on any of
> +         * our sockets? If the latter, fork a child process to deal with it.
>           */
> -        if (selres > 0)
> +        for (int i = 0; i < nevents; i++)
>          {
> -            int            i;
> -
> -            for (i = 0; i < MAXLISTEN; i++)
> +            if (events[i].events & WL_LATCH_SET)
>              {
> -                if (ListenSocket[i] == PGINVALID_SOCKET)
> -                    break;
> -                if (FD_ISSET(ListenSocket[i], &rmask))
> +                ResetLatch(MyLatch);
> +
> +                /* Process work scheduled by signal handlers. */

Very minor: It feels a tad off to say that the work was scheduled by signal
handlers; it really comes from other processes or from the OS. But ...


> +/*
> + * Child processes use SIGUSR1 to send 'pmsignals'.  pg_ctl uses SIGUSR1 to ask
> + * postmaster to check for logrotate and promote files.
> + */

s/send/notify us of/, since the concrete "pmsignal" is actually transported
outside of the "OS signal" level?
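
I.e. the comment would then read:

    /*
     * Child processes use SIGUSR1 to notify us of 'pmsignals'.  pg_ctl uses
     * SIGUSR1 to ask postmaster to check for logrotate and promote files.
     */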


LGTM.


I think this is a significant improvement, thanks for working on it.


Greetings,

Andres Freund


