Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks) - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
Date
Msg-id 20130624192945.GE4051@eldon.alvh.no-ip.org
Whole thread Raw
In response to Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)  ("MauMau" <maumau307@gmail.com>)
Responses Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
List pgsql-hackers
MauMau escribió:
> From: "Alvaro Herrera" <alvherre@2ndquadrant.com>

> >Actually, in further testing I noticed that the fast-path you introduced
> >in BackendCleanup (or was it HandleChildCrash?) in the immediate
> >shutdown case caused postmaster to fail to clean up properly after
> >sending the SIGKILL signal, so I had to remove that as well.  Was there
> >any reason for that?
>
> You are talking about the code below at the beginning of
> HandleChildCrash(), aren't you?
>
> /* Do nothing if the child terminated due to immediate shutdown */
> if (Shutdown == ImmediateShutdown)
>  return;
>
> If my memory is correct, this was necessary; without this,
> HandleChildCrash() or LogChildExit() left extra messages in the
> server log.
>
> LOG:  %s (PID %d) exited with exit code %d
> LOG:  terminating any other active server processes
>
> These messages are output because postmaster sends SIGQUIT to its
> children and wait for them to terminate.  The children exit with
> non-zero status when they receive SIGQUIT.

Yeah, I see that --- after removing that early exit, there are unwanted
messages.  And in fact there are some signals sent that weren't
previously sent.  Clearly we need something here: if we're in immediate
shutdown handler, don't signal anyone (because they have already been
signalled) and don't log any more messages; but the cleaning up of
postmaster's process list must still be carried out.

Would you please add that on top of the attached cleaned up version of
your patch?


Noah Misch escribió:
> On Sun, Jun 23, 2013 at 01:55:19PM +0900, MauMau wrote:

> > the clients at the immediate shutdown.  The code gets much simpler.  In
> > addition, it may be better that we similarly send SIGKILL in backend
> > crash (FatalError) case, eliminate the use of SIGQUIT and remove
> > quickdie() and other SIGQUIT handlers.
>
> My take is that the client message has some value, and we shouldn't just
> discard it to simplify the code slightly.  Finishing the shutdown quickly has
> value, of course.  The relative importance of those things should guide the
> choice of a timeout under method #2, but I don't see a rigorous way to draw
> the line.  I feel five seconds is, if anything, too high.

There's obviously a lot of disagreement on 5 seconds being too high or
too low.  We have just followed SysV init's precedent of waiting 5
seconds by default between sending signals TERM and QUIT during a
shutdown.  I will note that during a normal shutdown, services are
entitled to do much more work than just signal all their children to
exit immediately; and yet I don't find much evidence that this period is
inordinately short.  I don't feel strongly that it couldn't be shorter,
though.

> What about using deadlock_timeout?  It constitutes a rough barometer on the
> system's tolerance for failure detection latency, and we already overload it
> by having it guide log_lock_waits.  The default of 1s makes sense to me for
> this purpose, too.  We can always add a separate immediate_shutdown_timeout if
> there's demand, but I don't expect such demand.  (If we did add such a GUC,
> setting it to zero could be the way to request method 1.  If some folks
> strongly desire method 1, that's probably the way to offer it.)

I dunno.  Having this be configurable seems overkill to me.  But perhaps
that's the way to satisfy most people: some people could set it very
high so that they could have postmaster wait longer if they believe
their server is going to be overloaded; people wishing immediate SIGKILL
could get that too, as you describe.

I think this should be a separate patch, however.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: PLR does not install language templates
Next
From: Tom Lane
Date:
Subject: Re: dump difference between 9.3 and master after upgrade