Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks) - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks) |
Date | |
Msg-id | 20130624192945.GE4051@eldon.alvh.no-ip.org Whole thread Raw |
In response to | Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks) ("MauMau" <maumau307@gmail.com>) |
Responses |
Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks) |
List | pgsql-hackers |
MauMau escribió: > From: "Alvaro Herrera" <alvherre@2ndquadrant.com> > >Actually, in further testing I noticed that the fast-path you introduced > >in BackendCleanup (or was it HandleChildCrash?) in the immediate > >shutdown case caused postmaster to fail to clean up properly after > >sending the SIGKILL signal, so I had to remove that as well. Was there > >any reason for that? > > You are talking about the code below at the beginning of > HandleChildCrash(), aren't you? > > /* Do nothing if the child terminated due to immediate shutdown */ > if (Shutdown == ImmediateShutdown) > return; > > If my memory is correct, this was necessary; without this, > HandleChildCrash() or LogChildExit() left extra messages in the > server log. > > LOG: %s (PID %d) exited with exit code %d > LOG: terminating any other active server processes > > These messages are output because postmaster sends SIGQUIT to its > children and wait for them to terminate. The children exit with > non-zero status when they receive SIGQUIT. Yeah, I see that --- after removing that early exit, there are unwanted messages. And in fact there are some signals sent that weren't previously sent. Clearly we need something here: if we're in immediate shutdown handler, don't signal anyone (because they have already been signalled) and don't log any more messages; but the cleaning up of postmaster's process list must still be carried out. Would you please add that on top of the attached cleaned up version of your patch? Noah Misch escribió: > On Sun, Jun 23, 2013 at 01:55:19PM +0900, MauMau wrote: > > the clients at the immediate shutdown. The code gets much simpler. In > > addition, it may be better that we similarly send SIGKILL in backend > > crash (FatalError) case, eliminate the use of SIGQUIT and remove > > quickdie() and other SIGQUIT handlers. > > My take is that the client message has some value, and we shouldn't just > discard it to simplify the code slightly. Finishing the shutdown quickly has > value, of course. The relative importance of those things should guide the > choice of a timeout under method #2, but I don't see a rigorous way to draw > the line. I feel five seconds is, if anything, too high. There's obviously a lot of disagreement on 5 seconds being too high or too low. We have just followed SysV init's precedent of waiting 5 seconds by default between sending signals TERM and QUIT during a shutdown. I will note that during a normal shutdown, services are entitled to do much more work than just signal all their children to exit immediately; and yet I don't find much evidence that this period is inordinately short. I don't feel strongly that it couldn't be shorter, though. > What about using deadlock_timeout? It constitutes a rough barometer on the > system's tolerance for failure detection latency, and we already overload it > by having it guide log_lock_waits. The default of 1s makes sense to me for > this purpose, too. We can always add a separate immediate_shutdown_timeout if > there's demand, but I don't expect such demand. (If we did add such a GUC, > setting it to zero could be the way to request method 1. If some folks > strongly desire method 1, that's probably the way to offer it.) I dunno. Having this be configurable seems overkill to me. But perhaps that's the way to satisfy most people: some people could set it very high so that they could have postmaster wait longer if they believe their server is going to be overloaded; people wishing immediate SIGKILL could get that too, as you describe. I think this should be a separate patch, however. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
pgsql-hackers by date: