Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks) - Mailing list pgsql-hackers

From MauMau
Subject Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
Msg-id B745B64BB2E54268BC2EAA7F23154376@maumau
In response to Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
From: "Alvaro Herrera" <alvherre@2ndquadrant.com>
> Yeah, I see that --- after removing that early exit, there are unwanted
> messages.  And in fact there are some signals sent that weren't
> previously sent.  Clearly we need something here: if we're in immediate
> shutdown handler, don't signal anyone (because they have already been
> signalled) and don't log any more messages; but the cleaning up of
> postmaster's process list must still be carried out.
>
> Would you please add that on top of the attached cleaned up version of
> your patch?

Thanks.  I'll do that tomorrow at the earliest.

> Noah Misch wrote:
>> On Sun, Jun 23, 2013 at 01:55:19PM +0900, MauMau wrote:
>
>> > the clients at the immediate shutdown.  The code gets much simpler.  In
>> > addition, it may be better that we similarly send SIGKILL in backend
>> > crash (FatalError) case, eliminate the use of SIGQUIT and remove
>> > quickdie() and other SIGQUIT handlers.
>>
>> My take is that the client message has some value, and we shouldn't just
>> discard it to simplify the code slightly.  Finishing the shutdown quickly
>> has value, of course.  The relative importance of those things should guide
>> the choice of a timeout under method #2, but I don't see a rigorous way to
>> draw the line.  I feel five seconds is, if anything, too high.
>
> There's obviously a lot of disagreement on 5 seconds being too high or
> too low.  We have just followed SysV init's precedent of waiting 5
> seconds by default between sending signals TERM and QUIT during a
> shutdown.  I will note that during a normal shutdown, services are
> entitled to do much more work than just signal all their children to
> exit immediately; and yet I don't find much evidence that this period is
> inordinately short.  I don't feel strongly that it couldn't be shorter,
> though.
>
>> What about using deadlock_timeout?  It constitutes a rough barometer on
>> the system's tolerance for failure detection latency, and we already
>> overload it by having it guide log_lock_waits.  The default of 1s makes
>> sense to me for this purpose, too.  We can always add a separate
>> immediate_shutdown_timeout if there's demand, but I don't expect such
>> demand.  (If we did add such a GUC, setting it to zero could be the way to
>> request method 1.  If some folks strongly desire method 1, that's probably
>> the way to offer it.)
>
> I dunno.  Having this be configurable seems overkill to me.  But perhaps
> that's the way to satisfy most people: some people could set it very
> high so that they could have postmaster wait longer if they believe
> their server is going to be overloaded; people wishing immediate SIGKILL
> could get that too, as you describe.
>
> I think this should be a separate patch, however.

I think so, too.  We can add a parameter later if we find it highly 
necessary after some experience in the field.
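
For illustration only, the escalation discussed in the quoted text (send
SIGQUIT, wait for a timeout, then SIGKILL whatever is still running) might
be sketched roughly as below.  This is not the postmaster code and not the
patch under review.  The function name immediate_shutdown, its parameters,
and the polling loop are all hypothetical, and treating a zero timeout as
"SIGKILL right away" is only one reading of the "method 1" suggestion
quoted above.  It assumes a POSIX platform where SIGQUIT and SIGKILL exist.

```python
import signal
import time
from typing import Callable, Iterable, List


def immediate_shutdown(
    pids: Iterable[int],
    timeout_s: float,
    send: Callable[[int, int], None],
    is_alive: Callable[[int], bool],
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    poll_interval_s: float = 0.05,
) -> List[int]:
    """Hypothetical sketch: SIGQUIT all children, then SIGKILL survivors.

    Returns the list of PIDs that were sent SIGKILL.
    `send` and `is_alive` are injected so the logic can be exercised
    without real processes; in real use `send` could be os.kill.
    """
    remaining = list(pids)

    # Assumed reading of "method 1": a zero timeout skips SIGQUIT entirely.
    if timeout_s <= 0:
        for pid in remaining:
            send(pid, signal.SIGKILL)
        return remaining

    # "Method 2": give each child a chance to exit via its SIGQUIT handler.
    for pid in remaining:
        send(pid, signal.SIGQUIT)

    # Poll until every child has exited or the timeout has elapsed.
    deadline = now() + timeout_s
    while remaining and now() < deadline:
        remaining = [p for p in remaining if is_alive(p)]
        if remaining:
            sleep(poll_interval_s)

    # Anything still alive after the timeout is killed unconditionally.
    remaining = [p for p in remaining if is_alive(p)]
    for pid in remaining:
        send(pid, signal.SIGKILL)
    return remaining
```

With timeout_s set to 5.0 this mirrors the SysV init precedent mentioned
above, and with 1.0 it mirrors the deadlock_timeout default.  Which value
is appropriate is exactly the open question in this thread.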


Regards
MauMau




