Thread: Why has postmaster shutdown gotten so slow?

Why has postmaster shutdown gotten so slow?

From
Tom Lane
Date:
Shutdown of an idle postmaster used to take about two or three seconds
(mostly due to the sync/sleep(2)/sync in md_sync).  For the last couple
of days it's taking more like a dozen seconds.  I presume somebody broke
something, but I'm unsure whether to pin the blame on bgwriter or
Windows changes.  Anyone care to fess up?
        regards, tom lane


Re: Why has postmaster shutdown gotten so slow?

From
Claudio Natoli
Date:
 

> Shutdown of an idle postmaster used to take about two or three seconds
> (mostly due to the sync/sleep(2)/sync in md_sync).  For the last couple
> of days it's taking more like a dozen seconds.  I presume somebody broke
> something, but I'm unsure whether to pin the blame on bgwriter or
> Windows changes.  Anyone care to fess up?

AFAICS, Win32 changes for the past few days have been minimal, and pretty
much isolated to Win32. Happy to stand corrected, but I'd start by looking
elsewhere...

Cheers,
Claudio



Re: Why has postmaster shutdown gotten so slow?

From
Jan Wieck
Date:
Tom Lane wrote:

> Shutdown of an idle postmaster used to take about two or three seconds
> (mostly due to the sync/sleep(2)/sync in md_sync).  For the last couple
> of days it's taking more like a dozen seconds.  I presume somebody broke
> something, but I'm unsure whether to pin the blame on bgwriter or
> Windows changes.  Anyone care to fess up?

I guess it could well be the bgwriter, which when having nothing to do 
at all is sleeping for 10 seconds. Not sure, will check.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: Why has postmaster shutdown gotten so slow?

From
Jan Wieck
Date:
Jan Wieck wrote:

> Tom Lane wrote:
> 
>> Shutdown of an idle postmaster used to take about two or three seconds
>> (mostly due to the sync/sleep(2)/sync in md_sync).  For the last couple
>> of days it's taking more like a dozen seconds.  I presume somebody broke
>> something, but I'm unsure whether to pin the blame on bgwriter or
>> Windows changes.  Anyone care to fess up?
> 
> I guess it could well be the bgwriter, which when having nothing to do 
> at all is sleeping for 10 seconds. Not sure, will check.

I checked the background writer for this and I can not reproduce the 
behaviour. If the bgwriter had zero blocks to write it does PG_USLEEP 
for 10 seconds, which on Unix is done by select() and that is correctly 
interrupted when the postmaster sends it the term signal on shutdown.


Jan
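
For reference, a sleep built on select() with no file descriptors, as described above, might look roughly like the following. This is a minimal sketch, not the PG_USLEEP macro itself: the name `sleep_with_select` and its return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper: sleep for up to `usec` microseconds by calling
 * select() with no descriptors, so only the timeout can complete it.
 *
 * Returns 0 if the full timeout elapsed, 1 if a signal cut the sleep
 * short (select() failed with EINTR), and -1 on any other error.
 */
static int
sleep_with_select(long usec)
{
    struct timeval delay;

    assert(usec >= 0);
    delay.tv_sec = usec / 1000000L;
    delay.tv_usec = usec % 1000000L;

    if (select(0, NULL, NULL, NULL, &delay) == 0)
        return 0;               /* timeout expired normally */
    return (errno == EINTR) ? 1 : -1;
}
```

A caller that sees a return value of 1 knows a signal handler ran during the sleep and can check its own flags before deciding whether to sleep again.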




Re: Why has postmaster shutdown gotten so slow?

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> I checked the background writer for this and I can not reproduce the 
> behaviour. If the bgwriter had zero blocks to write it does PG_USLEEP 
> for 10 seconds, which on Unix is done by select() and that is correctly 
> interrupted when the postmaster sends it the term signal on shutdown.

This appears to be a platform-dependent behavior.  The HPUX select(2) man
page says
         [EINTR]        The select() function was interrupted before any
                        of the selected events occurred and before the
                        timeout interval expired. If SA_RESTART has been
                        set for the interrupting signal, it is
                        implementation-dependent whether select() restarts
                        or returns with EINTR.

which text also appears verbatim in the Single Unix Spec.  Since we set
SA_RESTART for every signal except SIGALRM (see pqsignal.c), we are
subject to the implementation dependency for SIGTERM.

Tracing the bgwriter process on my machine makes it real obvious that in
fact the select delay is allowed to finish out when SIGTERM is received.
In fact worse than that: it's restarted from the beginning.  If 5
seconds have already elapsed, another 10 still elapse before the select
exits.

This won't do :-(.  We cannot afford to fritter away 10 seconds in the
SIGTERM shutdown cycle --- on typical systems init isn't going to give
us more than 20 seconds before a hard kill.

I'd suggest reducing the delay to a second or two, or perhaps breaking
it into several 1-second waits with interrupt flag checks between.

In the longer run we might want to rethink what we are doing with
SA_RESTART, but I am not sure about the implications of fooling with
that.
        regards, tom lane
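
The suggestion of breaking the delay into short waits with a flag check between them can be sketched as follows. This is only an illustration under stated assumptions, not the bgwriter code: `shutdown_requested` stands in for whatever flag the real SIGTERM handler sets, and `sleep_in_slices` is a hypothetical name.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical flag; a real SIGTERM handler would set it to 1. */
static volatile sig_atomic_t shutdown_requested = 0;

/*
 * Wait for up to total_ms milliseconds in slices of at most slice_ms,
 * rechecking shutdown_requested before every slice.  Returns the number
 * of slices that were started.
 *
 * Whether select() returns early with EINTR or restarts the slice does
 * not matter much here: either way the flag is looked at again after
 * roughly one slice rather than after the whole delay.
 */
static int
sleep_in_slices(long total_ms, long slice_ms)
{
    long remaining_ms = total_ms;
    int  slices = 0;

    assert(slice_ms > 0);
    while (remaining_ms > 0 && !shutdown_requested)
    {
        long this_ms = (remaining_ms < slice_ms) ? remaining_ms : slice_ms;
        struct timeval tv;

        tv.tv_sec = this_ms / 1000;
        tv.tv_usec = (this_ms % 1000) * 1000;
        (void) select(0, NULL, NULL, NULL, &tv);   /* result ignored on purpose */

        remaining_ms -= this_ms;
        slices++;
    }
    return slices;
}
```

With a 10-second delay and 1-second slices, this would be called as `sleep_in_slices(10000, 1000)`, and a shutdown request arriving mid-delay would be noticed at the next slice boundary.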


Re: Why has postmaster shutdown gotten so slow?

From
Jan Wieck
Date:
Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
>> I checked the background writer for this and I can not reproduce the 
>> behaviour. If the bgwriter had zero blocks to write it does PG_USLEEP 
>> for 10 seconds, which on Unix is done by select() and that is correctly 
>> interrupted when the postmaster sends it the term signal on shutdown.
> 
> This appears to be a platform-dependent behavior.  The HPUX select(2) man
> page says
> 
>           [EINTR]        The select() function was interrupted before any
>                          of the selected events occurred and before the
>                          timeout interval expired. If SA_RESTART has been
>                          set for the interrupting signal, it is
>                          implementation-dependent whether select() restarts
>                          or returns with EINTR.
> 
> which text also appears verbatim in the Single Unix Spec.  Since we set
> SA_RESTART for every signal except SIGALRM (see pqsignal.c), we are
> subject to the implementation dependency for SIGTERM.

That explains it.

> 
> Tracing the bgwriter process on my machine makes it real obvious that in
> fact the select delay is allowed to finish out when SIGTERM is received.
> In fact worse than that: it's restarted from the beginning.  If 5
> seconds have already elapsed, another 10 still elapse before the select
> exits.
> 
> This won't do :-(.  We cannot afford to fritter away 10 seconds in the
> SIGTERM shutdown cycle --- on typical systems init isn't going to give
> us more than 20 seconds before a hard kill.
> 
> I'd suggest reducing the delay to a second or two, or perhaps breaking
> it into several 1-second waits with interrupt flag checks between.
> 
> In the longer run we might want to rethink what we are doing with
> SA_RESTART, but I am not sure about the implications of fooling with
> that.

I think we should at this point have some maximum value for PG_xSLEEP 
over which it falls back to a function call that does either this 
breaking up into a loop with checking InterruptPending or removes the 
SA_RESTART flag while waiting for the timeout.


Jan
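
The second option, installing the handler without SA_RESTART so that a blocking select() fails with EINTR when the signal arrives, could look something like this. It is a sketch under assumptions: `note_signal`, `got_signal`, and `install_interrupting_handler` are hypothetical names, and nothing here is taken from pqsignal.c.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical flag set by the handler below. */
static volatile sig_atomic_t got_signal = 0;

/* Minimal handler: only record that the signal arrived. */
static void
note_signal(int signo)
{
    (void) signo;
    got_signal = 1;
}

/*
 * Install note_signal for `signo` with SA_RESTART deliberately left
 * unset, so interruptible calls such as select() are not restarted
 * after the handler returns.  Returns 0 on success and -1 on failure,
 * as sigaction() does.
 */
static int
install_interrupting_handler(int signo)
{
    struct sigaction act;

    assert(signo > 0);
    memset(&act, 0, sizeof(act));
    act.sa_handler = note_signal;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;           /* no SA_RESTART */

    return sigaction(signo, &act, NULL);
}
```

Code that needs SA_RESTART the rest of the time could save the previous disposition through the third argument of sigaction() and put it back once the wait is over.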




Re: Why has postmaster shutdown gotten so slow?

From
"Magnus Hagander"
Date:
>> Tracing the bgwriter process on my machine makes it real obvious that in
>> fact the select delay is allowed to finish out when SIGTERM is received.
>> In fact worse than that: it's restarted from the beginning.  If 5
>> seconds have already elapsed, another 10 still elapse before the select
>> exits.
>>
>> This won't do :-(.  We cannot afford to fritter away 10 seconds in the
>> SIGTERM shutdown cycle --- on typical systems init isn't going to give
>> us more than 20 seconds before a hard kill.
>>
>> I'd suggest reducing the delay to a second or two, or perhaps breaking
>> it into several 1-second waits with interrupt flag checks between.
>>
>> In the longer run we might want to rethink what we are doing with
>> SA_RESTART, but I am not sure about the implications of fooling with
>> that.
>
> I think we should at this point have some maximum value for PG_xSLEEP
> over which it falls back to a function call that does either this
> breaking up into a loop with checking InterruptPending or removes the
> SA_RESTART flag while waiting for the timeout.

If you look at my win32 signals patch no. 3 (posted Feb 4th), I have code
to do this for win32 in it. It breaks up select() timeouts into pieces
of 1 second and polls for win32 signals in between.

Turns out it wasn't necessary, since win32 *does* deliver our signals
while in select. So for once it's win32 that does what we want - I think
that's a first. But it might help on another platform.


//Magnus