Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks) - Mailing list pgsql-hackers

From	MauMau
Subject	Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
Date	June 20, 2013 11:45:41
Msg-id	7890B04F09F345EDB20902BBD7B5AEE6@maumau
In response to	Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks) (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses	Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
List	pgsql-hackers

First, thank you for the review.

From: "Alvaro Herrera" <alvherre@2ndquadrant.com>
> This seems reasonable.  Why 10 seconds?  We could wait 5 seconds, or 15.
> Is there a rationale behind the 10?  If we said 60, that would fit
> perfectly well within the already existing 60-second loop in postmaster,
> but that seems way too long.

There is no good rationale.  I arbitrarily chose a short period because this
is "immediate" shutdown.  I felt that anything longer than 10 seconds would
be too long.  I think 5 seconds may be better.  Although not directly related
to this fix, these influenced my choice:

1. According to the man page of init, init sends SIGKILL to all remaining 
processes 5 seconds after it sends SIGTERM to them.

2. At system shutdown, Windows forcibly proceeds with shutdown after waiting
20 seconds for services to terminate.


> I have only one concern about this patch, which is visible in the
> documentation proposed change:
>
>       <para>
>       This is the <firstterm>Immediate Shutdown</firstterm> mode.
>       The master <command>postgres</command> process will send a
> -      <systemitem>SIGQUIT</systemitem> to all child processes and exit
> -      immediately, without properly shutting itself down. The child processes
> -      likewise exit immediately upon receiving
> -      <systemitem>SIGQUIT</systemitem>. This will lead to recovery (by
> +      <systemitem>SIGQUIT</systemitem> to all child processes, wait for
> +      them to terminate, and exit. The child processes
> +      exit immediately upon receiving
> +      <systemitem>SIGQUIT</systemitem>. If any of the child processes
> +      does not terminate within 10 seconds for some unexpected reason,
> +      the master postgres process will send a <systemitem>SIGKILL</systemitem>
> +      to all remaining ones, wait for their termination
> +      again, and exit. This will lead to recovery (by
>       replaying the WAL log) upon next start-up. This is recommended
>       only in emergencies.
>       </para>
>
> Note that the previous text said that postmaster will send SIGQUIT, then
> terminate without checking anything.  In the new code, postmaster sends
> SIGQUIT, then waits, then SIGKILL, then waits again.  If there's an
> unkillable process (say because it's stuck in a noninterruptible sleep)
> postmaster might never exit.  I think it should send SIGQUIT, then wait,
> then SIGKILL, then exit without checking.

At first I thought the same, but decided against it.  The purpose of
this patch is to make immediate shutdown "reliable".  Here, "reliable"
means that the database server is certainly shut down when pg_ctl returns;
pg_ctl must not falsely claim "I shut down the server processes for you, so
you do not have to worry that some postgres process might still remain and
write to disk".  I suppose reliable shutdown is especially crucial in an HA
cluster.  If pg_ctl stop -mi gets stuck forever when there is an unkillable
process (in what situations does this happen?  An OS bug, or an NFS hard
mount?), I think the DBA has to notice this situation from the unfinished
pg_ctl, investigate the cause, and take corrective action.  In any case, in
an HA cluster the clusterware will terminate the node with STONITH rather
than leaving pg_ctl running forever.
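The escalation debated above (SIGQUIT, a bounded wait, SIGKILL, then either
waiting again or exiting without checking) can be sketched in plain POSIX C.
This is not the postmaster code from the patch: the names immediate_shutdown
and reap_children, the 100 ms polling interval, and the wait_after_kill switch
are hypothetical, chosen only to contrast the two positions in this thread.
The timeout is a parameter because 5 vs. 10 seconds is still open.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in seconds, so the deadline is immune to wall-clock jumps. */
static double now_secs(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}

/* Reap children that have exited, marking their slots with -1.
 * With block == true, waits for each remaining child in turn.
 * Returns how many children are still unreaped. */
static int reap_children(pid_t *pids, int n, bool block)
{
    int alive = 0;
    for (int i = 0; i < n; i++) {
        if (pids[i] <= 0)
            continue;                       /* already reaped */
        pid_t r = waitpid(pids[i], NULL, block ? 0 : WNOHANG);
        if (r == pids[i] || (r < 0 && errno == ECHILD))
            pids[i] = -1;                   /* exited (or not our child) */
        else
            alive++;                        /* still running, or EINTR */
    }
    return alive;
}

/* Hypothetical sketch: SIGQUIT, wait up to timeout_secs, SIGKILL the
 * stragglers, then either wait for them (wait_after_kill) or return
 * without checking. Returns how many children may still exist. */
static int immediate_shutdown(pid_t *pids, int n, double timeout_secs,
                              bool wait_after_kill)
{
    assert(n >= 0 && timeout_secs >= 0);

    for (int i = 0; i < n; i++)
        if (pids[i] > 0)
            kill(pids[i], SIGQUIT);         /* children should exit at once */

    double deadline = now_secs() + timeout_secs;
    int alive;
    while ((alive = reap_children(pids, n, false)) > 0 && now_secs() < deadline) {
        struct timespec interval = {0, 100L * 1000 * 1000};  /* poll every 100 ms */
        nanosleep(&interval, NULL);
    }
    if (alive == 0)
        return 0;

    for (int i = 0; i < n; i++)
        if (pids[i] > 0)
            kill(pids[i], SIGKILL);         /* cannot be caught or ignored */

    if (!wait_after_kill)
        return alive;                       /* exit without checking */

    /* Blocks until every child is reaped; a process stuck in an
     * uninterruptible sleep keeps this loop waiting indefinitely. */
    while (reap_children(pids, n, true) > 0)
        ;
    return 0;
}
```

With wait_after_kill set, a child in uninterruptible sleep keeps the final
waitpid() blocked, which is the hang Alvaro describes; with it cleared, the
caller returns while such a child may still exist, which is the false report
MauMau wants to avoid.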


> I have tweaked the patch a bit and I'm about to commit as soon as we
> resolve the above two items.

I appreciate your tweaking, especially in the documentation and source code 
comments, as English is not my mother tongue.

Regards
MauMau
