Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks) - Mailing list pgsql-hackers

From	Alvaro Herrera
Subject	Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
Date	June 20, 2013 05:05:38
Msg-id	20130620020528.GW3537@eldon.alvh.no-ip.org
In response to	Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks) ("MauMau" <maumau307@gmail.com>)
Responses	Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks) ("MauMau" <maumau307@gmail.com>)
List	pgsql-hackers

MauMau wrote:

> Could you review the patch?  The summary of the change is:
> 1. postmaster waits for children to terminate when it gets an
> immediate shutdown request, instead of exiting.
> 
> 2. postmaster sends SIGKILL to remaining children if all of the
> child processes do not terminate within 10 seconds since the start
> of immediate shutdown or FatalError condition.

This seems reasonable.  Why 10 seconds?  We could wait 5 seconds, or 15.
Is there a rationale behind the 10?  If we said 60, that would fit
perfectly well within the already existing 60-second loop in postmaster,
but that seems way too long.

I have only one concern about this patch, which is visible in the
proposed documentation change:
       <para>
       This is the <firstterm>Immediate Shutdown</firstterm> mode.
       The master <command>postgres</command> process will send a
-      <systemitem>SIGQUIT</systemitem> to all child processes and exit
-      immediately, without properly shutting itself down. The child processes
-      likewise exit immediately upon receiving
-      <systemitem>SIGQUIT</systemitem>. This will lead to recovery (by
+      <systemitem>SIGQUIT</systemitem> to all child processes, wait for
+      them to terminate, and exit. The child processes
+      exit immediately upon receiving
+      <systemitem>SIGQUIT</systemitem>. If any of the child processes
+      does not terminate within 10 seconds for some unexpected reason,
+      the master postgres process will send a <systemitem>SIGKILL</systemitem>
+      to all remaining ones, wait for their termination
+      again, and exit. This will lead to recovery (by
       replaying the WAL log) upon next start-up. This is recommended
       only in emergencies.
       </para>
 

Note that the previous text said that postmaster will send SIGQUIT, then
terminate without checking anything.  In the new code, postmaster sends
SIGQUIT, then waits, then SIGKILL, then waits again.  If there's an
unkillable process (say, because it's stuck in an uninterruptible sleep)
postmaster might never exit.  I think it should send SIGQUIT, then wait,
then SIGKILL, then exit without checking.
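
The sequence proposed above (SIGQUIT, wait, SIGKILL, exit without
checking) can be sketched as a small standalone C fragment.  This is
only an illustration, not postmaster code: immediate_shutdown(),
reap_exited(), the plain pid array and the 100 ms polling interval are
hypothetical names and choices made up for the example, and the grace
period is a parameter because the right value is still under discussion.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): reap every child in pids[]
 * that has already exited and mark its slot with -1.
 * Returns the number of children still running.
 */
static int
reap_exited(pid_t *pids, int n)
{
    int         remaining = 0;

    for (int i = 0; i < n; i++)
    {
        pid_t       r;

        if (pids[i] <= 0)
            continue;           /* empty slot; never signal pid 0 or -1 */

        r = waitpid(pids[i], NULL, WNOHANG);
        if (r == pids[i] || (r < 0 && errno == ECHILD))
            pids[i] = -1;       /* exited, or not our child: forget it */
        else
            remaining++;
    }
    return remaining;
}

/*
 * Hypothetical sketch of the proposed behaviour: send SIGQUIT, wait up to
 * grace_secs for the children to exit, send SIGKILL to any stragglers,
 * then return WITHOUT waiting again.  Returns the number of children
 * that were sent SIGKILL.
 */
static int
immediate_shutdown(pid_t *pids, int n, int grace_secs)
{
    struct timespec tick = {0, 100 * 1000 * 1000};     /* 100 ms */
    int         waited_ms = 0;
    int         killed = 0;

    assert(grace_secs >= 0);

    for (int i = 0; i < n; i++)
        if (pids[i] > 0)
            kill(pids[i], SIGQUIT);

    /* Poll until every child is gone or the grace period runs out. */
    while (reap_exited(pids, n) > 0 && waited_ms < grace_secs * 1000)
    {
        nanosleep(&tick, NULL);
        waited_ms += 100;
    }

    for (int i = 0; i < n; i++)
    {
        if (pids[i] > 0)
        {
            kill(pids[i], SIGKILL);
            killed++;
        }
    }

    /* Deliberately no second wait: an unkillable child cannot block us. */
    return killed;
}
```

The point of the design is the last step: because nothing waits after
SIGKILL, a child stuck in an uninterruptible sleep cannot keep the
caller from exiting.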

I have tweaked the patch a bit and will commit it as soon as we
resolve the above two items.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
