Re: Autovacuum launcher doesn't notice death of postmaster immediately - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Autovacuum launcher doesn't notice death of postmaster immediately
Msg-id 20070604150426.GI4779@alvh.no-ip.org
In response to Autovacuum launcher doesn't notice death of postmaster immediately  (Peter Eisentraut <peter_e@gmx.net>)
Responses Re: Autovacuum launcher doesn't notice death of postmaster immediately  ("Jim C. Nasby" <decibel@decibel.org>)
List pgsql-hackers
Peter Eisentraut wrote:
> I notice that in 8.3, when I kill the postmaster process with SIGKILL or 
> SIGSEGV, the child processes writer and stats collector go away 
> immediately, but the autovacuum launcher hangs around for up to a 
> minute.  (I suppose this has to do with the periodic wakeups?).  When 
> you try to restart the postmaster before that it fails with a complaint 
> that someone is still attached to the shared memory segment.
> 
> These are obviously not normal modes of operation, but I fear that this 
> could cause some problems with people's control scripts of the 
> sort, "it crashed, let's try to restart it".

The launcher is set up to wake up in autovacuum_naptime seconds at most.
So if the user configures a ridiculous time (for example 86400 seconds,
which I've seen) then the launcher would not detect the postmaster death
for a very long time, which is probably bad.  (You measured a one minute
delay because that's the default naptime).

Maybe sleeping for the whole naptime is not such a hot idea, and we should wake the launcher up
every 10 seconds (or less?).  I picked 10 seconds because that's the
time the bgwriter sleeps if there is no activity configured.  Does this
sound acceptable?  The only problem with waking it up too frequently is
that it would be waking the system up (for gettimeofday()) even if
nothing is happening.
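
To make the trade-off concrete, here is a minimal sketch of capping each
sleep at 10 seconds while still honoring the configured naptime.  This is
not the launcher's actual code: MAX_SLEEP_SECS, next_sleep_secs() and
wakeups_per_naptime() are hypothetical names used only for illustration.

```c
/* Hypothetical cap on a single sleep, in seconds (the 10 s proposed above). */
#define MAX_SLEEP_SECS 10

/*
 * Length of the next sleep: whatever is left of the naptime, but never
 * more than the cap, so the postmaster check runs at least that often.
 */
static int
next_sleep_secs(int remaining_naptime_secs)
{
    if (remaining_naptime_secs <= 0)
        return 0;
    return (remaining_naptime_secs < MAX_SLEEP_SECS)
        ? remaining_naptime_secs
        : MAX_SLEEP_SECS;
}

/*
 * Number of wakeups needed to sit out one full naptime.  This is the
 * cost side of the proposal: each wakeup is work done while idle.
 */
static int
wakeups_per_naptime(int naptime_secs)
{
    int wakeups = 0;

    while (naptime_secs > 0)
    {
        naptime_secs -= next_sleep_secs(naptime_secs);
        wakeups++;
    }
    return wakeups;
}
```

Under these assumptions the default 60 second naptime costs 6 wakeups
instead of 1, and an 86400 second naptime costs 8640, in exchange for
noticing postmaster death within roughly 10 seconds.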

I also just noticed that the launcher will check if postmaster is alive,
then sleep, and then possibly do some work.  So if the postmaster died
in the sleep period, the launcher might try to do some work.  Should we
add a check for postmaster liveness after the sleep?
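
A rough sketch of the ordering I have in mind, written against a toy
simulation rather than the real launcher loop.  SimState,
sim_postmaster_alive(), sim_nap() and launcher_cycle() are all made-up
names, and "work" here is just a counter.

```c
#include <stdbool.h>

/* Toy model of the world: a clock, and the time the postmaster dies. */
typedef struct SimState
{
    int now;                /* current simulated time, in seconds */
    int postmaster_dies_at; /* postmaster is alive while now < this */
    int work_done;          /* how many times the launcher did work */
} SimState;

static bool
sim_postmaster_alive(const SimState *s)
{
    return s->now < s->postmaster_dies_at;
}

static void
sim_nap(SimState *s, int secs)
{
    s->now += secs;
}

/*
 * One launcher cycle: check, sleep, check again, then work.
 * Returns false when the launcher should exit instead of continuing.
 */
static bool
launcher_cycle(SimState *s, int nap_secs)
{
    if (!sim_postmaster_alive(s))
        return false;

    sim_nap(s, nap_secs);

    /* The extra check: the postmaster may have died while we slept. */
    if (!sim_postmaster_alive(s))
        return false;

    s->work_done++;
    return true;
}
```

Without the second check, a postmaster that dies in the middle of the
nap would still get one round of work out of the launcher before the
next cycle noticed.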

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

