
Re: Hot standby, recovery infra - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Hot standby, recovery infra
Date	February 26, 2009 14:38:43
Msg-id	49A6E1A9.5020901@enterprisedb.com
In response to	Re: Hot standby, recovery infra (Fujii Masao <masao.fujii@gmail.com>)
Responses	Re: Hot standby, recovery infra Re: Hot standby, recovery infra
List	pgsql-hackers

Fujii Masao wrote:
> On Fri, Jan 30, 2009 at 7:47 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> That whole area was something I was leaving until last, since immediate
>> shutdown doesn't work either, even in HEAD. (Fujii-san and I discussed
>> this before Christmas, briefly).
> 
> This problem remains in current HEAD. I mean, immediate shutdown
> may be unable to kill the startup process because system(), which
> executes restore_command, ignores SIGQUIT while waiting.
> When I tried immediate shutdown during recovery, only the startup
> process survived. This is undesirable behavior, I think.
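The behavior Fujii describes can be reproduced outside PostgreSQL. The following is a minimal sketch, assuming a POSIX system with /bin/sh; the helper name is hypothetical. The shell command sends SIGQUIT to the process that called system() (via $PPID), but because system() ignores SIGQUIT in the caller while it waits for the child, the signal is discarded and the call simply returns.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: run a shell command that sends SIGQUIT to the
 * process calling system().  system() sets SIGQUIT to SIG_IGN in the
 * caller while waiting, so the signal is discarded instead of killing us.
 * Returns the raw wait status from system().
 */
int system_while_sending_sigquit_to_caller(void)
{
    /* Inside the child shell, $PPID is the pid of the system() caller. */
    return system("kill -QUIT $PPID");
}
```

If the calling process reaches the code after this call at all, it survived the SIGQUIT, which is exactly why the startup process outlives an immediate shutdown while it is blocked in system().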

Yeah, we need to fix that.

> Should the following code be added to RestoreArchivedFile()?
> 
> ----
> if (WIFSIGNALED(rc) && WTERMSIG(rc) == SIGQUIT)
>     exit(2);
> ----

I don't see how that helps, as we already have this in there:
    signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;
    ereport(signaled ? FATAL : DEBUG2,
            (errmsg("could not restore file \"%s\" from archive: return code %d",
                    xlogfname, rc)));

which means we already ereport(FATAL) if the restore command dies with 
SIGQUIT.
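To make the quoted check concrete, here is a minimal sketch of the same classification as a standalone function; the name restore_failure_is_fatal() is hypothetical. A status counts as "signaled" either when the command itself was killed by a signal, or when the shell reported an exit status above 125. By common shell convention that range covers 126/127 (command not executable or not found) and 128+N (a command run by the shell was killed by signal N).

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical standalone version of the check quoted above: given the
 * wait status returned by system(), decide whether a failed restore
 * should be treated as fatal (killed by a signal, or shell status > 125)
 * rather than as an ordinary "file not found".
 */
bool restore_failure_is_fatal(int rc)
{
    return WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;
}
```

For example, a restore command that exits with status 1 is classified as non-fatal, while one that exits with 128, or that the shell reports as killed by SIGTERM, is classified as fatal.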

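I think the real problem here is that pg_standby traps SIGQUIT. The 
startup process doesn't receive the SIGQUIT because system() ignores it 
while waiting, and pg_standby doesn't propagate it to the startup process 
either: because it traps the signal instead of dying from it, the startup 
process never sees a "killed by signal" exit status.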

I think we should simply remove the signal handler for SIGQUIT from 
pg_standby. Or would that lead to a core dump by default? In that case, we 
need pg_standby to exit(128) or similar, so that RestoreArchivedFile() 
treats the command as killed by a signal (128 is above the 125 threshold 
in the check above).
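The difference between trapping SIGQUIT and exiting with 128 can be illustrated with a small fork-based sketch. All names here are hypothetical, and the child only models a restore command; it is not pg_standby's code. With a handler that traps the signal and carries on, the command eventually exits with an ordinary status (1, "file not found" here), so the caller's check does not treat it as signaled. With a handler that calls _exit(128), the "> 125" test fires. (Simply removing the handler would make WIFSIGNALED() true, but SIGQUIT's default action is to terminate with a core dump, which is the concern raised above.)

```c
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same classification as the RestoreArchivedFile() check quoted above. */
static bool looks_signaled(int rc)
{
    return WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;
}

/* Hypothetical handler: trap SIGQUIT and keep going, as a command that
 * catches the signal without dying from it would. */
static void trap_and_continue(int sig)
{
    (void) sig;
}

/* Hypothetical handler for the suggested fix: exit with status 128 so the
 * caller classifies the command as killed by a signal.  _exit() is
 * async-signal-safe, so it may be called from a handler. */
static void exit_128(int sig)
{
    (void) sig;
    _exit(128);
}

/*
 * Fork a child that installs `handler` for SIGQUIT, sends itself SIGQUIT,
 * and otherwise exits with status 1 ("file not found").  Returns 1 if the
 * parent would classify the child's status as signaled, 0 if not, -1 on error.
 */
int child_looks_signaled(void (*handler)(int))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        signal(SIGQUIT, handler);
        raise(SIGQUIT);
        _exit(1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return looks_signaled(status) ? 1 : 0;
}
```

Calling child_looks_signaled(trap_and_continue) yields 0, while child_looks_signaled(exit_128) yields 1.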

Another approach is to check that the postmaster is still alive, like we 
do in walwriter and bgwriter:

    /*
     * Emergency bailout if postmaster has died.  This is to avoid the
     * necessity for manual cleanup of all postmaster children.
     */
    if (!PostmasterIsAlive(true))
        exit(1);
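One common Unix technique for such a check, sketched here under the assumption that the checking process is a direct child of the postmaster, is to compare getppid() against the parent's pid: when the parent exits, the child is re-parented and getppid() changes. The function names below are hypothetical stand-ins, not PostgreSQL's implementation. Note that this kind of check only reports death once the parent has actually exited, not when it has merely sent a signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a postmaster liveness check.  A direct child
 * is re-parented when its parent exits, so getppid() stops returning the
 * parent's pid.
 */
bool parent_is_alive(pid_t expected_parent)
{
    return getppid() == expected_parent;
}

/*
 * Demo: a "postmaster" process forks a child and then exits.  The child
 * polls parent_is_alive() for up to about 5 seconds and reports through a
 * pipe whether it noticed.  Returns 1 if it did, 0 if not, -1 on error.
 */
int orphan_notices_parent_death(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t postmaster = fork();
    if (postmaster < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (postmaster == 0)
    {
        pid_t me = getpid();
        if (fork() == 0)
        {
            struct timespec tick = { 0, 1000000 };  /* 1 ms */
            char result = '0';
            for (int i = 0; i < 5000; i++)
            {
                if (!parent_is_alive(me))
                {
                    result = '1';
                    break;
                }
                nanosleep(&tick, NULL);
            }
            _exit(write(fds[1], &result, 1) == 1 ? 0 : 1);
        }
        _exit(0);   /* the "postmaster" dies, orphaning its child */
    }

    close(fds[1]);  /* so read() sees EOF if the child never writes */
    waitpid(postmaster, NULL, 0);
    char c = 0;
    ssize_t n = read(fds[0], &c, 1);
    close(fds[0]);
    if (n != 1)
        return -1;
    return c == '1' ? 1 : 0;
}
```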

However, I'm afraid there's a race condition with that. If we do that 
right after system(), the postmaster might have signaled us but not yet 
exited. We could check that in the main loop, but if we wrongly interpret 
the exit of the restore command as "file not found, go ahead and 
start up", the damage might be done by the time we notice that the 
postmaster is gone.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com