Thread: Immediate shutdown during recovery

Immediate shutdown during recovery

From: "Fujii Masao"

Hi,

An immediate shutdown (pg_ctl -m i stop) might not be able to
kill the startup process during archive recovery. This is because
the startup process calls system() to execute restore_command,
and system() ignores SIGQUIT in the calling process while the
command runs. So the startup process alone might survive the
immediate shutdown and continue redo up to the end. Is this
desirable behavior? It sounds odd to me.
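
The effect described above can be reproduced outside PostgreSQL. The following is a minimal sketch, not PostgreSQL code: the function name is hypothetical, and it assumes a POSIX system where system() ignores SIGQUIT in the caller while the command runs. The command makes the shell send SIGQUIT to its parent, which is the process blocked inside system().

```c
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: ask the shell started by system() to send
 * SIGQUIT to its parent, i.e. to the process that called system().
 * Returns 1 if the caller is still running afterwards and the
 * command itself exited with status 0.
 */
int
survives_sigquit_during_system(void)
{
    /* $PPID is the shell's parent: the process waiting in system(). */
    int rc = system("kill -s QUIT \"$PPID\"");

    /* Reaching this line means SIGQUIT did not terminate the caller. */
    return rc != -1 && WIFEXITED(rc) && WEXITSTATUS(rc) == 0;
}
```

If the same signal were sent while the process was not inside system(), and no handler was installed, the default action would terminate it.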

To prevent the startup process from surviving, I think it
should periodically check whether the postmaster is still alive.
The archiver process, which also calls system() to execute
archive_command, already does this.
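
One way such a liveness check could look for a direct child process is sketched below. This is an illustration only, not the check PostgreSQL uses: the function name and the idea of recording the parent PID at startup are assumptions. It relies on the fact that an orphaned child is reparented, so getppid() stops returning the original parent's PID.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: expected_parent is the parent PID recorded
 * when the child started. If the parent has exited, the child has
 * been reparented and getppid() returns a different PID.
 */
bool
parent_is_alive(pid_t expected_parent)
{
    return getppid() == expected_parent;
}
```

A child would call this between invocations of system(), for example before each attempt to restore a file, and exit when it returns false.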

What is your opinion?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Immediate shutdown during recovery

From: "Fujii Masao"

Hi,

On Fri, Nov 28, 2008 at 6:56 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Hi,
>
> The immediate shutdown (pg_ctl -m i stop) might not be able to
> kill the startup process during archive recovery. It's because
> the startup process calls system() which ignores SIGQUIT for
> executing the restore_command. So, only the startup process
> might survive the immediate shutdown and continue redoing up
> to the end. Is this desirable behavior? This sounds odd for me.

In RestoreArchivedFile(), the following code serves as a safeguard
against restore_command being terminated by a signal. But the
safeguard might not work if restore_command installs its own
SIGQUIT handler, as pg_standby does, since the command may then
exit normally instead of being reported as killed by a signal.

> signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;
>
> ereport(signaled ? FATAL : DEBUG2,
>     (errmsg("could not restore file \"%s\" from archive: return code %d",
>             xlogfname, rc)));
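
To make the gap concrete, here is a small standalone sketch of the same classification applied to the status returned by system(). The function name is hypothetical; only the expression is taken from the quoted code.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper mirroring the quoted expression: treat the
 * command as "signaled" if the shell was killed by a signal, or if
 * it exited with a status above 125.
 */
bool
command_was_signaled(int rc)
{
    return WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;
}
```

With this helper, the status of system("kill -s TERM $$") is classified as signaled, because the shell dies from the signal. The status of system("trap 'exit 1' TERM; kill -s TERM $$") is not, because the trap catches the signal and the shell exits normally with status 1. SIGTERM is used here only to keep the example free of core dumps; a command that handles SIGQUIT itself can slip past the check in the same way.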

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Immediate shutdown during recovery

From: Simon Riggs

On Fri, 2008-11-28 at 19:53 +0900, Fujii Masao wrote:
> Hi,
> 
> On Fri, Nov 28, 2008 at 6:56 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > Hi,
> >
> > The immediate shutdown (pg_ctl -m i stop) might not be able to
> > kill the startup process during archive recovery. It's because
> > the startup process calls system() which ignores SIGQUIT for
> > executing the restore_command. So, only the startup process
> > might survive the immediate shutdown and continue redoing up
> > to the end. Is this desirable behavior? This sounds odd for me.
> 
> In RestoreArchivedFile(), there is the following code as the safeguard
> against the termination of restore_command by signal. But the
> safeguard might not work if restore_command defines its own signal
> handler for SIGQUIT like pg_standby.
> 
> > signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;
> >
> > ereport(signaled ? FATAL : DEBUG2,
> >     (errmsg("could not restore file \"%s\" from archive: return code %d",
> >             xlogfname, rc)));

Agree there is an existing problem.

Suggest we fix it after the main patches are committed.

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: Immediate shutdown during recovery

From: "Fujii Masao"

Hello,

On Sat, Nov 29, 2008 at 12:40 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On Fri, 2008-11-28 at 19:53 +0900, Fujii Masao wrote:
>> Hi,
>>
>> On Fri, Nov 28, 2008 at 6:56 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> > Hi,
>> >
>> > The immediate shutdown (pg_ctl -m i stop) might not be able to
>> > kill the startup process during archive recovery. It's because
>> > the startup process calls system() which ignores SIGQUIT for
>> > executing the restore_command. So, only the startup process
>> > might survive the immediate shutdown and continue redoing up
>> > to the end. Is this desirable behavior? This sounds odd for me.
>>
>> In RestoreArchivedFile(), there is the following code as the safeguard
>> against the termination of restore_command by signal. But the
>> safeguard might not work if restore_command defines its own signal
>> handler for SIGQUIT like pg_standby.
>>
>> > signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;
>> >
>> > ereport(signaled ? FATAL : DEBUG2,
>> >     (errmsg("could not restore file \"%s\" from archive: return code %d",
>> >             xlogfname, rc)));
>
> Agree there is an existing problem.
>
> Suggest we fix it after the main patches are committed.

OK, thanks.

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center