Re: Unintended restart after recovery error - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Unintended restart after recovery error
Date
Msg-id CA+TgmoaV_KT=oTrdZ+xsm4AM69A_vMmRQRRmDWSiotd5v865iw@mail.gmail.com
In response to Re: Unintended restart after recovery error  (Antonin Houska <ah@cybertec.at>)
Responses Re: Unintended restart after recovery error  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Wed, Nov 12, 2014 at 4:52 PM, Antonin Houska <ah@cybertec.at> wrote:
> Fujii Masao <masao.fujii@gmail.com> wrote:
>
>> On Wed, Nov 12, 2014 at 6:52 PM, Antonin Houska <ah@cybertec.at> wrote:
>> > While looking at postmaster.c:reaper(), one problematic case occurred to me.
>> >
>> >
>> > 1. Startup process signals PMSIGNAL_RECOVERY_STARTED.
>> >
>> > 2. Checkpointer process is forked and immediately dies.
>> >
>> > 3. reaper() catches this failure, calls HandleChildCrash() and thus sets
>> > FatalError to true.
>> >
>> > 4. Startup process exits with non-zero status code too - either due to SIGQUIT
>> > received from HandleChildCrash or due to some other failure of the startup
>> > process itself. However, FatalError is already set, because of the previous
>> > crash of the checkpointer. Thus reaper() does not set RecoveryError.
>> >
>> > 5. As RecoveryError failed to be set to true, postmaster will try to restart
>> > the cluster, although it apparently should not.
>>
>> Why shouldn't postmaster restart the cluster in that case?
>>
>
> At least for the behavior to be consistent with simpler cases of failed
> recovery (e.g. any FATAL error in StartupXLOG), which end up not restarting
> the cluster.

It's true that if the startup process dies we don't try to restart,
but it's also true that if the checkpointer dies we do try to restart.
I'm not sure why this specific situation should be an exception to
that general rule.
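
For reference, the interaction between the two rules can be sketched as a
small state model. This is a simplified illustration under stated
assumptions, not the actual postmaster.c code: the struct, enum, and function
names below are hypothetical, and only the FatalError/RecoveryError
interaction described in the quoted steps is modeled (shutdown handling and
crashes before PMSIGNAL_RECOVERY_STARTED are left out).

```c
#include <stdbool.h>

/* Hypothetical names; a toy model of the flags discussed in this thread. */
typedef enum { CHILD_STARTUP, CHILD_CHECKPOINTER } ChildKind;

typedef struct {
    bool fatal_error;     /* stands in for FatalError */
    bool recovery_error;  /* stands in for RecoveryError */
} PostmasterModel;

/* Models reaper() seeing a child exit with a non-zero status. */
void model_child_crashed(PostmasterModel *pm, ChildKind kind)
{
    /*
     * A startup-process failure marks recovery as failed only if no
     * earlier crash has been recorded (quoted step 4).
     */
    if (kind == CHILD_STARTUP && !pm->fatal_error)
        pm->recovery_error = true;

    /* Stands in for HandleChildCrash(), which sets FatalError. */
    pm->fatal_error = true;
}

/* After all children are gone: restart unless recovery itself failed. */
bool model_will_restart(const PostmasterModel *pm)
{
    return pm->fatal_error && !pm->recovery_error;
}

/* Simple case: only the startup process fails. */
bool scenario_startup_dies_alone(void)
{
    PostmasterModel pm = { false, false };
    model_child_crashed(&pm, CHILD_STARTUP);
    return model_will_restart(&pm);
}

/* Case from the quoted steps: checkpointer dies first, then startup. */
bool scenario_checkpointer_dies_first(void)
{
    PostmasterModel pm = { false, false };
    model_child_crashed(&pm, CHILD_CHECKPOINTER);
    model_child_crashed(&pm, CHILD_STARTUP);
    return model_will_restart(&pm);
}
```

In this model, scenario_startup_dies_alone() reports no restart, while
scenario_checkpointer_dies_first() reports a restart, which is the case
being discussed.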

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


