Re: Unintended restart after recovery error - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Unintended restart after recovery error
Msg-id CA+TgmoYi7DwEP+EhaMW-sYfNLu2B0Bh-yz1PeWkNV2s7_0w8bA@mail.gmail.com
In response to Re: Unintended restart after recovery error  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Thu, Nov 13, 2014 at 10:59 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> 442231d7f71764b8c628044e7ce2225f9aa43b6 introduced the latter rule
> for hot-standby case. Maybe *during crash recovery* (i.e., hot standby
> should not be enabled) it's better to treat the crash of startup process as
> a catastrophic crash.

Maybe, but why, specifically?  If the startup process failed
internally, it's probably because it hit an error during the replay of
some WAL record.  So if we restart it, it will back up to the previous
checkpoint or restartpoint, replay the same WAL records as before, and
die again in the same spot.  We don't want it to sit there and do that
forever in an infinite loop, so it makes sense to kill the whole
server.
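To make the infinite-loop argument concrete, here is a toy model in C. The types and functions (`WalRecord`, `replay_from`, `count_failed_attempts`) are hypothetical and are not PostgreSQL source. It assumes only what the paragraph above states: after a restart, replay resumes from the same checkpoint or restartpoint and is deterministic, so a record that errors once errors every time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a WAL stream: replaying a record either
 * succeeds or raises an error, and the outcome never changes. */
typedef struct {
    long lsn;       /* stand-in for the record's WAL position */
    bool corrupt;   /* true if replaying this record errors out */
} WalRecord;

/* Replay from redo_start (the last checkpoint or restartpoint).
 * Returns the index of the failing record, or -1 if replay completes. */
static int replay_from(const WalRecord *wal, size_t nrecords, size_t redo_start)
{
    assert(redo_start <= nrecords);
    for (size_t i = redo_start; i < nrecords; i++) {
        if (wal[i].corrupt)
            return (int) i;   /* the startup process would exit here */
    }
    return -1;
}

/* A naive supervisor that restarts recovery after every failure,
 * capped at max_attempts. Returns how many attempts failed. */
static int count_failed_attempts(const WalRecord *wal, size_t nrecords,
                                 size_t redo_start, int max_attempts)
{
    int failures = 0;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (replay_from(wal, nrecords, redo_start) < 0)
            break;        /* recovery finished; no further restart needed */
        failures++;       /* same record, same error, every attempt */
    }
    return failures;
}

/* Example stream: the record at index 2 always fails to replay. */
static const WalRecord example_wal[] = {
    {100, false}, {200, false}, {300, true}, {400, false}
};
```

With `redo_start = 1`, every attempt stops at index 2, so without the cap the loop would never end; only a redo point past the bad record lets recovery finish.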

But if the startup process was killed off because the checkpointer
croaked, that logic doesn't necessarily apply.  There's no reason to
assume that the replay of a particular WAL record was what killed the
checkpointer; in fact, it seems like the odds are against it.  So it
seems right to fall back to our general principle of restarting the
server and hoping that's enough to get things back online.
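The distinction drawn in the last two paragraphs can be summarized as a small decision function. This is a minimal sketch: the enum values and `on_startup_exit` are hypothetical names, not the actual postmaster code, and the sketch models only the two exit cases discussed here (the startup process failing on its own versus being killed after the checkpointer crashed).

```c
#include <assert.h>

/* Hypothetical classification of why the startup process exited. */
typedef enum {
    STARTUP_EXIT_OK,          /* recovery finished normally */
    STARTUP_EXIT_OWN_ERROR,   /* it hit an error while replaying WAL */
    STARTUP_EXIT_SIGNALED     /* postmaster killed it because another
                                 child (e.g. the checkpointer) crashed */
} StartupExitReason;

/* Hypothetical postmaster responses. */
typedef enum {
    ACTION_CONTINUE,          /* proceed to normal operation */
    ACTION_RESTART,           /* reinitialize and retry recovery */
    ACTION_SHUTDOWN           /* give up: a retry would fail the same way */
} PostmasterAction;

static PostmasterAction on_startup_exit(StartupExitReason reason)
{
    switch (reason) {
        case STARTUP_EXIT_OK:
            return ACTION_CONTINUE;
        case STARTUP_EXIT_OWN_ERROR:
            /* Replay restarts from the same checkpoint and would hit the
             * same record again, so restarting risks an endless loop. */
            return ACTION_SHUTDOWN;
        case STARTUP_EXIT_SIGNALED:
            /* Nothing suggests WAL replay caused the other crash, so use
             * the usual crash-and-restart behavior. */
            return ACTION_RESTART;
    }
    assert(!"unknown StartupExitReason");
    return ACTION_SHUTDOWN;
}
```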

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


