Re: Unintended restart after recovery error - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Unintended restart after recovery error
Date	November 17, 2014 18:47:04
Msg-id	CA+TgmoYi7DwEP+EhaMW-sYfNLu2B0Bh-yz1PeWkNV2s7_0w8bA@mail.gmail.com
In response to	Re: Unintended restart after recovery error (Fujii Masao <masao.fujii@gmail.com>)
List	pgsql-hackers

On Thu, Nov 13, 2014 at 10:59 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> 442231d7f71764b8c628044e7ce2225f9aa43b6 introduced the latter rule
> for hot-standby case. Maybe *during crash recovery* (i.e., hot standby
> should not be enabled) it's better to treat the crash of startup process as
> a catastrophic crash.

Maybe, but why, specifically?  If the startup process failed
internally, it's probably because it hit an error during the replay of
some WAL record.  So if we restart it, it will back up to the previous
checkpoint or restartpoint, replay the same WAL records as before, and
die again in the same spot.  We don't want it to sit there and do that
forever in an infinite loop, so it makes sense to kill the whole
server.

But if the startup process was killed off because the checkpointer
croaked, that logic doesn't necessarily apply.  There's no reason to
assume that the replay of a particular WAL record was what killed the
checkpointer; in fact, it seems like the odds are against it.  So it
seems right to fall back to our general principle of restarting the
server and hoping that's enough to get things back online.
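
To make the distinction concrete, here is a rough sketch in C of the
policy described above. It is not actual postmaster code: the enum and
function names are hypothetical, invented only for illustration, and
the real logic has to consider more state than this.

```c
/* Hypothetical sketch only; none of these names exist in PostgreSQL. */

/* Why the startup process went away. */
typedef enum StartupExitCause
{
    /* It failed on its own, most likely while replaying a WAL record. */
    STARTUP_FAILED_INTERNALLY,
    /* It was killed off because another process (e.g. the checkpointer) crashed. */
    STARTUP_KILLED_AFTER_OTHER_CRASH
} StartupExitCause;

/* What the server should do next. */
typedef enum ServerAction
{
    ACTION_SHUT_DOWN_SERVER,
    ACTION_RESTART_SERVER
} ServerAction;

ServerAction
decide_after_startup_exit(StartupExitCause cause)
{
    switch (cause)
    {
        case STARTUP_FAILED_INTERNALLY:
            /*
             * A restart would go back to the previous checkpoint or
             * restartpoint, replay the same WAL, and die in the same
             * spot, so give up instead of looping forever.
             */
            return ACTION_SHUT_DOWN_SERVER;

        case STARTUP_KILLED_AFTER_OTHER_CRASH:
            /*
             * Nothing suggests a particular WAL record is to blame, so
             * apply the general principle: restart and hope it helps.
             */
            return ACTION_RESTART_SERVER;
    }

    /* Unknown cause: take the conservative choice (an assumption). */
    return ACTION_SHUT_DOWN_SERVER;
}
```

The point of the sketch is only that the decision depends on why the
startup process exited, not on whether hot standby is enabled.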

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
