Re: Hot standby, recovery infra - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Hot standby, recovery infra
Date
Msg-id 49897D77.503@enterprisedb.com
In response to Re: Hot standby, recovery infra  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Hot standby, recovery infra  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
Fujii Masao wrote:
> On Fri, Jan 30, 2009 at 11:55 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> The startup process now catches SIGTERM, and calls proc_exit() at the next
>> WAL record. That's what will happen in a fast shutdown. Unexpected death of
>> the startup process is treated the same as a backend/auxiliary process
>> crash.
> 
> If unexpected death of the startup process happens in automatic recovery
> after a crash, postmaster and bgwriter may get stuck. Because HandleChildCrash()
> can be called before FatalError flag is reset. When FatalError is false,
> HandleChildCrash() doesn't kill any auxiliary processes. So, bgwriter survives
> the crash and postmaster waits for the death of bgwriter forever with recovery
> status (which means that new connection cannot be started). Is this bug?

Yes, and in fact I ran into it myself yesterday while testing. It seems
that we should reset FatalError earlier, i.e. when the recovery starts
and bgwriter is launched. I'm not sure why in CVS HEAD we don't reset
FatalError until after the startup process is finished. Resetting it as
soon as all the processes have been terminated and the startup process is
launched again would seem like a more obvious place to do it. The only
difference that I can see is that if someone tries to connect while the
startup process is running, you now get a "the database system is in
recovery mode" message instead of "the database system is starting up"
if we're reinitializing after crash. We can keep that behavior; we just
need to add another flag to mean "reinitializing after crash" that isn't
reset until the recovery is over.
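
To make the proposal concrete, here is a rough toy model of the two flags. It is only a sketch and not postmaster code: the struct, the function names and the reinit_after_crash flag are all made up for illustration, and it assumes the crash handler skips signalling the other children whenever the FatalError-like flag is still set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in the PostgreSQL source. */
typedef struct
{
    bool fatal_error;        /* stands in for FatalError */
    bool reinit_after_crash; /* hypothetical extra flag, kept until recovery ends */
    bool startup_running;    /* startup process is alive */
} PostmasterModel;

/*
 * A child died unexpectedly. Returns true if the remaining children were
 * signalled. Assumption: signalling is skipped while fatal_error is still
 * set from an earlier crash, which is what leaves bgwriter alive.
 */
static bool
model_child_crash(PostmasterModel *pm)
{
    bool signalled = !pm->fatal_error;

    pm->fatal_error = true;
    pm->reinit_after_crash = true;
    pm->startup_running = false;
    return signalled;
}

/* All children are gone and the startup process has been relaunched. */
static void
model_startup_launched(PostmasterModel *pm)
{
    pm->fatal_error = false;    /* the proposed earlier reset */
    pm->startup_running = true;
}

/* Recovery is over; only now is the extra flag cleared. */
static void
model_recovery_finished(PostmasterModel *pm)
{
    pm->startup_running = false;
    pm->reinit_after_crash = false;
}

/* Message for a rejected connection attempt, or NULL if it is accepted. */
static const char *
model_reject_message(const PostmasterModel *pm)
{
    if (pm->fatal_error || pm->reinit_after_crash)
        return "the database system is in recovery mode";
    if (pm->startup_running)
        return "the database system is starting up";
    return NULL;
}
```

With this arrangement a second crash during crash recovery still signals the remaining children, because fatal_error was already cleared when the startup process was relaunched, while connection attempts keep seeing the "recovery mode" message until model_recovery_finished() runs.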

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

