Re: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea - Mailing list pgsql-hackers

From Hiroshi Inoue
Subject Re: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea
Msg-id 3A626702.7DD48F11@tpf.co.jp
In response to RE: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea  ("Hiroshi Inoue" <Inoue@tpf.co.jp>)
List pgsql-hackers
Tom Lane wrote:
> 
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> >>>> I've thought that the main purpose of CRIT_SECTION is to
> >>>> force redo recovery for any errors during the CRIT_SECTION
> >>>> to complete the critical operation e.g. bt_split().
> >>
> >> How could it force redo?
> 
> > Doesn't proc_exit(non-zero) force shutdown recovery?
> 
> It forces a shutdown and restart, but that does not do anything good
> that I can see.  The WAL log entry hasn't been made, typically, so there
> is nothing to redo.  If there *were* a log entry, and the redo failed
> again (pretty likely), then we'd have an infinite crash/try to
> restart/crash cycle, which is just about the worst possible behavior.
> So I'm not seeing what the point is.
> 

That seems to be in the nature of the 7.1 recovery scheme.
Once a WAL log entry is made, recovery should
complete the logged operation regardless of what
caused the recovery (elog, a system error like SEGV, etc.).

I've wondered why no one has asked how we could
recover from a recovery failure. Unfortunately,
I don't know the answer. Recovery failure seems
extremely serious because the postmaster can't
start if the startup recovery fails.
In addition, I have another worry. I don't know
how robust WAL is against general bugs not
directly related to WAL.

Regards.
Hiroshi Inoue

