Re: Hot standby fails if any backend crashes - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Hot standby fails if any backend crashes
Date
Msg-id 10746.1328244526@sss.pgh.pa.us
In response to Hot standby fails if any backend crashes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Hot standby fails if any backend crashes  (Fujii Masao <masao.fujii@gmail.com>)
Re: Hot standby fails if any backend crashes  (Daniel Farina <daniel@heroku.com>)
Re: Hot standby fails if any backend crashes  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
I wrote:
> I'm currently working with Duncan Rance's test case for bug #6425, and
> I am observing a very nasty behavior in HEAD: once one of the
> hot-standby query backends crashes, the standby postmaster SIGQUIT's
> all its children and then just quits itself, with no log message and
> apparently no effort to restart.  Surely this is not intended?

I looked through postmaster.c and found that the cause of this is pretty
obvious: if the startup process exits with any non-zero status, we
assume that represents an unrecoverable error condition, and set
RecoveryError which causes the postmaster to exit silently as soon as
its last child is gone.  But we do this even if the reason the startup
process did exit(1) is that we sent it SIGQUIT as a result of a crash of
some other process.  Of course this logic dates from a time when the
startup process could not have any siblings, so when it was written,
such a thing was impossible.

I think saner behavior might only require this change:

            /*
             * Any unexpected exit (including FATAL exit) of the startup
             * process is treated as a crash, except that we don't want to
             * reinitialize.
             */
            if (!EXIT_STATUS_0(exitstatus))
            {
-               RecoveryError = true;
+               if (!FatalError)
+                   RecoveryError = true;
                HandleChildCrash(pid, exitstatus,
                                 _("startup process"));
                continue;
            }

plus suitable comment adjustments of course.  Haven't tested this yet
though.
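
To make the intended effect of that one-line change concrete, here is a minimal standalone sketch of the decision, not the actual postmaster.c code. The function names and the boolean parameter are hypothetical stand-ins: fatal_error models the postmaster's FatalError flag (a crash of some other child is already being handled), and the return value models whether RecoveryError would be set.

```c
#include <stdbool.h>

/*
 * Old behavior: any non-zero exit of the startup process is treated as
 * an unrecoverable recovery failure, regardless of why it exited.
 * (Hypothetical helper; it only models the RecoveryError assignment.)
 */
bool recovery_error_old(int exitstatus, bool fatal_error)
{
    (void) fatal_error;         /* the old logic never consulted this flag */
    return exitstatus != 0;
}

/*
 * Proposed behavior: a non-zero exit counts as a recovery failure only
 * if we are not already handling a crash of some other child. In that
 * case the startup process most likely exited because we sent it SIGQUIT.
 */
bool recovery_error_new(int exitstatus, bool fatal_error)
{
    if (exitstatus == 0)
        return false;           /* clean exit, nothing to flag */
    return !fatal_error;
}
```

Under this model, a startup process that does exit(1) after a sibling backend has crashed (fatal_error true) sets the flag under the old logic but not under the proposed one, so the postmaster would no longer be told to exit silently in that case. A startup process that fails on its own (fatal_error false) is still flagged either way.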

It's a bit disturbing that nobody has reported this from the field yet.
Seems to imply that hot standby isn't being used much.
        regards, tom lane

