Thread: postmaster recovery and automatic restart suppression

postmaster recovery and automatic restart suppression

From
"Kolb, Harald (NSN - DE/Munich)"
Date:
Hi,

in case of a serious failure of a backend or an auxiliary process the
postmaster performs a crash recovery and restarts the db automatically.

Is there a possibility to deactivate the restart and to force the postmaster
to simply exit at the end ?
The background is that we will have a watchdog process which will in
this case perform a fast switchover to the standby side (in case of
synchronous replication) or will restart the db by its own and in addition
will perform some specific actions.

Regards,

Harald Kolb.

Best regards / freundliche Grüße
-----------------------------------------
Harald Kolb
COO RTP PD SW RD Area B 1 DE
Mch-M Building 5532 / Room 3045
Tel: +49 89 636 47606

mailto:Harald.Kolb@nsn.com
http://www.nokiasiemensnetworks.com/global/

Nokia Siemens Networks GmbH & Co. KG
Sitz der Gesellschaft: München / Registered office: Munich
Registergericht: München / Commercial registry: Munich, HRA 88537
WEEE-Reg.-Nr.: DE 52984304
Persönlich haftende Gesellschafterin / General Partner: Nokia Siemens Networks Management GmbH
Geschäftsleitung / Board of Directors: Lydia Sommer, Olaf Horsthemke
Vorsitzender des Aufsichtsrats / Chairman of supervisory board: Lauri Kivinen
Sitz der Gesellschaft: München / Registered office: Munich
Registergericht: München / Commercial registry: Munich, HRB 163416

Re: postmaster recovery and automatic restart suppression

From
Fujii Masao
Date:
Hi,

On Fri, Jun 5, 2009 at 1:02 AM, Kolb, Harald (NSN - DE/Munich)
<harald.kolb@nsn.com> wrote:
> Hi,
>
> in case of a serious failure of a backend or an auxiliary process the
> postmaster performs a crash recovery and restarts the db automatically.
>
> Is there a possibility to deactivate the restart and to force the postmaster
> to simply exit at the end ?

Good point. I also think that this makes handling of failover
more complicated. In other words, clusterware cannot determine
whether to do failover when it detects the death of the primary
postgres. A wrong decision might cause split brain syndrome.

How about a new GUC parameter to determine whether to restart
postmaster automatically when it fails abnormally? This would
be useful for various failover systems.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: postmaster recovery and automatic restart suppression

From
"Kolb, Harald (NSN - DE/Munich)"
Date:
Hi,

> -----Original Message-----
> From: ext Fujii Masao [mailto:masao.fujii@gmail.com]
> Sent: Friday, June 05, 2009 8:14 AM
> To: Kolb, Harald (NSN - DE/Munich)
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] postmaster recovery and automatic
> restart suppression
>
> Hi,
>
> On Fri, Jun 5, 2009 at 1:02 AM, Kolb, Harald (NSN - DE/Munich)
> <harald.kolb@nsn.com> wrote:
> > Hi,
> >
> > in case of a serious failure of a backend or an auxiliary
> process the
> > postmaster performs a crash recovery and restarts the db
> automatically.
> >
> > Is there a possibility to deactivate the restart and to
> force the postmaster
> > to simply exit at the end ?
>
> Good point. I also think that this makes a handling of failover
> more complicated. In other words, clusterware cannot determine
> whether to do failover when it detects the death of the primary
> postgres. A wrong decision might cause split brain syndrome.
Mh, I cannot follow your reflections. Could you explain a little bit
more ?
>
> How about new GUC parameter to determine whether to restart
> postmaster automatically when it fails abnormally? This would
> be useful for various failover system.

A new GUC parameter would be the optimal solution.
Since I'm new to the community, what's the "usual" way to make this
happen ?

Regards, Harald.


Re: postmaster recovery and automatic restart suppression

From
Fujii Masao
Date:
Hi,

On Fri, Jun 5, 2009 at 9:24 PM, Kolb, Harald (NSN -
DE/Munich)<harald.kolb@nsn.com> wrote:
>> Good point. I also think that this makes a handling of failover
>> more complicated. In other words, clusterware cannot determine
>> whether to do failover when it detects the death of the primary
>> postgres. A wrong decision might cause split brain syndrome.
> Mh, I cannot follow your reflections. Could you explain a little bit
> more ?
>>
>> How about new GUC parameter to determine whether to restart
>> postmaster automatically when it fails abnormally? This would
>> be useful for various failover system.

The primary postgres might restart automatically after clusterware finished
failover (i.e. the standby postgres has come up live). In this case, postgres
would work in each server, and they are independent of each other. This
is known as a form of split-brain syndrome. The problem is that, for example,
if they share the archival storage, some archived files might get lost; the
original primary postgres might overwrite the archived file which is written
by the new primary.

On the other hand, the primary postgres might *not* restart automatically.
So, it's difficult for clusterware to choose whether to do failover when it
detects the death of the primary postgres, I think.

> A new GUC parameter would be the optimal solution.
> Since I'm new to the community, what's the "usual" way to make this
> happen ?

The following might be good references for you.

http://www.pgcon.org/2009/schedule/events/178.en.html
http://wiki.postgresql.org/wiki/Submitting_a_Patch

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: postmaster recovery and automatic restart suppression

From
Gregory Stark
Date:
Fujii Masao <masao.fujii@gmail.com> writes:

> On the other hand, the primary postgres might *not* restart automatically.
> So, it's difficult for clusterware to choose whether to do failover when it
> detects the death of the primary postgres, I think.


I think the accepted way to handle this kind of situation is called STONITH --
"Shoot The Other Node In The Head".

You need some way when the cluster software decides to initiate failover to
ensure that the first node *cannot* come back up. That could mean shutting the
power to it at the PDU or disabling its network connection at the switch, or
various other options.
 Gregory Stark http://mit.edu/~gsstark/resume.pdf
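
The ordering described above (make sure the old primary cannot come back
before anything else happens) can be sketched roughly as follows. This is
only an illustration in Python, not code from PostgreSQL or from any real
cluster manager: `fence_node` and `promote_standby` are hypothetical
callables standing in for whatever a site actually uses (PDU power-off,
disabling a switch port, a promotion script).

```python
from typing import Callable


def failover(fence_node: Callable[[], bool],
             promote_standby: Callable[[], None]) -> bool:
    """Promote the standby only after the old primary is confirmed fenced.

    Both callables are hypothetical placeholders supplied by the caller.
    Returns True if the standby was promoted, False if fencing failed.
    """
    if not fence_node():
        # Fencing could not be confirmed: promoting now would risk two
        # independent primaries (split brain), so do nothing.
        return False
    promote_standby()
    return True
```

The point of the sketch is only the order of operations: promotion is
unreachable unless fencing has reported success.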


Re: postmaster recovery and automatic restart suppression

From
Fujii Masao
Date:
Hi,

On Mon, Jun 8, 2009 at 6:45 PM, Gregory Stark<stark@enterprisedb.com> wrote:
> Fujii Masao <masao.fujii@gmail.com> writes:
>
>> On the other hand, the primary postgres might *not* restart automatically.
>> So, it's difficult for clusterware to choose whether to do failover when it
>> detects the death of the primary postgres, I think.
>
>
> I think the accepted way to handle this kind of situation is called STONITH --
> "Shoot The Other Node In The Head".
>
> You need some way when the cluster software decides to initiate failover to
> ensure that the first node *cannot* come back up. That could mean shutting the
> power to it at the PDU or disabling its network connection at the switch, or
> various other options.

Yes, I understand that STONITH is a safe solution for split-brain. But,
since some special equipment like a PDU probably has to be prepared,
I think that some people (including me) want another reasonable way.

The proposed feature is not a perfect solution, but it is a convenient way
to prevent one kind of split-brain situation.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: postmaster recovery and automatic restart suppression

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> I think the accepted way to handle this kind of situation is called STONITH --
> "Shoot The Other Node In The Head".

Yeah, and the reason people go to the trouble of having special hardware
for that is that pure-software solutions are unreliable.

I think the proposed don't-restart flag is exceedingly ugly and will not
solve any real-world problem.
        regards, tom lane


Re: postmaster recovery and automatic restart suppression

From
Simon Riggs
Date:
On Mon, 2009-06-08 at 09:47 -0400, Tom Lane wrote:

> I think the proposed don't-restart flag is exceedingly ugly and will not
> solve any real-world problem.

Agreed.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: postmaster recovery and automatic restart suppression

From
Greg Stark
Date:
On Mon, Jun 8, 2009 at 6:58 PM, Simon Riggs<simon@2ndquadrant.com> wrote:
>
> On Mon, 2009-06-08 at 09:47 -0400, Tom Lane wrote:
>
>> I think the proposed don't-restart flag is exceedingly ugly and will not
>> solve any real-world problem.
>
> Agreed.

Hm. I'm not sure I see a solid use case for it -- in my experience you
want to be pretty sure you have a persistent problem before you fail
over.

But I don't really see why it's ugly either. I mean our auto-restart
behaviour is pretty arbitrary. You could just as easily argue we
shouldn't auto-restart and rely on the user to restart the service
like he would any service which crashes.

I would file it under "mechanism not policy" and make it optional. The
user should be able to select what to do when a backend crash is
detected from amongst the various safe options, even if we think some
of the options don't have any use cases we can think of. Someone will
surely think of one at some point. (idly I wonder if cloud
environments where you can have an infinite supply of slaves are such
a use case...)

-- 
greg
http://mit.edu/~gsstark/resume.pdf


Re: postmaster recovery and automatic restart suppression

From
Tom Lane
Date:
Greg Stark <stark@enterprisedb.com> writes:
>> On Mon, 2009-06-08 at 09:47 -0400, Tom Lane wrote:
>>> I think the proposed don't-restart flag is exceedingly ugly and will not
>>> solve any real-world problem.

> Hm. I'm not sure I see a solid use case for it -- in my experience you
> want to be pretty sure you have a persistent problem before you fail
> over.

Yeah, and when you do fail over you want more guarantee than "none at
all" that the primary won't start back up again on its own.

> But I don't really see why it's ugly either.

Because it's intentionally blowing a hole in one of the most prized
properties of the database, ie, that it doesn't go down if it can help
it.  I want a *WHOLE* lot stronger rationale than "somebody might want
it someday" before providing a switch that lets somebody thoughtlessly
break a property we've sweated blood for ten years to ensure.
        regards, tom lane


Re: postmaster recovery and automatic restart suppression

From
Robert Haas
Date:
On Mon, Jun 8, 2009 at 4:30 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Greg Stark <stark@enterprisedb.com> writes:
>>> On Mon, 2009-06-08 at 09:47 -0400, Tom Lane wrote:
>>>> I think the proposed don't-restart flag is exceedingly ugly and will not
>>>> solve any real-world problem.
>
>> Hm. I'm not sure I see a solid use case for it -- in my experience you
>> want to be pretty sure you have a persistent problem before you fail
>> over.
>
> Yeah, and when you do fail over you want more guarantee than "none at
> all" that the primary won't start back up again on its own.
>
>> But I don't really see why it's ugly either.
>
> Because it's intentionally blowing a hole in one of the most prized
> properties of the database, ie, that it doesn't go down if it can help
> it.  I want a *WHOLE* lot stronger rationale than "somebody might want
> it someday" before providing a switch that lets somebody thoughtlessly
> break a property we've sweated blood for ten years to ensure.

I see that you've carefully not quoted Greg's remark about "mechanism
not policy" with which I completely agree.  This seems like a pretty
useful switch for people who want more control over how the database
gets restarted on those rare occasions when it wipes out (and possibly
for debugging crash-type problems as well).  The amount of
blood-sweating that was required to make a robust automatic restart
mechanism doesn't seem relevant to this discussion, though it is
certainly a cool feature.

I also don't see any reason to assume that users will do this
"thoughtlessly".  Perhaps someone will, but if our policy is to not
add any features on the theory that someone might use them in a stupid
way, we'd better get busy reverting a significant fraction of the work done
for 8.4.  I'm not going to go so far as to say that we should never
reject a feature because the danger of someone shooting themselves in
the foot is too high, but this doesn't even seem like a likely
candidate.  If we put an option in postgresql.conf called
"automatic_restart_after_crash = on", anyone who switches that to
"off" should have a pretty good idea what the likely consequences of
that decision will be.  The people who are too stupid to figure that
one out are likely to have a whole lot of other problems too, and
they're not the people at whom we should be targeting this product.

...Robert


Re: postmaster recovery and automatic restart suppression

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I see that you've carefully not quoted Greg's remark about "mechanism
> not policy" with which I completely agree.

Mechanism should exist to support useful policy.  I don't believe that
the proposed switch has any real-world usefulness.
        regards, tom lane


Re: postmaster recovery and automatic restart suppression

From
Robert Haas
Date:
On Mon, Jun 8, 2009 at 7:34 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I see that you've carefully not quoted Greg's remark about "mechanism
>> not policy" with which I completely agree.
>
> Mechanism should exist to support useful policy.  I don't believe that
> the proposed switch has any real-world usefulness.

I guess I agree that it doesn't seem to make much sense to trigger
failover on a DB crash, as the OP suggested.  The most likely cause of
a DB crash is probably a software bug, in which case failover isn't
going to help (won't you just trigger the same bug on the standby
server?).  The case where you'd probably want to do failover is when
the whole server has gone down due to a hardware or power failure, in
which case your hypothetical home-grown supervisor process won't be
able to run anyway.

But I'm still not 100% convinced that the proposed mechanism is
useless.  There might be other reasons to want to get control in the
event of a crash.  You might want to page the system administrator, or
trigger a filesystem snapshot so you can go back and do a post-mortem.
(The former could arguably be done just as well by scanning the log
file for the relevant log messages, I suppose, but the latter
certainly couldn't be, if your goal is to get a snapshot before
recovery is done.)

But maybe I'm all wet...

...Robert
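
The log-scanning alternative mentioned above could look something like
the following Python sketch. The two message fragments are assumptions
about what the server writes when a backend crashes and the postmaster
reinitializes; the exact wording should be checked against the log output
of the server version in use. As noted in the message, this only notices
a crash after the fact and cannot take control before recovery starts.

```python
import re
from typing import Iterable, List

# Assumed message fragments; check them against the log output of the
# server version actually in use before relying on them.
CRASH_PATTERNS = [
    re.compile(r"was terminated by signal \d+"),
    re.compile(r"all server processes terminated; reinitializing"),
]


def find_crash_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that match any of the assumed crash patterns."""
    return [line.rstrip("\n") for line in lines
            if any(p.search(line) for p in CRASH_PATTERNS)]
```

A paging hook would then be called whenever `find_crash_lines` returns a
non-empty list for newly appended log lines.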


Re: postmaster recovery and automatic restart suppression

From
"Kolb, Harald (NSN - DE/Munich)"
Date:
Hi,

> -----Original Message-----
> From: ext Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, June 09, 2009 1:35 AM
> To: Robert Haas
> Cc: Greg Stark; Simon Riggs; Fujii Masao; Kolb, Harald (NSN -
> DE/Munich); pgsql-hackers@postgresql.org; Czichy, Thoralf
> (NSN - FI/Helsinki)
> Subject: Re: [HACKERS] postmaster recovery and automatic
> restart suppression
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > I see that you've carefully not quoted Greg's remark about
> "mechanism
> > not policy" with which I completely agree.
>
> Mechanism should exist to support useful policy.  I don't believe that
> the proposed switch has any real-world usefulness.
>
>             regards, tom lane
>

There are some good reasons why a switchover could be an appropriate
means in case the DB is facing troubles. It may be that the root cause
is not the DB itself, but used resources or other things which are
going crazy and hit the DB first (we've seen a lot of these
unbelievable things which made us quite sensitive to robustness
aspects). Therefore we want to have control over the DB recovery.
If you don't want to see this option as a GUC parameter, would it be
acceptable to have it as a new postmaster cmd line option?

Regards, Harald Kolb.


Re: postmaster recovery and automatic restart suppression

From
Simon Riggs
Date:
On Tue, 2009-06-09 at 20:59 +0200, Kolb, Harald (NSN - DE/Munich) wrote:

> There are some good reasons why a switchover could be an appropriate
> means in case the DB is facing troubles. It may be that the root cause
> is not the DB itsself, but used resources or other things which are
> going crazy and hit the DB first ( we've seen a lot of these
> unbelievable things which made us quite sensible for robustness
> aspects). Therefore we want to have control on the DB recovery.
> If you don't want to see this option as a GUC parameter, would it be
> acceptable to have it as a new postmaster cmd line option ? 

Even if you had this, you still need to STONITH just in case the
failover happens by mistake. 

If you still have to take an action to be certain, what is the point of
the feature?

Most losses of availability are caused by human error and this seems
like one more way to blow your remaining toes off.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: postmaster recovery and automatic restart suppression

From
Tom Lane
Date:
"Kolb, Harald (NSN - DE/Munich)" <harald.kolb@nsn.com> writes:
> If you don't want to see this option as a GUC parameter, would it be
> acceptable to have it as a new postmaster cmd line option ? 

That would make two kluges, not one (we don't do options that are
settable in only one way).  And it does nothing whatever to address
my objection to the concept.
        regards, tom lane


Re: postmaster recovery and automatic restart suppression

From
"Kevin Grittner"
Date:
"Kolb, Harald (NSN - DE/Munich)" <harald.kolb@nsn.com> wrote:
>> From: ext Tom Lane [mailto:tgl@sss.pgh.pa.us] 
>> Mechanism should exist to support useful policy.  I don't believe
>> that the proposed switch has any real-world usefulness.
> There are some good reasons why a switchover could be an appropriate
> means in case the DB is facing troubles. It may be that the root
> cause is not the DB itsself, but used resources or other things
> which are going crazy and hit the DB first
Would an example of this be that one drive in a RAID has gone bad and
the hot spare rebuild has been triggered, leading to poor performance
for a while?  Is that the sort of issue where you see value?
-Kevin


Re: postmaster recovery and automatic restart suppression

From
Greg Stark
Date:
Not really since once you fail over you may as well stop the rebuild  
since you'll have to restore the whole database. Moreover wouldn't  
that have to be a manual decision?

The closest thing I can come to a use case would be if you run a very  
large cluster with hundreds of read-only replicas. If one has problems  
you would rather the load balancer notice and take it out of rotation  
immediately rather than have it flap and continue to cause problems.

Even there it would be dicey since a software bug could easily cause  
all your replicas to start misbehaving simultaneously. It would suck  
to see them all shut down one by one...

-- 
Greg


On 9 Jun 2009, at 20:53, "Kevin Grittner"  
<Kevin.Grittner@wicourts.gov> wrote:

> "Kolb, Harald (NSN - DE/Munich)" <harald.kolb@nsn.com> wrote:
>>> From: ext Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
>>> Mechanism should exist to support useful policy.  I don't believe
>>> that the proposed switch has any real-world usefulness.
>
>> There are some good reasons why a switchover could be an appropriate
>> means in case the DB is facing troubles. It may be that the root
>> cause is not the DB itsself, but used resources or other things
>> which are going crazy and hit the DB first
>
> Would an example of this be that one drive in a RAID has gone bad and
> the hot spare rebuild has been triggered, leading to poor performance
> for a while?  Is that the sort of issue where you see value?
>
> -Kevin


Re: postmaster recovery and automatic restart suppression

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> "Kolb, Harald (NSN - DE/Munich)" <harald.kolb@nsn.com> wrote:
>> There are some good reasons why a switchover could be an appropriate
>> means in case the DB is facing troubles. It may be that the root
>> cause is not the DB itsself, but used resources or other things
>> which are going crazy and hit the DB first
> Would an example of this be that one drive in a RAID has gone bad and
> the hot spare rebuild has been triggered, leading to poor performance
> for a while?  Is that the sort of issue where you see value?

How would that be connected to a "no restart on crash" setting?
        regards, tom lane


Re: postmaster recovery and automatic restart suppression

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> "Kolb, Harald (NSN - DE/Munich)" <harald.kolb@nsn.com> wrote:
>>> There are some good reasons why a switchover could be an
>>> appropriate means in case the DB is facing troubles. It may be
>>> that the root cause is not the DB itself, but used resources or
>>> other things which are going crazy and hit the DB first
> 
>> Would an example of this be that one drive in a RAID has gone bad
>> and the hot spare rebuild has been triggered, leading to poor
>> performance for a while?  Is that the sort of issue where you see
>> value?
> 
> How would that be connected to a "no restart on crash" setting?
It wouldn't; but I'm trying to better understand the problem the OP is
trying to solve, to see where that leads.
My first reaction on hearing the request was that it might have *some*
use; but in trying to recall any restart where it is what I would have
wanted, I come up dry.  I haven't even really come up with a good
hypothetical use case.  But I get the feeling the OP has had some
problem this is attempting to address.  I'm just not clear what that
is.
-Kevin


Re: postmaster recovery and automatic restart suppression

From
Simon Riggs
Date:
On Tue, 2009-06-09 at 15:48 -0500, Kevin Grittner wrote:
> My first reaction on hearing the request was that it might have *some*
> use; but in trying to recall any restart where it is what I would have
> wanted, I come up dry.  I haven't even really come up with a good
> hypothetical use case.  But I get the feeling the OP has had some
> problem this is attempting to address.  I'm just not clear what that
> is.

I think we need to answer why shutting the database down is an
insufficient response to the need to have it shut down in the event of
failover. It always sounds neat to have a new feature, but often we
already have it. (I'm sure I'm as guilty of that as the next person.)

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: postmaster recovery and automatic restart suppression

From
Fujii Masao
Date:
Hi,

On Wed, Jun 10, 2009 at 4:21 AM, Simon Riggs<simon@2ndquadrant.com> wrote:
>
> On Tue, 2009-06-09 at 20:59 +0200, Kolb, Harald (NSN - DE/Munich) wrote:
>
>> There are some good reasons why a switchover could be an appropriate
>> means in case the DB is facing troubles. It may be that the root cause
>> is not the DB itsself, but used resources or other things which are
>> going crazy and hit the DB first ( we've seen a lot of these
>> unbelievable things which made us quite sensible for robustness
>> aspects). Therefore we want to have control on the DB recovery.
>> If you don't want to see this option as a GUC parameter, would it be
>> acceptable to have it as a new postmaster cmd line option ?
>
> Even if you had this, you still need to STONITH just in case the
> failover happens by mistake.

Yes. On second thought, probably we should solve this kind of problem
outside of Postgres.

> Is there a possibility to deactivate the restart and to force the postmaster
> to simply exit at the end ?
> The background is that we will have a watchdog process which will in
> this case perform a fast switchover to the standby side (in case of
> syncronous replication) or will restart the db by its own and in addition
> will perform some specific actions.

To return to Harald's original problem, the watchdog process can
shoot the postmaster before doing the next action.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: postmaster recovery and automatic restart suppression

From
"Kolb, Harald (NSN - DE/Munich)"
Date:
Hi

> -----Original Message-----
> From: ext Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, June 09, 2009 9:20 PM
> To: Kolb, Harald (NSN - DE/Munich)
> Cc: Robert Haas; Greg Stark; Simon Riggs; Fujii Masao;
> pgsql-hackers@postgresql.org; Czichy, Thoralf (NSN - FI/Helsinki)
> Subject: Re: [HACKERS] postmaster recovery and automatic
> restart suppression
>
> "Kolb, Harald (NSN - DE/Munich)" <harald.kolb@nsn.com> writes:
> > If you don't want to see this option as a GUC parameter, would it be
> > acceptable to have it as a new postmaster cmd line option ?
>
> That would make two kluges, not one (we don't do options that are
> settable in only one way).  And it does nothing whatever to address
> my objection to the concept.
>
>             regards, tom lane
>

First point is understood.
Second point needs further discussion:
The recovery and restart feature is an excellent solution if the db is
running in a standalone environment and I understand that this should
not be weakened. But in a configuration where the db is only one
resource among others and where you have a central supervisor, it's
problematic. Then this central instance observes all the resources and
services and decides what to do in case of problems. It's not up to the
resource/service to make its own decision because it's only a piece of
the cake and doesn't have the complete view of the whole situation.
E.g. the behaviour might be different if the problems occur during an
overload situation or if you already have hints of HW related problems
or if you are in an upgrade procedure and the initial start fails. An
uncontrolled and undetected automatic restart may complicate the
situation and increase the outage time.
Thus it would be helpful to have the possibility of a very fast failure
detection (SIGCHLD in controlling instance) and to avoid wasteful
cleanup procedures.
If the db is embedded in a management (High Availability) environment,
this option will be helpful in general, regardless of whether you have a
cluster or a single node.
But in a cluster environment it would be more important to have this
switch, because you always will have this management instance, the
cluster software. And of course the main reason for a cluster is to
switch over when it makes sense to do so. And one good reason to really
do it is when a central instance like the db on the primary side
crashes. At least the user should have the possibility to decide this,
but this would require that PostgreSQL constructively supports this
situation.

Regards, Harald.
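
The "fast failure detection in the controlling instance" idea can be
illustrated with a small Python sketch. It assumes the supervisor is the
direct parent of the supervised process, so the kernel reports the child's
exit to it immediately; `run_and_classify` is a hypothetical helper name,
and the command passed in is whatever the supervisor launches. Note that
this only helps if the supervised process really exits on failure, which
is exactly the behaviour being requested in this thread.

```python
import subprocess
from typing import List, Tuple


def run_and_classify(cmd: List[str]) -> Tuple[str, int]:
    """Start a child process, block until it exits, and classify the exit.

    Returns ("clean", 0), ("error", exit_code) or ("signal", signal_number).
    """
    child = subprocess.Popen(cmd)
    rc = child.wait()  # returns as soon as the kernel reports the exit
    if rc == 0:
        return ("clean", 0)
    if rc < 0:
        # On POSIX, a negative return code means "killed by signal -rc".
        return ("signal", -rc)
    return ("error", rc)
```

The supervisor would feed the returned classification into its own
decision logic instead of leaving the restart decision to the child.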


Re: postmaster recovery and automatic restart suppression

From
Alvaro Herrera
Date:
Kolb, Harald (NSN - DE/Munich) escribió:

> The recovery and restart feature is an excellent solution if the db is
> running in a standalone environment and I understand that this should
> not be weakened. But in a configuration where the db is only one
> resource among others and where you have a central supervisor, it's
> problematic. Then this central instance observes all the resources and
> services and decides what to do in case of problems. It's not up to the
> resource/service to make it's own decision because it's only a piece of
> the cake and doesn't has the complete view to the whole situation.

Surely you can just stop the postmaster while it's on recovery from the
supervisor when you detect this.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
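
Stopping the postmaster from an external supervisor, as suggested above,
might be sketched like this in Python. It relies on two things: the
postmaster's PID is the first line of `postmaster.pid` in the data
directory, and the postmaster treats SIGTERM, SIGINT and SIGQUIT as smart,
fast and immediate shutdown requests respectively. The function names are
hypothetical, the code is POSIX-only, and a real supervisor would also
need to handle a missing or stale PID file.

```python
import os
import signal
from pathlib import Path


def read_postmaster_pid(data_dir: str) -> int:
    """Return the PID stored on the first line of <data_dir>/postmaster.pid."""
    first_line = (Path(data_dir) / "postmaster.pid").read_text().splitlines()[0]
    return int(first_line.strip())


def stop_postmaster(data_dir: str, sig: int = signal.SIGQUIT) -> int:
    """Send a shutdown signal to the postmaster and return its PID.

    SIGQUIT requests an immediate shutdown; SIGINT (fast) or SIGTERM
    (smart) are the gentler alternatives.
    """
    pid = read_postmaster_pid(data_dir)
    os.kill(pid, sig)
    return pid
```

Which signal to send is a policy choice for the supervisor; immediate
shutdown is the default here only because the scenario is a node that is
about to be failed over anyway.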


Re: postmaster recovery and automatic restart suppression

From
"Czichy, Thoralf (NSN - FI/Helsinki)"
Date:

hi,

I am working together with Harald on this issue. Below some thoughts on
why we think it should be possible to disable the postmaster-internal
recovery attempt and instead have faults in the processes started
by postmaster escalated to postmaster-exit.



[Our typical "embedded" situation]

* Database is small 0.1 to 1 GB (e.g. we consider it the safest strategy
  to copy the whole database from the active to standby before
  reconnecting the standby after switchover or failover).

* Few clients only (10-100)

* There is no shared storage between the two instances (this means no
  concurrent access to shared resources, no isolation problems for
  shared resources)

* Switchover is fast, less than a few seconds

* Disk I/O is slow (no RAID, possibly (slow) flash-based)

* The same nodes running database also run lots of other functionality
  (some dependent on DB, most not)



[Keep recovery decision and recovery action in cluster-HA-middleware]

Actually the problem we're trying to solve is to keep the decision about
the best recovery strategy outside of the DB. In our use case this logic
is expressed in the cluster-HA-middleware, and recovery actions are
initiated by this middleware rather than by each individual piece of
software started by it; software is generally expected to "fail fast and
safe" in case of errors. As long as you trust hardware and OS kernel, a
process exit is usually such a fail-fast-and-safe operation. It's "safe"
because process exit causes the kernel to release the resources the
process holds. It's also fast, though "fast" is a bit more debatable, as
a simple signal from the postmaster to the cluster middleware would
probably be faster. However, lacking such a signal, a SIGCHLD is the
next best thing.
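
To make the "process exit as failure signal" idea concrete, here is a
minimal Python sketch of the supervising side. It is only an illustration
under assumptions: the function names are hypothetical, the supervised
process is a stand-in (not a real postmaster), and the parent simply blocks
in wait() instead of installing a SIGCHLD handler.

```python
import subprocess
import sys


def run_and_wait(argv):
    """Start a child process and block until it exits.

    Returns the exit status: >= 0 for a normal exit code, negative
    (-signum) if the child was killed by a signal (POSIX behavior).
    """
    child = subprocess.Popen(argv)
    return child.wait()


def classify_exit(status):
    """Map an exit status to a coarse event for the middleware (hypothetical names)."""
    if status == 0:
        return "clean-shutdown"
    if status < 0:
        return "killed-by-signal"
    return "failed"


if __name__ == "__main__":
    # Stand-in for the supervised process: a child that exits with code 3.
    status = run_and_wait([sys.executable, "-c", "import sys; sys.exit(3)"])
    print(classify_exit(status))
```

The point is that the supervisor needs nothing from the child beyond its
exit status; any richer information would need a separate interface.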

The middleware can make decisions such as the following (all of this is
configurable and postmaster-health is _just_one_input_ of many to reach a
decision on the correct behavior):

Policy 1: By default try to restart the active instance N times, after
          that do a switchover.
Policy 2: If the active Postgres fails and the standby is available and
          up-to-date, do an immediate switchover. If the standby is not
          available, restart.
Policy 3: If the active Postgres fails, escalate the problem to
          node-level, isolate the active node and do the switchover to
          the standby.
Policy 4: In single-node systems, restart the db instance N times. If it
          fails more often than N times in X seconds, stop it and give an
          indication to the operator (SNMP-trap to management system,
          text message, ...) that something is seriously wrong and manual
          intervention is needed.

In the current setup we want to go for Policy 2. In earlier unrelated
products (not using PostgreSQL) we actually had policies 1, 3 and 4.
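
As a rough illustration of how such policies could look inside the
middleware, here is a Python sketch of Policy 2 and Policy 4. All names
(policy2, RestartLimiter, the returned action strings) are hypothetical,
and treating "standby available but not up-to-date" as a restart is an
assumption; the policy text above does not cover that case.

```python
import time
from collections import deque


def policy2(standby_available, standby_up_to_date):
    """Policy 2: switch over if the standby is usable, otherwise restart.

    Assumption: an available but stale standby is treated like a
    missing one, i.e. the active instance is restarted.
    """
    if standby_available and standby_up_to_date:
        return "switchover"
    return "restart"


class RestartLimiter:
    """Policy 4: tolerate at most max_restarts failures per window_seconds."""

    def __init__(self, max_restarts, window_seconds, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.failures = deque()     # timestamps of recent failures

    def on_failure(self):
        now = self.clock()
        self.failures.append(now)
        # Forget failures that have fallen out of the sliding window.
        while self.failures and now - self.failures[0] > self.window_seconds:
            self.failures.popleft()
        if len(self.failures) > self.max_restarts:
            return "stop-and-alert"
        return "restart"
```

For example, RestartLimiter(2, 10) answers "restart" for the first two
failures inside a 10 second window and "stop-and-alert" for the third.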

Another typical situation is that recovery behavior is different during
upgrades compared to the behavior during normal operation. E.g. when
the (new) database instance fails during an automatic schema conversion
during an upgrade, we would want to automatically fall back to the
previous version.



[STONITH is not always best strategy if failures can be declared as
user-space software problem only, limit STONITH to HW/OS failures]

The isolation of the failing Postgres instance does not require a
STONITH - mainly as there's also other software running on the same node
that we'd not want to automatically switch over (e.g. because it takes
longer to do or the functionality is more critical or less critical).
Also we generally trust the HW, OS kernel and cluster middleware to
behave correctly. These functions also follow the principle of
fail-fast-and-safe. This trust might be an assumption that not everybody
agrees with, though. So, if the failure originated from
HW/OS/Clusterware it clearly is a STONITH situation, but if it's a
user-space problem - the default assumption is that isolation can be
implemented on OS-level and that's a guarantee that the clusterware
gives (using a separate Quorum mechanism to avoid split-brain
situations).




[Example of user-space software failures]

So, what kind of failures would cause a user-space switchover rather
than node-level isolation? This gets a bit philosophical. If you assume
that many software failures are caused by concurrency issues, switching
over to the standby is actually a good strategy, as it's unlikely that
the same concurrency issue happens again on the standby. Another reason
for software failures is entering exceptional situations, such as the
disk getting full, overload on the node (caused by some other process),
a backup being taken, upgrade conversion etc. So here the idea is that
failover to a standby instance helps as long as there's some hope that
on the standby side the situation is different. If we'd just have an
internal Postgres restart in such situations, we'd have flapping db
connectivity - without the operator even being aware of it (awareness
about problem situations is also something that the cluster HA
middleware takes care of).



[Possible implementation options]

I see only two solutions to allow an external cluster-HA-middleware to
make recovery decisions:
  (1) postmaster process exits if it detects any unpredicted failure or
  (2) have postmaster provide an interface to notify about software
      failures (i.e. the case it goes into postmaster re-initializing).

In case (2) it would be the cluster-HA-middleware that isolates the
postmaster process, e.g. by SIGKILL-ing all related processes and
forcefully releasing all shared resources that it uses. However, I favor
case (1) as long as we would keep the logic that runs within the
postmaster in case it detects a backend process failure as simple as
possible - meaning force-stop all postgres processes (SIGKILL), wait for
SIGCHLD from them and exit (should only take a few milliseconds).
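
The "force-stop everything, reap, exit" sequence of option (1) can be
sketched in Python as below. This is not postmaster code; it only shows the
shape of the logic in a generic parent process, with a hypothetical helper
name and sleeping children as stand-ins for backends.

```python
import subprocess
import sys


def kill_and_reap(children):
    """Force-stop every child that is still running, then reap all of them.

    Popen.kill() sends SIGKILL on POSIX. Returns the exit statuses.
    """
    for child in children:
        if child.poll() is None:   # still running
            child.kill()
    # wait() blocks until each child has been reaped by this process.
    return [child.wait() for child in children]


if __name__ == "__main__":
    # Stand-ins for backend processes: children that would sleep a minute.
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    children = [subprocess.Popen(sleeper) for _ in range(3)]
    kill_and_reap(children)
    # Exit non-zero so an external supervisor sees a failure and decides
    # what to do next (restart, switchover, ...).
    sys.exit(1)
```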


[Question]

So the question remains: Is this behavior, and the most likely addition
of a postgresql.conf setting "automatic_restart_after_crash = on",
something that completely goes against the Postgres philosophy, or is
this something that, once implemented, would be acceptable to have in
the main Postgres code base?


Thoralf


Re: postmaster recovery and automatic restart suppression

From
Tom Lane
Date:
"Czichy, Thoralf (NSN - FI/Helsinki)" <thoralf.czichy@nsn.com> writes:
> I am working together with Harald on this issue. Below some thoughts on 
> why we think it should be possible to disable the postmaster-internal 
> recovery attempt and instead have faults in the processes started 
> by postmaster escalated to postmaster-exit.

I'll tell you what the fundamental problem with this is: it's converting
Postgres into a piece of software that is completely dependent on some
hypothetical outside management code in order to meet one of its basic
design goals.  That isn't going to go over very well to start with.
Until you have written such management code, made it freely available,
and demonstrated that this type of recovery approach is *actually*, not
hypothetically, useful in a real-world environment, it's unlikely
that anyone is going to want to consider it.

I'd recommend just carrying a private patch to make Postgres do what
you want ... it's unlikely to be the only such patch you need anyway.
One obvious example is that nothing you describe is sensible without
exposing more information than "something failed" to the outside
management code.  You'll want some kind of API in there to pass on
whatever the postmaster knows to the outside code.

We might consider adopting a set of patches like that once it's been
demonstrated to be useful for a live project, but I don't think we'll
accept it on speculation.
        regards, tom lane


Re: postmaster recovery and automatic restart suppression

From
Fujii Masao
Date:
Hi,

On Wed, Jun 17, 2009 at 12:22 AM, Czichy, Thoralf (NSN - FI/Helsinki)
<thoralf.czichy@nsn.com> wrote:
> [STONITH is not always best strategy if failures can be declared as
> user-space software problem only, limit STONITH to HW/OS failures]
>
> The isolation of the failing Postgres instance does not require a
> STONITH - mainly as there's also other software running on the same
> node that we'd not want to automatically switchover (e.g. because it
> takes longer to do or the functionality is more critical or less
> critical). Also we generally trust the HW, OS kernel and cluster
> middleware to behave correctly. These functions also follow the
> principle of fail-fast-and-safe. This trust might be an assumption
> that not everybody agrees with, though. So, if the failure originated
> from HW/OS/Clusterware it clearly is a STONITH situation, but if it's
> a user-space problem - the default assumption is that isolation can be
> implemented on OS-level and that's a guarantee that the clusterware
> gives (using a separate Quorum mechanism to avoid split-brain
> situations).

HW-level STONITH seems to be too much for your case. How about making
your HA-middleware shut the dying postgres down before doing the
switchover, by using (for example) "pg_ctl -mi stop"? In this case,
other software can keep running on the original node after the
switchover.
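
From the middleware side, that could look roughly like the Python sketch
below. The helper names, the default timeout and the data directory
argument are assumptions for illustration; the only real interface used is
pg_ctl's "stop" command with -D and -m immediate (the long form of -mi).

```python
import subprocess


def pg_ctl_immediate_stop_argv(data_dir, pg_ctl="pg_ctl"):
    """Build the command line for an immediate-mode shutdown."""
    return [pg_ctl, "stop", "-D", data_dir, "-m", "immediate"]


def stop_postgres_immediately(data_dir, pg_ctl="pg_ctl", timeout=30):
    """Run pg_ctl and return its exit code (0 means the stop succeeded).

    Raises subprocess.TimeoutExpired if pg_ctl does not finish within
    `timeout` seconds (an assumed value, not a recommendation).
    """
    argv = pg_ctl_immediate_stop_argv(data_dir, pg_ctl)
    return subprocess.run(argv, timeout=timeout).returncode
```

The middleware would call stop_postgres_immediately() with its own data
directory path first and start the switchover only after that returns.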

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center