Thread: postmaster recovery and automatic restart suppression
Hi,

in case of a serious failure of a backend or an auxiliary process the
postmaster performs a crash recovery and restarts the db automatically.

Is there a possibility to deactivate the restart and to force the
postmaster to simply exit at the end?

The background is that we will have a watchdog process which will in
this case perform a fast switchover to the standby side (in case of
synchronous replication) or will restart the db on its own and in
addition will perform some specific actions.

Regards,

Harald Kolb
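For illustration, a minimal version of such a watchdog might look like the sketch below. It shows only the shape of the idea: the data directory path and the switchover hook are assumptions, and a real deployment would use the site's own clusterware interfaces. Note that detecting postmaster death this way only works if the postmaster exits instead of restarting itself, which is exactly the behavior being asked for.

```python
import os
import time

PID_FILE = "/var/lib/pgsql/data/postmaster.pid"  # assumed data directory


def postmaster_pid():
    """Read the postmaster PID from the first line of postmaster.pid."""
    try:
        with open(PID_FILE) as f:
            return int(f.readline().strip())
    except (OSError, ValueError):
        return None


def is_alive(pid):
    """Probe liveness with signal 0, which delivers no actual signal."""
    try:
        os.kill(pid, 0)
        return True
    except OSError:
        return False


def switchover_to_standby():
    # Placeholder hook: promote the standby / notify clusterware
    # (entirely site-specific).
    print("postmaster gone -- initiating switchover")


while True:
    pid = postmaster_pid()
    if pid is None or not is_alive(pid):
        switchover_to_standby()
        break
    time.sleep(1)
```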
Hi,

On Fri, Jun 5, 2009 at 1:02 AM, Kolb, Harald (NSN - DE/Munich) <harald.kolb@nsn.com> wrote:
> in case of a serious failure of a backend or an auxiliary process the
> postmaster performs a crash recovery and restarts the db automatically.
>
> Is there a possibility to deactivate the restart and to force the
> postmaster to simply exit at the end?

Good point. I also think that this makes handling of failover more
complicated. In other words, clusterware cannot determine whether to do
failover when it detects the death of the primary postgres. A wrong
decision might cause split-brain syndrome.

How about a new GUC parameter to determine whether to restart the
postmaster automatically when it fails abnormally? This would be useful
for various failover systems.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: postmaster recovery and automatic restart suppression
From: "Kolb, Harald (NSN - DE/Munich)"
Hi,

> From: ext Fujii Masao [mailto:masao.fujii@gmail.com]
> Sent: Friday, June 05, 2009 8:14 AM
>
> Good point. I also think that this makes handling of failover more
> complicated. In other words, clusterware cannot determine whether to do
> failover when it detects the death of the primary postgres. A wrong
> decision might cause split-brain syndrome.

Mh, I cannot follow your reflections. Could you explain a little bit
more?

> How about a new GUC parameter to determine whether to restart the
> postmaster automatically when it fails abnormally? This would be useful
> for various failover systems.

A new GUC parameter would be the optimal solution. Since I'm new to the
community, what's the "usual" way to make this happen?

Regards, Harald.
Hi,

On Fri, Jun 5, 2009 at 9:24 PM, Kolb, Harald (NSN - DE/Munich) <harald.kolb@nsn.com> wrote:
>> Good point. I also think that this makes handling of failover more
>> complicated. In other words, clusterware cannot determine whether to do
>> failover when it detects the death of the primary postgres. A wrong
>> decision might cause split-brain syndrome.
>
> Mh, I cannot follow your reflections. Could you explain a little bit
> more?

The primary postgres might restart automatically after the clusterware
has finished failover (i.e. the standby postgres has come up live). In
this case, a postgres would be working on each server, independent of
the other. This is one form of split-brain syndrome. The problem is
that, for example, if they share the archival storage, some archived
files might get lost: the original primary postgres might overwrite an
archived file which was written by the new primary.

On the other hand, the primary postgres might *not* restart
automatically. So, it's difficult for clusterware to choose whether to
do failover when it detects the death of the primary postgres, I think.

> A new GUC parameter would be the optimal solution. Since I'm new to the
> community, what's the "usual" way to make this happen?

The following might be good references for you:

http://www.pgcon.org/2009/schedule/events/178.en.html
http://wiki.postgresql.org/wiki/Submitting_a_Patch

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
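As an aside, the overwrite hazard described above can be narrowed (though not eliminated) by an archive script that refuses to replace an existing segment. A minimal sketch follows, assuming a shared archive directory; PostgreSQL would invoke such a script through archive_command, which substitutes %p (source path) and %f (file name):

```python
import os
import shutil
import sys

ARCHIVE_DIR = "/mnt/shared_archive"  # assumed shared archive location


def archive(src_path, file_name):
    dest = os.path.join(ARCHIVE_DIR, file_name)
    if os.path.exists(dest):
        # Another "primary" already archived this segment: fail loudly
        # instead of silently overwriting it.
        return 1
    shutil.copy2(src_path, dest + ".tmp")
    os.rename(dest + ".tmp", dest)  # atomic within a single filesystem
    return 0


if __name__ == "__main__":
    sys.exit(archive(sys.argv[1], sys.argv[2]))
```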
Fujii Masao <masao.fujii@gmail.com> writes:

> On the other hand, the primary postgres might *not* restart
> automatically. So, it's difficult for clusterware to choose whether to
> do failover when it detects the death of the primary postgres, I think.

I think the accepted way to handle this kind of situation is called
STONITH -- "Shoot The Other Node In The Head".

You need some way, when the cluster software decides to initiate
failover, to ensure that the first node *cannot* come back up. That
could mean shutting off the power to it at the PDU, or disabling its
network connection at the switch, or various other options.

--
Gregory Stark
http://mit.edu/~gsstark/resume.pdf
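For concreteness, a fencing step might look like the sketch below. The PDU and its HTTP interface are entirely hypothetical, invented for the example; real deployments use whatever management interface their power or network gear actually provides.

```python
import urllib.request


def fence_node(pdu_host, outlet):
    """Cut power to the failed node's outlet before promoting the standby."""
    # Hypothetical PDU API; the URL scheme is invented for this sketch.
    url = "http://%s/outlet/%d/off" % (pdu_host, outlet)
    with urllib.request.urlopen(url, timeout=5) as resp:
        if resp.status != 200:
            # If fencing cannot be confirmed, failing over is unsafe.
            raise RuntimeError("fencing failed -- do NOT fail over")


# fence_node("pdu1.example.com", 3)  # then, and only then, promote the standby
```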
Hi,

On Mon, Jun 8, 2009 at 6:45 PM, Gregory Stark <stark@enterprisedb.com> wrote:
> I think the accepted way to handle this kind of situation is called
> STONITH -- "Shoot The Other Node In The Head".
>
> You need some way, when the cluster software decides to initiate
> failover, to ensure that the first node *cannot* come back up. That
> could mean shutting off the power to it at the PDU, or disabling its
> network connection at the switch, or various other options.

Yes, I understand that STONITH is a safe solution for split-brain. But,
since some special equipment like a PDU probably has to be provided, I
think that some people (including me) want another reasonable way. The
proposed feature is not a perfect solution, but it is a convenient way
to prevent one of the split-brain situations.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Gregory Stark <stark@enterprisedb.com> writes:
> I think the accepted way to handle this kind of situation is called
> STONITH -- "Shoot The Other Node In The Head".

Yeah, and the reason people go to the trouble of having special
hardware for that is that pure-software solutions are unreliable.

I think the proposed don't-restart flag is exceedingly ugly and will
not solve any real-world problem.

			regards, tom lane
On Mon, 2009-06-08 at 09:47 -0400, Tom Lane wrote:
> I think the proposed don't-restart flag is exceedingly ugly and will
> not solve any real-world problem.

Agreed.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Mon, Jun 8, 2009 at 6:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, 2009-06-08 at 09:47 -0400, Tom Lane wrote:
>> I think the proposed don't-restart flag is exceedingly ugly and will
>> not solve any real-world problem.
>
> Agreed.

Hm. I'm not sure I see a solid use case for it -- in my experience you
want to be pretty sure you have a persistent problem before you fail
over. But I don't really see why it's ugly either. I mean, our
auto-restart behaviour is pretty arbitrary. You could just as easily
argue we shouldn't auto-restart and should rely on the user to restart
the service like he would any service which crashes.

I would file it under "mechanism not policy" and make it optional. The
user should be able to select what to do when a backend crash is
detected from amongst the various safe options, even if we think some
of the options don't have any use cases we can think of. Someone will
surely think of one at some point.

(idly I wonder if cloud environments where you can have an infinite
supply of slaves are such a use case...)

--
greg
http://mit.edu/~gsstark/resume.pdf
Greg Stark <stark@enterprisedb.com> writes:
> Hm. I'm not sure I see a solid use case for it -- in my experience you
> want to be pretty sure you have a persistent problem before you fail
> over.

Yeah, and when you do fail over you want more guarantee than "none at
all" that the primary won't start back up again on its own.

> But I don't really see why it's ugly either.

Because it's intentionally blowing a hole in one of the most prized
properties of the database, ie, that it doesn't go down if it can help
it. I want a *WHOLE* lot stronger rationale than "somebody might want
it someday" before providing a switch that lets somebody thoughtlessly
break a property we've sweated blood for ten years to ensure.

			regards, tom lane
On Mon, Jun 8, 2009 at 4:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Because it's intentionally blowing a hole in one of the most prized
> properties of the database, ie, that it doesn't go down if it can help
> it. I want a *WHOLE* lot stronger rationale than "somebody might want
> it someday" before providing a switch that lets somebody thoughtlessly
> break a property we've sweated blood for ten years to ensure.

I see that you've carefully not quoted Greg's remark about "mechanism
not policy", with which I completely agree. This seems like a pretty
useful switch for people who want more control over how the database
gets restarted on those rare occasions when it wipes out (and possibly
for debugging crash-type problems as well). The amount of
blood-sweating that was required to make a robust automatic restart
mechanism doesn't seem relevant to this discussion, though it is
certainly a cool feature.

I also don't see any reason to assume that users will do this
"thoughtlessly". Perhaps someone will, but if our policy is to not add
any features on the theory that someone might use them in a stupid way,
we'd better get busy reverting a significant fraction of the work done
for 8.4. I'm not going to go so far as to say that we should never
reject a feature because the danger of someone shooting themselves in
the foot is too high, but this doesn't even seem like a likely
candidate. If we put an option in postgresql.conf called
"automatic_restart_after_crash = on", anyone who switches that to "off"
should have a pretty good idea what the likely consequences of that
decision will be. The people who are too stupid to figure that one out
are likely to have a whole lot of other problems too, and they're not
the people at whom we should be targeting this product.

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> I see that you've carefully not quoted Greg's remark about "mechanism
> not policy", with which I completely agree.

Mechanism should exist to support useful policy. I don't believe that
the proposed switch has any real-world usefulness.

			regards, tom lane
On Mon, Jun 8, 2009 at 7:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Mechanism should exist to support useful policy. I don't believe that
> the proposed switch has any real-world usefulness.

I guess I agree that it doesn't seem to make much sense to trigger
failover on a DB crash, as the OP suggested. The most likely cause of a
DB crash is probably a software bug, in which case failover isn't going
to help (won't you just trigger the same bug on the standby server?).
The case where you'd probably want to do failover is when the whole
server has gone down to a hardware or power failure, in which case your
hypothetical home-grown supervisor process won't be able to run anyway.

But I'm still not 100% convinced that the proposed mechanism is
useless. There might be other reasons to want to get control in the
event of a crash. You might want to page the system administrator, or
trigger a filesystem snapshot so you can go back and do a post-mortem.
(The former could arguably be done just as well by scanning the log
file for the relevant log messages, I suppose, but the latter certainly
couldn't be, if your goal is to get a snapshot before recovery is
done.) But maybe I'm all wet...

...Robert
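The log-scanning variant mentioned above could be as simple as the sketch below. The log path is an assumption, and the marker string should be checked against the messages your postmaster version actually emits when a backend crashes.

```python
import time

LOGFILE = "/var/lib/pgsql/data/pg_log/postgresql.log"  # assumed location
CRASH_MARKER = "all server processes terminated; reinitializing"


def page_admin(line):
    # Placeholder: send mail / SNMP trap / text message (site-specific).
    print("PAGE: crash recovery triggered:", line.strip())


with open(LOGFILE) as f:
    f.seek(0, 2)  # start at end of file, like tail -f
    while True:
        line = f.readline()
        if not line:
            time.sleep(0.5)
            continue
        if CRASH_MARKER in line:
            page_admin(line)
```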
Re: postmaster recovery and automatic restart suppression
From: "Kolb, Harald (NSN - DE/Munich)"
Hi,

> From: ext Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, June 09, 2009 1:35 AM
>
> Mechanism should exist to support useful policy. I don't believe that
> the proposed switch has any real-world usefulness.

There are some good reasons why a switchover could be an appropriate
means in case the DB is facing troubles. It may be that the root cause
is not the DB itself, but used resources or other things which are
going crazy and hit the DB first (we've seen a lot of these
unbelievable things, which made us quite sensitive to robustness
aspects). Therefore we want to have control over the DB recovery.

If you don't want to see this option as a GUC parameter, would it be
acceptable to have it as a new postmaster command line option?

Regards,

Harald Kolb.
On Tue, 2009-06-09 at 20:59 +0200, Kolb, Harald (NSN - DE/Munich) wrote:
> There are some good reasons why a switchover could be an appropriate
> means in case the DB is facing troubles. It may be that the root cause
> is not the DB itself, but used resources or other things which are
> going crazy and hit the DB first (we've seen a lot of these
> unbelievable things, which made us quite sensitive to robustness
> aspects). Therefore we want to have control over the DB recovery.
> If you don't want to see this option as a GUC parameter, would it be
> acceptable to have it as a new postmaster command line option?

Even if you had this, you still need to STONITH just in case the
failover happens by mistake. If you still have to take an action to be
certain, what is the point of the feature?

Most losses of availability are caused by human error and this seems
like one more way to blow your remaining toes off.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
"Kolb, Harald (NSN - DE/Munich)" <harald.kolb@nsn.com> writes: > If you don't want to see this option as a GUC parameter, would it be > acceptable to have it as a new postmaster cmd line option ? That would make two kluges, not one (we don't do options that are settable in only one way). And it does nothing whatever to address my objection to the concept. regards, tom lane
"Kolb, Harald (NSN - DE/Munich)" <harald.kolb@nsn.com> wrote: >> From: ext Tom Lane [mailto:tgl@sss.pgh.pa.us] >> Mechanism should exist to support useful policy. I don't believe >> that the proposed switch has any real-world usefulness. > There are some good reasons why a switchover could be an appropriate > means in case the DB is facing troubles. It may be that the root > cause is not the DB itsself, but used resources or other things > which are going crazy and hit the DB first Would an example of this be that one drive in a RAID has gone bad and the hot spare rebuild has been triggered, leading to poor performance for a while? Is that the sort of issue where you see value? -Kevin
Not really, since once you fail over you may as well stop the rebuild,
since you'll have to restore the whole database. Moreover, wouldn't
that have to be a manual decision?

The closest thing I can come up with as a use case would be if you run
a very large cluster with hundreds of read-only replicas. If one has
problems you would rather the load balancer notice and take it out of
rotation immediately rather than have it flap and continue to cause
problems.

Even there it would be dicey, since a software bug could easily cause
all your replicas to start misbehaving simultaneously. It would suck to
see them all shut down one by one...

--
Greg

On 9 Jun 2009, at 20:53, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> Would an example of this be that one drive in a RAID has gone bad and
> the hot spare rebuild has been triggered, leading to poor performance
> for a while? Is that the sort of issue where you see value?
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > "Kolb, Harald (NSN - DE/Munich)" <harald.kolb@nsn.com> wrote: >> There are some good reasons why a switchover could be an appropriate >> means in case the DB is facing troubles. It may be that the root >> cause is not the DB itsself, but used resources or other things >> which are going crazy and hit the DB first > Would an example of this be that one drive in a RAID has gone bad and > the hot spare rebuild has been triggered, leading to poor performance > for a while? Is that the sort of issue where you see value? How would that be connected to a "no restart on crash" setting? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> How would that be connected to a "no restart on crash" setting?

It wouldn't; but I'm trying to better understand the problem the OP is
trying to solve, to see where that leads.

My first reaction on hearing the request was that it might have *some*
use; but in trying to recall any restart where it is what I would have
wanted, I come up dry. I haven't even really come up with a good
hypothetical use case. But I get the feeling the OP has had some
problem this is attempting to address. I'm just not clear what that is.

-Kevin
On Tue, 2009-06-09 at 15:48 -0500, Kevin Grittner wrote:
> My first reaction on hearing the request was that it might have *some*
> use; but in trying to recall any restart where it is what I would have
> wanted, I come up dry. I haven't even really come up with a good
> hypothetical use case. But I get the feeling the OP has had some
> problem this is attempting to address. I'm just not clear what that is.

I think we need to answer why shutting the database down is an
insufficient response to the need to have it be shut down in the event
of failover. It always sounds neat to have a new feature, but often we
already have it. (I'm sure I'm as guilty of that as the next person.)

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Hi,

On Wed, Jun 10, 2009 at 4:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Even if you had this, you still need to STONITH just in case the
> failover happens by mistake.

Yes. On second thought, we should probably solve this kind of problem
outside of Postgres.

> Is there a possibility to deactivate the restart and to force the
> postmaster to simply exit at the end?
> The background is that we will have a watchdog process which will in
> this case perform a fast switchover to the standby side (in case of
> synchronous replication) or will restart the db on its own and in
> addition will perform some specific actions.

To return to Harald's original problem: the watchdog process can shoot
the postmaster before doing the next action.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
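A sketch of that "shoot first" ordering, using pg_ctl's immediate shutdown mode ("-mi" is shorthand for "-m immediate"); the data directory path and the promotion step are assumptions:

```python
import subprocess

DATADIR = "/var/lib/pgsql/data"  # assumed data directory


def shoot_postmaster():
    # Immediate shutdown aborts all server processes without waiting,
    # ensuring the old primary cannot keep writing.
    subprocess.check_call(["pg_ctl", "-D", DATADIR, "-m", "immediate", "stop"])


def failover():
    shoot_postmaster()
    # ... then promote the standby and redirect clients (site-specific) ...
```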
Re: postmaster recovery and automatic restart suppression
From: "Kolb, Harald (NSN - DE/Munich)"
Hi,

> From: ext Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, June 09, 2009 9:20 PM
>
> That would make two kluges, not one (we don't do options that are
> settable in only one way). And it does nothing whatever to address
> my objection to the concept.

The first point is understood. The second point needs further
discussion:

The recovery and restart feature is an excellent solution if the db is
running in a standalone environment, and I understand that this should
not be weakened. But in a configuration where the db is only one
resource among others and where you have a central supervisor, it's
problematic. This central instance observes all the resources and
services and decides what to do in case of problems. It's not up to the
resource/service to make its own decision, because it's only a piece of
the cake and doesn't have the complete view of the whole situation.
E.g. the behaviour might be different if the problems occur during an
overload situation, or if you already have hints of HW-related
problems, or if you are in an upgrade procedure and the initial start
fails. An uncontrolled and undetected automatic restart may complicate
the situation and increase the outage time. Thus it would be helpful to
have the possibility of very fast failure detection (SIGCHLD in the
controlling instance) and to avoid wasteful cleanup procedures.

If the db is embedded in a management (High Availability) environment,
this option will be helpful in general, independent of whether you have
a cluster or a single node. But in a cluster environment it would be
more important to have this switch, because you will always have this
management instance, the cluster software. And of course the main
reason for a cluster is to switch over when it makes sense to do so.
And one good reason to really do it is when a central instance like the
db on the primary side crashes. At least the user should have the
possibility to decide this, but this would require that PostgreSQL
constructively supports this situation.

Regards, Harald.
Kolb, Harald (NSN - DE/Munich) wrote:
> The recovery and restart feature is an excellent solution if the db is
> running in a standalone environment, and I understand that this should
> not be weakened. But in a configuration where the db is only one
> resource among others and where you have a central supervisor, it's
> problematic. This central instance observes all the resources and
> services and decides what to do in case of problems. It's not up to the
> resource/service to make its own decision, because it's only a piece of
> the cake and doesn't have the complete view of the whole situation.

Surely the supervisor can just stop the postmaster while it's in
recovery, when it detects this.

--
Alvaro Herrera
http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: postmaster recovery and automatic restart suppression
From: "Czichy, Thoralf (NSN - FI/Helsinki)"
hi,

I am working together with Harald on this issue. Below are some
thoughts on why we think it should be possible to disable the
postmaster-internal recovery attempt and instead have faults in the
processes started by the postmaster escalated to a postmaster exit.

[Our typical "embedded" situation]

* The database is small, 0.1 to 1 GB (e.g. we consider it the safest
  strategy to copy the whole database from the active to the standby
  before reconnecting the standby after switchover or failover).
* Few clients only (10-100).
* There is no shared storage between the two instances (this means no
  concurrent access to shared resources, no isolation problems for
  shared resources).
* Switchover is fast, less than a few seconds.
* Disk I/O is slow (no RAID, possibly (slow) flash-based).
* The nodes running the database also run lots of other functionality
  (some dependent on the DB, most not).

[Keep recovery decision and recovery action in cluster-HA-middleware]

Actually, the problem we're trying to solve is to keep the decision
about the best recovery strategy outside of the DB. In our use case
this logic is expressed in the cluster-HA-middleware, and recovery
actions are initiated by this middleware rather than by each individual
piece of software started by it; software is generally expected to
"fail fast and safe" in case of errors. As long as you trust the
hardware and OS kernel, a process exit is usually such a
fail-fast-and-safe operation. It's "safe" because process exit causes
the kernel to release the resources the process holds. It's also fast.
Though "fast" is a bit more debatable, as a simple signal from the
postmaster to the cluster middleware would probably be faster. However,
lacking such a signal, a SIGCHLD is the next best thing.

The middleware can make decisions such as the following (all of this is
configurable, and postmaster health is _just_one_input_ of many to
reach a decision on the correct behavior):

Policy 1: By default, try to restart the active instance N times; after
  that, do a switchover.
Policy 2: If the active Postgres fails and the standby is available and
  up-to-date, do an immediate switchover. If the standby is not
  available, restart.
Policy 3: If the active Postgres fails, escalate the problem to node
  level, isolate the active node, and do the switchover to the standby.
Policy 4: In single-node systems, restart the db instance N times. If
  it fails more often than N times in X seconds, stop it and give an
  indication to the operator (SNMP trap to the management system, text
  message, ...) that something is seriously wrong and manual
  intervention is needed.

In the current setup we want to go for Policy 2. In earlier, unrelated
products (not using PostgreSQL) we actually had policies 1, 3 and 4.
Another typical situation is that recovery behavior during upgrades
differs from the behavior during normal operation. E.g. when the (new)
database instance fails during an automatic schema conversion during
upgrade, we would want to automatically fall back to the previous
version.

[STONITH is not always the best strategy if failures can be declared as
user-space software problems only; limit STONITH to HW/OS failures]

The isolation of the failing Postgres instance does not require STONITH
-- mainly because there is also other software running on the same node
that we would not want to automatically switch over (e.g. because it
takes longer to do, or the functionality is more or less critical).
Also, we generally trust the HW, OS kernel and cluster middleware to
behave correctly. These functions also follow the principle of
fail-fast-and-safe. This trust might be an assumption that not
everybody agrees with, though. So, if the failure originated from
HW/OS/clusterware, it clearly is a STONITH situation; but if it's a
user-space problem, the default assumption is that isolation can be
implemented on the OS level, and that's a guarantee that the
clusterware gives (using a separate quorum mechanism to avoid
split-brain situations).

[Example of user-space software failures]

So, what kind of failures would cause a user-space switchover rather
than node-level isolation? This gets a bit philosophical. If you assume
that many software failures are caused by concurrency issues, switching
over to the standby is actually a good strategy, as it's unlikely that
the same concurrency issue happens again on the standby. Another reason
for software failures is entering exceptional situations, such as the
disk getting full, overload on the node (caused by some other process),
a backup being taken, an upgrade conversion, etc. So here the idea is
that failover to a standby instance helps as long as there's some hope
that on the standby side the situation is different. If we just had an
internal Postgres restart in such situations, we'd have flapping db
connectivity -- without the operator even being aware of it (awareness
of problem situations is also something that the cluster HA middleware
takes care of).

[Possible implementation options]

I see only two solutions to allow an external cluster-HA-middleware to
make recovery decisions: (1) the postmaster process exits if it detects
any unpredicted failure, or (2) the postmaster provides an interface to
notify about software failures (i.e. the case where it goes into
postmaster re-initializing). In case (2) it would be the
cluster-HA-middleware that isolates the postmaster process, e.g. by
SIGKILL-ing all related processes and forcefully releasing all shared
resources that it uses. However, I favor case (1), as long as we keep
the logic that runs within the postmaster when it detects a backend
process failure as simple as possible -- meaning force-stop all
postgres processes (SIGKILL), wait for SIGCHLD from them, and exit
(should only take a few milliseconds).

[Question]

So the question remains: Is this behavior, and the most likely addition
of a postgresql.conf "automatic_restart_after_crash = on", something
that completely goes against the Postgres philosophy, or is this
something that, once implemented, would be acceptable to have in the
main Postgres code base?

Thoralf
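To make the policy list above concrete, here is a sketch of how middleware might encode Policies 1, 2 and 4 as a single decision function. The standby-health input and the action names are assumptions; only the decision structure is illustrated.

```python
import time


class RestartPolicy:
    def __init__(self, max_restarts=3, window=60):
        self.max_restarts = max_restarts
        self.window = window  # seconds over which restarts are counted
        self.failures = []    # timestamps of recent postmaster exits

    def on_postmaster_exit(self, standby_ok):
        """Decide what to do when the postmaster has exited."""
        now = time.time()
        self.failures = [t for t in self.failures if now - t < self.window]
        self.failures.append(now)
        if standby_ok:
            return "switchover"      # Policy 2: standby is up-to-date
        if len(self.failures) <= self.max_restarts:
            return "restart"         # Policies 1/4: retry in place
        return "alert_operator"      # Policy 4: escalate, stop flapping
```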
"Czichy, Thoralf (NSN - FI/Helsinki)" <thoralf.czichy@nsn.com> writes: > I am working together with Harald on this issue. Below some thoughts on > why we think it should be possible to disable the postmaster-internal > recovery attempt and instead have faults in the processes started > by postmaster escalated to postmaster-exit. I'll tell you what the fundamental problem with this is: it's converting Postgres into a piece of software that is completely dependent on some hypothetical outside management code in order to meet one of its basic design goals. That isn't going to go over very well to start with. Until you have written such management code, made it freely available, and demonstrated that this type of recovery approach is *actually* not hypothetically useful in a real-world environment, it's unlikely that anyone is going to want to consider it. I'd recommend just carrying a private patch to make Postgres do what you want ... it's unlikely to be the only such patch you need anyway. One obvious example is that nothing you describe is sensible without exposing more information than "something failed" to the outside management code. You'll want some kind of API in there to pass on whatever the postmaster knows to the outside code. We might consider adopting a set of patches like that once it's been demonstrated to be useful for a live project, but I don't think we'll accept it on speculation. regards, tom lane
Hi,

On Wed, Jun 17, 2009 at 12:22 AM, Czichy, Thoralf (NSN - FI/Helsinki) <thoralf.czichy@nsn.com> wrote:
> [STONITH is not always the best strategy if failures can be declared as
> user-space software problems only; limit STONITH to HW/OS failures]
>
> The isolation of the failing Postgres instance does not require STONITH
> -- mainly because there is also other software running on the same node
> that we would not want to automatically switch over [...]

HW-level STONITH seems to be too much for your case. How about making
your HA middleware shut the dying postgres down before doing
switchover, by using (for example) "pg_ctl -mi stop"? In this case, the
other software can keep running on the original node after switchover.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center