Thread: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
Sorry if this is a FAQ, I did search and couldn't find much. I need to make my Postgresql installation fault tolerant. I was imagining a RAIDed disk array that is accessible from two (or multiple) computers, with a postmaster running on each computer. (Hardware upgrades could then be done to each computer at different times without losing access to the database). Is this possible? Is there another way to do this I should be looking at? Thanks, j
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
Ron Johnson
Date:
On 05/10/07 20:43, John Gateley wrote: > Sorry if this is a FAQ, I did search and couldn't find much. > > I need to make my Postgresql installation fault tolerant. > I was imagining a RAIDed disk array that is accessible from two > (or multiple) computers, with a postmaster running on each computer. > (Hardware upgrades could then be done to each computer at different > times without losing access to the database). > > Is this possible? > > Is there another way to do this I should be looking at? PostgreSQL does not have a Distributed Lock Manager, so the two postmasters could not coordinate locking and updating. *Maybe* it would work if you put your data on top of OCFS2 filesystems, but I doubt it. Of course, you could always run OpenVMS. You can get *big*, used Alphas for a song. The yearly software licensing fees would be pretty steep, though. http://en.wikipedia.org/wiki/VMScluster http://en.wikipedia.org/wiki/Distributed_lock_manager -- Ron Johnson, Jr. Jefferson LA USA Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good!
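To illustrate the point about the missing Distributed Lock Manager, here is a minimal Python sketch. It is not PostgreSQL code: `LocalLockTable` and the resource name are made up for illustration, and the only assumption is that each postmaster tracks its locks in memory private to its own machine. Two uncoordinated tables both grant the same lock, whereas a single table visible to both nodes (the role a DLM plays) grants it only once.

```python
class LocalLockTable:
    """Hypothetical stand-in for lock state held in one machine's memory."""

    def __init__(self):
        self.held = set()

    def try_lock(self, resource):
        """Grant the lock unless this table already shows it as held."""
        if resource in self.held:
            return False
        self.held.add(resource)
        return True


# Two postmasters, each with private lock state: both are granted the lock.
node_a, node_b = LocalLockTable(), LocalLockTable()
uncoordinated = (node_a.try_lock("row 42"), node_b.try_lock("row 42"))

# One lock table visible to both nodes: the second request is refused.
shared = LocalLockTable()
coordinated = (shared.try_lock("row 42"), shared.try_lock("row 42"))
```

With private lock state, each node believes it alone owns "row 42", which is the situation that leads to conflicting writes on a shared data area.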
In response to John Gateley <gateley@jriver.com>: > Sorry if this is a FAQ, I did search and couldn't find much. > > I need to make my Postgresql installation fault tolerant. > I was imagining a RAIDed disk array that is accessible from two > (or multiple) computers, with a postmaster running on each computer. > (Hardware upgrades could then be done to each computer at different > times without losing access to the database). > > Is this possible? PGCluster II does this. I don't know if it's out of beta yet. -- Bill Moran http://www.potentialtech.com
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
Devrim GÜNDÜZ
Date:
Hi, On Fri, 2007-05-11 at 06:24 -0400, Bill Moran wrote: > PGCluster II does this. I don't know if it's out of beta yet. Mitani is injured (left thumb) and has been out of touch for two months. The last time we talked (one month ago) he said that he would continue working on PGCluster-II once he feels better, but there has been no up-to-date news since then. AFAIK, PGCluster-II is ready for testing, but the SRA Europe guys will be doing an internal test before making the code public. He will be talking at PGCon, so we may expect to see some piece of code by the end of this month. Regards, -- Devrim GÜNDÜZ PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/
John Gateley wrote: > Sorry if this is a FAQ, I did search and couldn't find much. > > I need to make my Postgresql installation fault tolerant. > I was imagining a RAIDed disk array that is accessible from two > (or multiple) computers, with a postmaster running on each computer. > (Hardware upgrades could then be done to each computer at different > times without losing access to the database). We are doing this, more or less. We use the RH cluster suite on two machines that share a common data silo. Basically, if one machine fails, the other fires up a postmaster and picks up where the other left off. That's a real simple description, because we actually have an active/active configuration with multiple postmasters running on each machine. Machine A is the active machine for databases 1-3 and machine B is the active machine for databases 4-6. If machine A fails, postmasters are fired up on machine B to attend to databases 1-3. -- Until later, Geoffrey Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. - Benjamin Franklin
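The active/active arrangement described above can be sketched as a simple ownership map. This is only an illustration in Python, not how the RH cluster suite is configured: the function name `fail_over` and the machine labels are hypothetical. The point it shows is that every database has exactly one active machine at any time, and a failure only moves ownership; no data area is ever served by two postmasters at once.

```python
def fail_over(owners, failed, survivor):
    """Return a new ownership map (database id -> machine name) in which
    every database owned by the failed machine moves to the survivor."""
    if failed == survivor:
        raise ValueError("survivor must differ from the failed machine")
    return {db: (survivor if machine == failed else machine)
            for db, machine in owners.items()}


# Machine A is active for databases 1-3, machine B for databases 4-6.
owners = {db: "A" for db in (1, 2, 3)}
owners.update({db: "B" for db in (4, 5, 6)})

# Machine A fails: B starts postmasters for databases 1-3 as well.
after_failure = fail_over(owners, failed="A", survivor="B")
```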
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
Ron Johnson
Date:
On 05/11/07 07:32, Geoffrey wrote: > John Gateley wrote: >> Sorry if this is a FAQ, I did search and couldn't find much. >> >> I need to make my Postgresql installation fault tolerant. >> I was imagining a RAIDed disk array that is accessible from two >> (or multiple) computers, with a postmaster running on each computer. >> (Hardware upgrades could then be done to each computer at different >> times without losing access to the database). > > We are doing this, more or less. We use the RH cluster suite on two > machines that share a common data silo. Basically, if one machine > fails, the other fires up a postmaster and picks up where the other left > off. > > That's real simple description because we actually have an active/active > configuration with multiple postmasters running on each machine. Machine > A is the active machine for databases 1-3 and machine B is the active > machine for databases 4-6. If machine A fails, postmasters are fired > up on machine B to attend to databases 1-3. That's still not a cluster in the traditional sense. On a cluster-aware OS and RDBMS (like Rdb/VMS and Oracle RAC, which imperfectly got its technology from VMS), all the databases would be open on both nodes and they would share locking over a (usually dedicated, and used-to-be-proprietary) network link. -- Ron Johnson, Jr. Jefferson LA USA Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good!
Ron Johnson wrote: > That's still not a cluster in the traditional sense. > > On a cluster-aware OS and RDBMS (like Rdb/VMS and Oracle RAC, which > imperfectly got its technology from VMS), all the databases would > be open on both nodes and they would share locking over a (usually > dedicated, and used-to-be-proprietary) network link. Regardless of what you want to call it, it certainly seems to reflect a solution the user might consider. I don't believe I called it a cluster. I stated we were using software called the 'cluster suite.' -- Until later, Geoffrey Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. - Benjamin Franklin
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
Ron Johnson
Date:
On 05/11/07 08:31, Geoffrey wrote: > Regardless of what you want to call it, it certainly seems to reflect a > solution the user might consider. I don't believe I called it a > cluster. I stated we were using software called the 'cluster suite.' Call me elitist, but I've used OpenVMS for so long that if it's not a VMS-style shared-disk cluster, it's a false usage of the word. Compute-clusters excluded, of course. -- Ron Johnson, Jr. Jefferson LA USA Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good!
Ron Johnson wrote: > Call me elitist, but I've used OpenVMS for so long that if it's not > a VMS-style shared-disk cluster, it's a false usage of the word. Okay, you're an elitist... > Compute-clusters excluded, of course. > > - -- > Ron Johnson, Jr. > Jefferson LA USA -- Until later, Geoffrey Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. - Benjamin Franklin
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
"Joshua D. Drake"
Date:
Geoffrey wrote: > Ron Johnson wrote: > >> Call me elitist, but I've used OpenVMS for so long that if it's not >> a VMS-style shared-disk cluster, it's a false usage of the word. > > Okay, you're an elitist... People still use OpenVMS? ... elitist isn't the word I would choose ;) Sincerely, Joshua D. Drake > >> Compute-clusters excluded, of course. >> >> - -- >> Ron Johnson, Jr. >> Jefferson LA USA > > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Joshua D. Drake wrote: > Geoffrey wrote: >> Ron Johnson wrote: >> >>> Call me elitist, but I've used OpenVMS for so long that if it's not >>> a VMS-style shared-disk cluster, it's a false usage of the word. >> >> Okay, you're an elitist... > > People still use OpenVMS? ... elitist isn't the word I would choose ;) ): -- Until later, Geoffrey Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. - Benjamin Franklin
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
Bruno Wolff III
Date:
On Thu, May 10, 2007 at 20:43:20 -0500, John Gateley <gateley@jriver.com> wrote: > Sorry if this is a FAQ, I did search and couldn't find much. > > I need to make my Postgresql installation fault tolerant. > I was imagining a RAIDed disk array that is accessible from two > (or multiple) computers, with a postmaster running on each computer. > (Hardware upgrades could then be done to each computer at different > times without losing access to the database). > > Is this possible? You can't have two postmasters accessing the same data. Doing so will cause corruption. You can have a failover system where another postmaster starts after the normal one has stopped. But you need to be completely sure the normal postmaster has stopped before starting the backup one. > Is there another way to do this I should be looking at? Depending on your needs replication might be useful.
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
Paul Lambert
Date:
Ron Johnson wrote: > Call me elitist, but I've used OpenVMS for so long that if it's not > a VMS-style shared-disk cluster, it's a false usage of the word. > > Compute-clusters excluded, of course. Hear, hear! (I guess I'm elitist too) :) -- Paul Lambert Database Administrator AutoLedgers
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
Ron Johnson
Date:
On 05/11/07 12:08, Joshua D. Drake wrote: > Geoffrey wrote: >> Ron Johnson wrote: >> >>> Call me elitist, but I've used OpenVMS for so long that if it's not >>> a VMS-style shared-disk cluster, it's a false usage of the word. >> >> Okay, you're an elitist... > > People still use OpenVMS? ... Sure. We pump 6 million INSERT statements per day thru some of our big OLTP systems. > elitist isn't the word I would choose ;) Dinosaurist? The big systems we use were last upgraded 5ish years ago, and are scheduled (eventually) to be replaced with Oracle on Linux. -- Ron Johnson, Jr. Jefferson LA USA Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good!
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
Paul Lambert
Date:
Ron Johnson wrote: > > Dinosaurist? > > The big systems we use were last upgraded 5ish years ago, and are > scheduled (eventually) to be replaced with Oracle on Linux. > We've got some pretty new Alpha servers (around a year old) running VMS 8.3 which was released about the same time we got the servers...or shortly before. Sure it's been around nearly since the dawn of time, but it's still an actively developed operating system. I've finally got my Alpha server at home up and running now too, and I hope to be getting PG running on it as part of my thesis project when I start that in the near future, if my schedule allows. -- Paul Lambert Database Administrator AutoLedgers
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
"Joris Dobbelsteen"
Date:
> -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Bruno Wolff III > Sent: Friday, 11 May 2007 21:18 > To: John Gateley > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] Fault Tolerant Postgresql (two machines, two postmasters, one disk array) > > On Thu, May 10, 2007 at 20:43:20 -0500, John Gateley <gateley@jriver.com> wrote: >> Sorry if this is a FAQ, I did search and couldn't find much. >> >> I need to make my Postgresql installation fault tolerant. >> I was imagining a RAIDed disk array that is accessible from two (or >> multiple) computers, with a postmaster running on each computer. >> (Hardware upgrades could then be done to each computer at different >> times without losing access to the database). >> >> Is this possible? > > You can't have two postmasters accessing the same data. Doing so will cause corruption. You can have a failover system where another postmaster starts after the normal one has stopped. But you need to be completely sure the normal postmaster has stopped before starting the backup one. For this you might use heartbeat. See http://www.linux-ha.org/ They seem to have a good tool to do the job. In general, version 1, though limited to two nodes, has been in use for several years and is well supported by most Linux distributions. A lot of information is also available on how to set it up and get it working as desired. The newer version 2 might provide more features than you actually need, and since it is newer it has seen less use. I believe heartbeat is also one of the elements in Red Hat's cluster suite. - Joris Dobbelsteen
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
Ron Johnson
Date:
On 05/12/07 01:51, Paul Lambert wrote: > Ron Johnson wrote: >> >> Dinosaurist? >> >> The big systems we use were last upgraded 5ish years ago, and are >> scheduled (eventually) to be replaced with Oracle on Linux. >> > > We've got some pretty new Alpha servers (around a year old) running VMS > 8.3 which was released about the same time we got the servers...or > shortly before. We're pushing to get a big GS320 and a 5TB SAN to consolidate a couple of the systems that ran out of capacity a couple of years ago. The h/w is owned by a government agency, though, so we're at their mercy regarding capital expenditures. > Sure it's been around nearly since the dawn of time, but it's still an > actively developed operating system. > > I've finally got my Alpha server at home up and running now too, and I What are you running? > hope to be getting PG running on it as part of my thesis project when I > start that in the near future, if my schedule allows. -- Ron Johnson, Jr. Jefferson LA USA Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good!
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
Paul Lambert
Date:
Ron Johnson wrote: > On 05/12/07 01:51, Paul Lambert wrote: >> Sure it's been around nearly since the dawn of time, but it's still an >> actively developed operating system. >> >> I've finally got my Alpha server at home up and running now too, and I > > What are you running? > Off hand I couldn't tell you - It's a Compaq Alphastation model - so hardware wise my home server is a few years old, it's got 2*18Gb SCSI disks and a 555MHz processor if memory serves me correct with a gig of ram. Currently running OpenVMS 7.3-2 but I'll be upgrading to 8.2 or 8.3 shortly. I can get more accurate specs next time I'm home and can be bothered booting the machine up... it doesn't have much more than what I've already listed though - CPU speed is the only thing I'm not 100% on. -- Paul Lambert Database Administrator AutoLedgers
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
John Gateley
Date:
Thanks very much to all who responded, the replies were very helpful. j On Thu, 10 May 2007 20:43:20 -0500 John Gateley <gateley@jriver.com> wrote: > I need to make my Postgresql installation fault tolerant. > ...
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
Andrew Sullivan
Date:
On Mon, May 14, 2007 at 10:42:13AM -0500, John Gateley wrote: > Thanks very much to all who responded, the replies were very helpful. One thing I will mention, that seems not to have come out in a number of the replies: the details _really really_ count when you set up this sort of multi-machine hot failover arrangement. The general idea is that you have two machines, and the "standby" machine notices when the "hot" machine disappears, and then mounts the disk on the standby and takes over for the (now failed) hot machine. The problems come when you get a false detection of machine failure. Consider a case, for instance, where the machine A gets overloaded, goes into swap madness, or has a billion runaway processes that cause it to stagger. In this case, A might not respond in time on the heartbeat monitor, and then the standby machine B thinks A has failed. But A doesn't know that, of course, because it is working as hard as it can just to stay up. Now, if B mounts the disk and starts the postmaster, but doesn't have a way to make _sure_ that A is completely disconnected from the disk, then it's entirely possible A will flush buffers out to the still-mounted data area. Poof! Instant data corruption. People often dismiss these sorts of scenarios as unlikely, because of the timing issues involved. But you have to remember that, if you're building this kind of high-availability system, you've already built your individual servers to be very fault tolerant anyway. They have loads of extra capacity, ECC memory, multiple redundant data paths, RAID -- all the goodies. So you're talking about an already unlikely failure scenario. If you're going to the effort to get an "extra 9" of availability, then you have to think about not only how to ensure you get that availability, but the consequences of failure. 
In this case, the consequence of having two systems mount the same data area is extremely serious, and you have to be _absolutely sure_ that A is dead and disconnected from the disk when B mounts that disk. Anything else is just asking for your weekend to be ruined by a data recovery. A -- Andrew Sullivan | ajs@crankycanuck.ca "The year's penultimate month" is not in truth a good way of saying November. --H.W. Fowler
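The false-detection scenario above can be sketched in a few lines of Python. This is a toy model, not the logic of any real heartbeat package: the function names, the timeout value and the `fence_primary` callable are all hypothetical. It shows that a missed heartbeat only makes the primary a suspect, and that the standby should refuse to mount the disk unless fencing has been confirmed.

```python
def primary_suspected(last_heartbeat, now, timeout):
    """Standby's view only: True when the last heartbeat is older than timeout."""
    return (now - last_heartbeat) > timeout


def standby_action(suspected, fence_primary):
    """Decide what the standby does. fence_primary is a callable that must
    return True only when the primary is confirmed cut off from the disk."""
    if not suspected:
        return "stay standby"
    if not fence_primary():
        return "refuse takeover"
    return "mount disk and start postmaster"


# Machine A is alive but swapping hard: its last heartbeat is 12 s old.
suspected = primary_suspected(last_heartbeat=100.0, now=112.0, timeout=10.0)

# Without confirmed fencing the standby must not touch the disk.
unfenced = standby_action(suspected, fence_primary=lambda: False)
fenced = standby_action(suspected, fence_primary=lambda: True)
```

Note that `suspected` is true here even though A never actually failed, which is exactly why the takeover decision cannot rest on the heartbeat alone.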
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
"John D. Burger"
Date:
Andrew Sullivan wrote: > Now, if B mounts the disk and starts > the postmaster, but doesn't have a way to make _sure_ that A is > completely disconnected from the disk, then it's entirely possible A > will flush buffers out to the still-mounted data area. Poof! > Instant data corruption. Shoot The Other Node In The Head: http://www.linux-ha.org/STONITH - John D. Burger MITRE
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
Andrew Sullivan
Date:
On Thu, May 17, 2007 at 12:50:56PM -0400, John D. Burger wrote: > > Shoot The Other Node In The Head: > > http://www.linux-ha.org/STONITH Right. I have heard people tell me this works as advertised, but I've never used it. I can tell you that a certain large, well-known blue company has a similar product, that in my experience does _not_ always work as advertised. A -- Andrew Sullivan | ajs@crankycanuck.ca If they don't do anything, we don't need their acronym. --Josh Hamilton, on the US FEMA
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
Hannes Dorbath
Date:
Andrew Sullivan wrote: > On Thu, May 17, 2007 at 12:50:56PM -0400, John D. Burger wrote: >> Shoot The Other Node In The Head: >> >> http://www.linux-ha.org/STONITH > > Right. I have heard people tell me this works as advertised, but > I've never used it. I can tell you that a certain large, well-known > blue company has a similar product, that in my experience does _not_ > always work as advertised. Sure, an HA setup is not something you set up over a weekend. I've spent about 3-4 months on testing. Correct node and resource fencing is essential, and any mistake will cause your boxes to go split brain. But it's not that complex either, given a simple 2 node cluster. Just make sure there are enough completely redundant communication paths (we have 4) for heartbeat and never run without a correctly configured and tested STONITH device. In case the STONITH device is a power switch connected directly to the standby node, there is not so much that can fail. My experiences with linux-ha are good. -- Best regards, Hannes Dorbath
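The two rules in this message (redundant heartbeat paths, and never running without a tested STONITH device) can be expressed as a small Python sketch. It is an illustration only and does not reflect linux-ha's actual configuration or API: the function names, the path names and the minimum of two paths are assumptions made for the example.

```python
def peer_suspected_dead(path_alive):
    """path_alive maps a heartbeat path name to whether a beat was seen
    recently. The peer is suspected only when every path is silent."""
    if not path_alive:
        raise ValueError("at least one heartbeat path is required")
    return not any(path_alive.values())


def ready_for_production(path_count, stonith_tested):
    """Refuse to run without redundant paths and a tested STONITH device.
    The minimum of two paths is an assumption for this sketch."""
    return path_count >= 2 and stonith_tested


# Four paths, as in the setup described above; the path names are invented.
paths = {"eth1": False, "eth2": False, "serial0": True, "serial1": False}
one_path_alive = peer_suspected_dead(paths)
all_silent = peer_suspected_dead({name: False for name in paths})
```

With four independent paths, losing three of them still does not trigger a takeover, which reduces the chance of a split brain caused by a single failed cable or switch.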
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
Ron Johnson
Date:
On 05/17/07 09:35, Andrew Sullivan wrote: [snip] > > The problems come when you get a false detection of machine failure. > Consider a case, for instance, where the machine A gets overloaded, > goes into swap madness, or has a billion runaway processes that cause > it to stagger. In this case, A might not respond in time on the > heartbeat monitor, and then the standby machine B thinks A has > failed. But A doesn't know that, of course, because it is working as > hard as it can just to stay up. Now, if B mounts the disk and starts > the postmaster, but doesn't have a way to make _sure_ that A is > completely disconnected from the disk, then it's entirely possible A > will flush buffers out to the still-mounted data area. Poof! > Instant data corruption. Aren't there PCI heartbeat cards that are independent of the load on the host machine? -- Ron Johnson, Jr. Jefferson LA USA Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good!
> Aren't there PCI heartbeat cards that are independent of the load on > the host machine? But, if the machine is fork-bombed, or drowning in swap, or generally slowly committing suicide, it's not shall we say "available" anymore, so you might want to finish it off...
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
Andrew Sullivan
Date:
On Thu, May 17, 2007 at 03:55:43PM -0500, Ron Johnson wrote: > Aren't there PCI heartbeat cards that are independent of the load on > the host machine? Yes, there is more than one way to do this. My main point is to emphasise that you have to pay attention to the details -- all of them. It's especially important not to trust the vendor to get it right, because even if they sell a database product themselves, they may get it wrong. Some failure modes are nearly impossible to emulate in the lab (how do you cause a brand new working board to start flaking out as though it has some intermittent problem?). So you have to make sure that the thing can't wreck your data _by design_, and not just empirically. This means you have to understand all the technical details of how the thing works in order to know whether it is safe. I'm sure we've all seen, more than once, things happen that the vendor assures cannot. What this really comes down to is risk analysis. If you add a complicated failover system to get to five nines, and it breaks, it might actually make your uptime numbers worse, because it takes so long to recover from breakage. (If failover doesn't work, do you have to restore from dumps? How big is your data? Did this outage just go from five minutes to four hours?) Also, if it is complicated enough, your sysadmins have a whole new class of loaded foot-gun to fire at 03:00. So whatever you do, don't let your management talk themselves into specifying this on Thursday and deploying on Monday. A -- Andrew Sullivan | ajs@crankycanuck.ca Information security isn't a technological problem. It's an economics problem. --Bruce Schneier
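To put numbers on the risk analysis above, here is a short worked calculation in Python. It assumes a 365-day year and reads "five nines" as 99.999% availability; the function names are just for this example. The five-nines budget comes to roughly 5.3 minutes of downtime per year, while a single four-hour outage in a year leaves you at about 99.95%.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600, assuming a 365-day year


def downtime_budget_minutes(availability):
    """Minutes of downtime per year allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability)


def availability_from_downtime(minutes_down):
    """Availability achieved over one year with the given total downtime."""
    return 1 - minutes_down / MINUTES_PER_YEAR


five_nines_budget = downtime_budget_minutes(0.99999)
after_four_hour_outage = availability_from_downtime(4 * 60)
```

In other words, one failed failover that turns a five-minute outage into a four-hour recovery uses up about 45 years' worth of the five-nines budget.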
Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
From
"Joris Dobbelsteen"
Date:
> -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Ron Johnson > Sent: Thursday, 17 May 2007 22:56 > To: pgsql-general@postgresql.org > Subject: Re: [GENERAL] Fault Tolerant Postgresql (two machines, two postmasters, one disk array) > > On 05/17/07 09:35, Andrew Sullivan wrote: > [snip] >> >> The problems come when you get a false detection of machine failure. >> Consider a case, for instance, where the machine A gets overloaded, >> goes into swap madness, or has a billion runaway processes that cause >> it to stagger. In this case, A might not respond in time on the >> heartbeat monitor, and then the standby machine B thinks A has failed. >> But A doesn't know that, of course, because it is working as hard as >> it can just to stay up. Now, if B mounts the disk and starts the >> postmaster, but doesn't have a way to make _sure_ that A is completely >> disconnected from the disk, then it's entirely possible A will flush >> buffers out to the still-mounted data area. Poof! >> Instant data corruption. > > Aren't there PCI heartbeat cards that are independent of the load on the host machine? A solution commonly seen is to cut the power on the 'failed' machine just before a take-over is done. Solutions for that are available... Besides this, you don't want a separate PCI heartbeat card to see if your software happens to work. It is the same situation as with a watchdog: you don't want the watchdog to 'reset' itself continuously, as you lose the benefit of the watchdog. Generally your software should also check whether PostgreSQL is operating as expected, i.e. that it is not stopped or non-responsive. In those cases the system should fail over. The 'cut power' solution works. If you are looking for details on how to set it up, see heartbeat (www.linux-ha.org) and search for STONITH. They have lots and lots of very useful information about high availability solutions. Furthermore, the package is used around the world for these solutions by large companies and is part of several other software packages. It supports Linux and BSD... - Joris
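The suggestion that the software itself should check whether PostgreSQL is responding can be sketched as follows. This is a hypothetical Python outline, not part of heartbeat or PostgreSQL: the function name and the threshold of three consecutive failed probes are assumptions. Each probe result would come from something like a trivial query against the server; here the results are passed in as plain booleans so the decision rule stays visible.

```python
def should_fail_over(probe_results, threshold=3):
    """probe_results holds the outcome of periodic health probes, oldest
    first (True = the server answered). Fail over only when the most recent
    `threshold` probes all failed, so one slow answer is not enough."""
    recent = list(probe_results)[-threshold:]
    return len(recent) == threshold and not any(recent)


# One missed probe in the middle of healthy answers: keep running.
transient_glitch = should_fail_over([True, False, True, True])

# The server stopped answering three probes in a row: fail over.
sustained_outage = should_fail_over([True, True, False, False, False])
```

Requiring several consecutive failures trades a slightly slower reaction for fewer false alarms; the takeover itself would still need fencing first, as discussed earlier in the thread.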
On May 11, 12:08 pm, j...@commandprompt.com ("Joshua D. Drake") wrote: > Geoffrey wrote: > > People still use OpenVMS? ... elitist isn't the word I would choose ;) > Not only do they use it, new books get written about doing application development with it. It is still the only OS able to create a fault tolerant world-wide cluster with complete transaction management across all nodes. Not just database transactions, but file level and message queue all integrated with one transaction manager. Take a look at http://www.theminimumyouneedtoknow.com for information about "The Minimum You Need to Know to Be an OpenVMS Application Developer"