Thread: High Availability with Postgres
Hello,
My company is looking for a high-availability solution for Postgres.

The solution we are considering is as follows:
Two DB servers would share a common external storage array (with RAID). Both
servers would use the same DB files on that storage (as active/passive).

Now I'm trying to understand how Postgres can work with this configuration,
i.e.:
DB_server1 crashes, so we want to start DB_server2 using the same files.
Is this possible?
Regards,
Elior
On Sun, Jun 20, 2010 at 07:34:10PM +0300, Elior Soliman wrote:
> Hello,
>
> My company is looking for a high-availability solution for Postgres.
>
> The solution we are considering is as follows:
> Two DB servers would share a common external storage array (with RAID).

Stop right there. This is the Oracle way of doing things, and it
doesn't work for PostgreSQL.

> Both servers would use the same DB files on that storage (as
> active/passive).
>
> Now I'm trying to understand how Postgres can work with this
> configuration, i.e.:

It does not.

There are plenty of ways to get that broad spectrum of sometimes
contradictory things people mean when they use the phrase "HA" with
PostgreSQL, but you must first define your requirements. Once you have
done so, it will be possible to create strategies for meeting them.

What are the actual requirements? What would merely be nice to have?
What are your priorities for both?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 21/06/10 00:34, Elior Soliman wrote:
> Hello,
>
> My company is looking for a high-availability solution for Postgres.
>
> The solution we are considering is as follows:
> Two DB servers would share a common external storage array (with RAID).
> Both servers would use the same DB files on that storage (as
> active/passive).

Why do you want that configuration? For PostgreSQL, a warm standby setup
using WAL shipping (the same mechanism used for PITR, "point-in-time
recovery") is usually the preferred approach. There are also add-on
replication solutions like Bucardo and Slony-I.

> Now I'm trying to understand how Postgres can work with this
> configuration, i.e.:
>
> DB_server1 crashes, so we want to start DB_server2 using the same files.
> Is this possible?

Depending on why and how db_server1 crashed, it may be.

If the DB server crashed in such a way that it didn't stomp all over the
shared storage, then you can unplug db_server1 from the shared storage
(or otherwise render that storage completely inaccessible to db_server1)
and then start the database on db_server2.

If you fail to render the storage inaccessible to db_server1, or to make
sure db_server1 is stone dead (powered off) first, you may end up with
two postmasters trying to work with the same data. This *will* cause
corruption if you override/bypass Pg's attempts to detect and prevent
this fault condition.

The only time you can really use shared storage is if you have a
heartbeat setup with STONITH. Even then, I'd personally prefer to use
network replication if at all possible, since it removes the shared
storage as a central point of failure. It also lets you use much, much
faster locally attached storage where the disks are dedicated to
PostgreSQL.

--
Craig Ringer
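For reference, a minimal sketch of the 8.x-era warm standby Craig describes,
using WAL shipping plus the contrib pg_standby program. The archive path,
standby hostname and rsync transport are made-up examples, not
recommendations; check the exact settings against the documentation for your
PostgreSQL version.

    # primary: postgresql.conf -- ship each completed WAL segment to the standby
    archive_mode = on
    archive_command = 'rsync -a %p standby:/var/lib/pgsql/wal_archive/%f'  # host/path are examples

    # standby: recovery.conf -- keep replaying segments as they arrive, so the
    # server stays in recovery until pg_standby is told to stop (warm standby)
    restore_command = 'pg_standby /var/lib/pgsql/wal_archive %f %p %r'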
On 06/20/10 10:36 AM, David Fetter wrote:
> On Sun, Jun 20, 2010 at 07:34:10PM +0300, Elior Soliman wrote:
>
>> My company is looking for a high-availability solution for Postgres.
>>
>> The solution we are considering is as follows:
>> Two DB servers would share a common external storage array (with RAID).
>>
> Stop right there. This is the Oracle way of doing things, and it
> doesn't work for PostgreSQL.
>

Sure it does, as long as the database filesystem is only mounted on the
currently active server, and only that instance of postgres is running.
This is the traditional HA 'active/standby' server configuration. Note,
I'm *not* talking about Oracle RAC active-active clustering.

This is also one of the only postgres HA configurations that won't lose
/any/ committed transactions on a failure. In most PITR/WAL
replication/Slony/etc. configurations, the standby storage runs several
seconds behind realtime.

Use a cluster manager, like Linux Heartbeat, Veritas Cluster, Sun
Cluster, etc., to manage the state transitions. On a manual failover,
the active server stops postgres, frees the shared IP, and umounts the
shared storage; then the standby server fences the formerly active
server so it can't access the storage if it accidentally tried, adopts
the shared IP, mounts the shared storage, starts postgres, and is
online. On a failed-server failover, the standby server does the same
thing.

The commercial cluster software vendors insist on using dedicated
connections for the heartbeat messages between the cluster members, and
insist on having fencing capabilities (for instance, disabling the fiber
switch port of the formerly active server and enabling the port for the
to-be-activated server). With linux-ha and heartbeat, you're on your
own.

Of course, in a system like this, your external shared RAID should
itself be redundant and have controller failover abilities, and each
cluster server should have redundant connections to the two storage
controllers. With FibreChannel you use two switches and two HBAs on each
node; with iSCSI, you'd use two ethernet switches and two NICs on each
host.
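To make the state transitions John describes concrete, here is a rough shell
sketch of what the standby node effectively does during a takeover. All
device names, paths and addresses are placeholders, and the fencing step is
entirely site specific; in a real deployment the cluster manager's resource
agents run the equivalent of this, never an administrator by hand.

    #!/bin/sh
    # Sketch of an active/standby takeover, as run on the standby node.
    # Device names, mount point and the service IP are placeholders.

    fence_old_active_node() {
        # Site specific: disable the old node's FC switch port, cut its power
        # via a managed PDU, etc.  Must only report success once fencing worked.
        return 1
    }

    # 1. Fence the formerly active node so it can no longer touch the storage.
    fence_old_active_node || exit 1

    # 2. Take over the shared resources in order: storage, service IP, postgres.
    mount /dev/mapper/san-pgdata /var/lib/pgsql/data
    ip addr add 192.168.0.50/24 dev eth0
    pg_ctl -D /var/lib/pgsql/data start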
On 20/06/2010 17:34, Elior Soliman wrote:
> Hello,
>
> My company is looking for a high-availability solution for Postgres.

There's quite a bit of information in the documentation here:

http://www.postgresql.org/docs/8.4/static/high-availability.html

HTH,

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie
Hi,

On 21/06/2010, at 3:37 AM, Raymond O'Donnell wrote:
> On 20/06/2010 17:34, Elior Soliman wrote:
>> Hello,
>>
>> My company is looking for a high-availability solution for Postgres.
>
> There's quite a bit of information in the documentation here:
>
> http://www.postgresql.org/docs/8.4/static/high-availability.html

And to keep oneself up to speed:

http://momjian.us/main/blogs/pgblog/2010.html#June_16_2010

Yar
John R Pierce <pierce@hogranch.com> writes:
>>> Two DB servers would share a common external storage array (with RAID).
>
> This is also one of the only postgres HA configurations that won't lose
> /any/ committed transactions on a failure. In most PITR/WAL
> replication/Slony/etc. configurations, the standby storage runs several
> seconds behind realtime.

I'm not clear on what error case it protects against, though. Either the
data is ok and a single PostgreSQL system will restart fine, or the data
isn't and you're hosed the same with or without the second system.

What's left is hardware failure that didn't compromise the data. I
haven't seen much hardware failure yet, granted, but I have yet to see a
motherboard, some RAM or a RAID controller fail in a way that leaves
behind data you can trust.

So my question would be: what case do you handle better with shared
external storage, compared to shared-nothing servers with some sort of
replication (including WAL shipping)?

Regards,
--
dim
On 06/21/10 12:23 PM, Dimitri Fontaine wrote:
> John R Pierce <pierce@hogranch.com> writes:
>
>>>> Two DB servers would share a common external storage array (with RAID).
>>>>
>> This is also one of the only postgres HA configurations that won't lose
>> /any/ committed transactions on a failure. In most PITR/WAL
>> replication/Slony/etc. configurations, the standby storage runs several
>> seconds behind realtime.
>>
> I'm not clear on what error case it protects against, though. Either the
> data is ok and a single PostgreSQL system will restart fine, or the data
> isn't and you're hosed the same with or without the second system.
>
> What's left is hardware failure that didn't compromise the data. I
> haven't seen much hardware failure yet, granted, but I have yet to see a
> motherboard, some RAM or a RAID controller fail in a way that leaves
> behind data you can trust.
>

In most of the HA clusters I've seen, the RAID controllers are in the
SAN, not in the hosts, and they have their own failover, with shared
write cache and extensive use of ECC, so things like double-bit memory
errors are detected and treated as a failure. The sorts of high-end SANs
used in these kinds of systems have five-nines reliability, through
extensive use of redundancy, dual-ported disks, fully redundant
everything, mirrored caches, etc.

Likewise, the servers used in these sorts of clusters have ECC memory,
so memory failure should be detected rather than passed on blindly in
the form of corrupted data. Server-grade CPUs, especially the RISC ones,
have extensive ECC internally on their caches, data buses, etc., so any
failure there is detected rather than allowed to corrupt data. Failure
modes can include things like failing fans (which will be detected,
resulting in a server shutdown if too many fail) or power supply failure
(redundant PSUs, but I've seen the power-combining circuitry fail). Any
of these sorts of failures will result in a failover without corrupting
the data.

And of course, intentional planned failovers to do OS maintenance: you
patch the standby system, fail over to it and verify it's good, then
patch the other system.

We had a large HA system at an overseas site fail over once due to
flooding in the primary computer room caused by a sprinkler system
failure upstairs. The SAN was mirrored to a SAN in the second DC
(fiber-interconnected) and the backup server was also in the second DC
across campus, so it all failed over gracefully. That particular system
was large Sun hardware and big EMC storage, and it was running Oracle
rather than Postgres. We've had several big UPS failures at various
sites, too, ditto HVAC, over a 15-year period.
John R Pierce wrote:
> The commercial cluster software vendors insist on using dedicated
> connections for the heartbeat messages between the cluster members, and
> insist on having fencing capabilities (for instance, disabling the
> fiber switch port of the formerly active server and enabling the port
> for the to-be-activated server). With linux-ha and heartbeat, you're
> on your own.

This is worth highlighting. As John points out, it's straightforward to
build a shared storage implementation using PostgreSQL and either one of
the commercial clustering systems or Linux-HA. And until PostgreSQL gets
fully synchronous replication, it's a viable alternative for "must not
lose a transaction" deployments when the storage used is much more
reliable than the nodes.

The hard part of shared storage failover is always solving the "shoot
the other node in the head" problem: keeping a down node from coming
back once it's no longer the active one. To do that well, you really
need to lock the now-unavailable node out of the storage at the hardware
level--"fencing"--with disabling its storage port being one way to
handle that. Figure out how you're going to do that reliably, in a way
that's integrated into a proper cluster manager, and there's no reason
you can't do this with PostgreSQL.

There's a description of the fencing options for Linux-HA at
http://www.clusterlabs.org/doc/crm_fencing.html ; the cheap way to solve
this problem is to have a UPS that disables the power going to the shot
node. Once that's done, you can safely fail the shared storage over to
another system. At that point, you can probably even turn the power back
on, presuming that the now-rebooted system will be able to regain access
to the storage during a fresh system start.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
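For anyone attempting this with Linux-HA/Pacemaker, the configuration ends up
looking roughly like the sketch below (crm shell syntax). The agent names,
parameters, addresses and credentials are illustrative assumptions written
from memory; verify everything against the metadata of the resource and
STONITH agents you actually have installed (crm ra info ...).

    # enable fencing cluster-wide
    property stonith-enabled="true"

    # one fencing device per node (external/ipmi is only one possibility)
    primitive fence_node1 stonith:external/ipmi \
        params hostname="node1" ipaddr="10.0.0.101" userid="admin" passwd="secret"
    primitive fence_node2 stonith:external/ipmi \
        params hostname="node2" ipaddr="10.0.0.102" userid="admin" passwd="secret"
    location l_fence1 fence_node1 -inf: node1   # never run a node's own fence device on it
    location l_fence2 fence_node2 -inf: node2

    # the failover group: shared filesystem, then the service IP, then postgres
    primitive pg_fs ocf:heartbeat:Filesystem \
        params device="/dev/mapper/san-pgdata" directory="/var/lib/pgsql/data" fstype="ext3"
    primitive pg_ip ocf:heartbeat:IPaddr2 \
        params ip="192.168.0.50" cidr_netmask="24"
    primitive pg_db ocf:heartbeat:pgsql
    group postgres_ha pg_fs pg_ip pg_db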
On 06/21/10 8:08 PM, Greg Smith wrote:
> The hard part of shared storage failover is always solving the "shoot
> the other node in the head" problem: keeping a down node from coming
> back once it's no longer the active one. To do that well, you really
> need to lock the now-unavailable node out of the storage at the
> hardware level--"fencing"--with disabling its storage port being one
> way to handle that. Figure out how you're going to do that reliably,
> in a way that's integrated into a proper cluster manager, and there's
> no reason you can't do this with PostgreSQL.

In my dev-lab tests of some clusters, I used the QLogic 5600 FC switch
that connects my motley collection of servers. I used RHCS for one test,
and it supported the QLogic via telnet: I created two zone sets in the
QLogic, one for each state, with the standby host blocked from accessing
the LUN, and the cluster manager used telnet to talk to the switch. I
ran heartbeats over two separate ethernets (one was the lab LAN segment,
the other a private switch I have all the servers connected to for
various tests and such). The QLogic switch also had another zoneset for
all sorts of other servers and storage that wasn't affected by these
clustering tests.

I don't like power-cycling servers, so I'd prefer not to use power
switch based fencing, although I believe my blade box's management unit
is supported as a power fencing device.
On Mon, 2010-06-21 at 23:08 -0400, Greg Smith wrote:
>
> The hard part of shared storage failover is always solving the "shoot
> the other node in the head" problem: keeping a down node from coming
> back once it's no longer the active one. To do that well, you really
> need to lock the now-unavailable node out of the storage at the
> hardware level--"fencing"--with disabling its storage port being one
> way to handle that. Figure out how you're going to do that reliably,
> in a way that's integrated into a proper cluster manager, and there's
> no reason you can't do this with PostgreSQL.

FWIW, I know of a production installation with 4 PostgreSQL servers (on
4 separate pieces of hardware, I mean) running on Red Hat Cluster Suite,
and it has been running for more than 2 years without any issues. The
only issues were related to RHCS+HP hardware, but as of RHEL 5.5, all of
those are resolved.

--
Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
PostgreSQL RPM Repository: http://yum.pgrpms.org
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz
John R Pierce <pierce@hogranch.com> writes:
> Failure modes can include things like failing fans (which will be
> detected, resulting in a server shutdown if too many fail) or power
> supply failure (redundant PSUs, but I've seen the power-combining
> circuitry fail). Any of these sorts of failures will result in a
> failover without corrupting the data.
>
> And of course, intentional planned failovers to do OS maintenance: you
> patch the standby system, fail over to it and verify it's good, then
> patch the other system.

Ah, I see the use case much better now, thank you. And I begin to see
how expensive reaching such a goal is, too. Going from "I can lose this
many transactions" to "no data lost, ever" comes at that price, though.

Regards,
--
dim
John R Pierce wrote:
> I don't like power-cycling servers, so I'd prefer not to use power
> switch based fencing, although I believe my blade box's management
> unit is supported as a power fencing device.

I consider power control fencing to be a secondary resort if you don't
have hardware where a storage switch fence can be used. It can be a
useful implementation for those situations, and not all shared storage
is attached with a FC switch.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
On 06/22/10 1:58 AM, Dimitri Fontaine wrote:
> John R Pierce <pierce@hogranch.com> writes:
>
>> Failure modes can include things like failing fans (which will be
>> detected, resulting in a server shutdown if too many fail) or power
>> supply failure (redundant PSUs, but I've seen the power-combining
>> circuitry fail). Any of these sorts of failures will result in a
>> failover without corrupting the data.
>>
>> And of course, intentional planned failovers to do OS maintenance:
>> you patch the standby system, fail over to it and verify it's good,
>> then patch the other system.
>>
> Ah, I see the use case much better now, thank you. And I begin to see
> how expensive reaching such a goal is, too. Going from "I can lose
> this many transactions" to "no data lost, ever" comes at that price,
> though.
>

Yeah. Generally, when money is involved in the transactions, you gotta
stick to 'no committed data lost, ever'. There are plenty of other use
cases for that too.
John R Pierce <pierce@hogranch.com> writes:
> Yeah. Generally, when money is involved in the transactions, you gotta
> stick to 'no committed data lost, ever'. There are plenty of other use
> cases for that too.

Well, it's a cost/benefit/risk evaluation you have to make. It would be
bad news if covering the risk cost more than the risk itself, meaning
there's no benefit in walking the extra mile.

Regards,
--
dim
On 23/06/10 03:05, John R Pierce wrote:
> Yeah. Generally, when money is involved in the transactions, you gotta
> stick to 'no committed data lost, ever'. There are plenty of other use
> cases for that too.

2PC is sometimes a reasonable alternative to shared-storage failover,
though. It can be a lot slower, but it lets you maintain the machines as
completely separate systems with no shared failure points.

--
Craig Ringer
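For completeness, the two-phase commit Craig mentions is exposed directly at
the SQL level (it requires max_prepared_transactions > 0, and an external
transaction manager normally drives it). The table and transaction identifier
below are invented purely for the example:

    -- on each participating server, inside the application's transaction:
    BEGIN;
    INSERT INTO payments (account, amount) VALUES (42, 100.00);  -- example table
    PREPARE TRANSACTION 'txn_20100623_0001';  -- phase one: durably recorded, not yet committed

    -- once every server has prepared successfully, phase two on each of them:
    COMMIT PREPARED 'txn_20100623_0001';

    -- if any server failed to prepare, abort everywhere instead:
    -- ROLLBACK PREPARED 'txn_20100623_0001';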