Thread: High Availability with Postgres

High Availability with Postgres

From: Elior Soliman

Hello,

My company is looking for a high-availability solution for Postgres.

The solution we are considering is as follows:
Two DB servers will use a common external storage array (with RAID). Both
servers will use the same DB files on the storage (active/passive).

Now I'm trying to understand how Postgres can work in this configuration, i.e.:

DB_server1 crashes, so we want to start DB_server2 using the same files.
Is that possible?

Regards,
Elior

Re: High Availability with Postgres

From: David Fetter

On Sun, Jun 20, 2010 at 07:34:10PM +0300, Elior Soliman wrote:
> Hello,
>
> My company is looking for a high-availability solution for Postgres.
>
> The solution we are considering is as follows:
> Two DB servers will use a common external storage array (with RAID).

Stop right there.  This is the Oracle way of doing things, and it
doesn't work for PostgreSQL.

> Both servers will use the same DB files on the storage
> (active/passive).
>
> Now I'm trying to understand how Postgres can work in this
> configuration, i.e.:

It does not.

There are plenty of ways to get that broad spectrum of sometimes
contradictory things people mean when they use the phrase "HA" with
PostgreSQL, but you must first define your requirements.  Once you
have done so, it will be possible to create strategies for achieving
them.

What are the actual requirements?  Things that would be nice to have?
What are your priorities for both?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: High Availability with Postgres

From: Craig Ringer

On 21/06/10 00:34, Elior Soliman wrote:
> Hello,
>
> My company is looking for a high-availability solution for Postgres.
>
> The solution we are considering is as follows:
> Two DB servers will use a common external storage array (with RAID). Both
> servers will use the same DB files on the storage (active/passive).

Why do you want that configuration?

For PostgreSQL, a warm standby setup using WAL-based log shipping (the
same mechanism behind PITR, "point-in-time recovery") is usually the
preferred approach.

There are also add-on replication solutions like Bucardo and Slony-I.
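As a rough sketch (all paths here are just examples), an 8.4-style warm
standby needs the primary to archive its WAL somewhere the standby can
read it:

   # postgresql.conf on the primary
   archive_mode = on
   archive_command = 'cp %p /mnt/archive/%f'   # /mnt/archive stands in for
                                               # wherever you ship WAL to

and the standby to replay segments continuously, e.g. with contrib's
pg_standby:

   # recovery.conf on the standby
   restore_command = 'pg_standby /mnt/archive %f %p %r'

The standby then sits in recovery, replaying each WAL segment as it
arrives, until you trigger promotion (pg_standby's -t option takes a
trigger file for that).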

> Now I'm trying to understand how Postgres can work in this
> configuration, i.e.:
>
> DB_server1 crashes, so we want to start DB_server2 using the same files.
> Is that possible?

Depending on why and how db_server1 crashed, it may be. If the DB server
crashed in such a way that it didn't stomp all over the shared storage,
then you can unplug db_server1 from the shared storage, or otherwise
render that shared storage completely inaccessible to db_server1, then
start the database on db_server2.

If you fail to render it inaccessible to db_server1, or to make sure
db_server1 is stone dead (powered off) first, you may end up with two
postmasters trying to work with the same data. This *will* cause
corruption if you override/bypass Pg's attempts to detect and prevent
this fault condition.

The only time you can really use shared storage is if you have a
heartbeat setup with STONITH. Even then, I'd personally prefer to use
network replication if at all possible, since it removes the shared
storage as a central point of failure. It also lets you use much, much
faster locally attached storage where disks are dedicated to PostgreSQL.

--
Craig Ringer

Re: High Availability with Postgres

From: John R Pierce

On 06/20/10 10:36 AM, David Fetter wrote:
> On Sun, Jun 20, 2010 at 07:34:10PM +0300, Elior Soliman wrote:
>
>> My company is looking for a high-availability solution for Postgres.
>>
>> The solution we are considering is as follows:
>> Two DB servers will use a common external storage array (with RAID).
>>
> Stop right there.  This is the Oracle way of doing things, and it
> doesn't work for PostgreSQL.
>

Sure it does, as long as the database filesystem is only mounted on the
currently active server, and only that instance of postgres is running.
This is the traditional HA 'active/standby' server configuration.  Note,
I'm *not* talking about Oracle RAC active-active clustering.

This is also one of the few Postgres HA configurations that won't lose
/any/ committed transactions on a failure.  In most
PITR/WAL-replication/Slony/etc. configurations, the standby storage runs
several seconds behind real time.

Use a cluster manager, like Linux Heartbeat, Veritas Cluster, Sun
Cluster, etc., to manage the state transitions.


On a manual failover, the active server stops Postgres, frees the shared
IP, and unmounts the shared storage; then the standby server fences the
formerly active server (so it can't access the storage if it accidentally
tried), adopts the shared IP, mounts the shared storage, starts Postgres,
and is online.

On a failed-server failover, the standby server does the same thing.
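As a rough sketch, the standby's takeover amounts to something like the
following (the device name, IP, and fencing command are placeholders for
whatever your cluster manager actually drives):

   # 1. fence the old active node first, e.g. disable its FC switch port
   /usr/local/bin/fence-old-active            # placeholder fencing hook
   # 2. mount the shared volume
   mount /dev/mapper/pgdata /var/lib/pgsql/data
   # 3. adopt the shared service IP
   ip addr add 192.168.1.100/24 dev eth0
   # 4. start postgres
   pg_ctl -D /var/lib/pgsql/data start

The cluster manager runs these steps for you; the important detail is
that fencing happens *before* the storage is touched.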

The commercial cluster software vendors insist on using dedicated
connections for the heartbeat messages between the cluster members, and
insist on having fencing capabilities (for instance, disabling the fiber
switch port of the formerly active server and enabling the port for the
to-be-activated server).  With Linux-HA and Heartbeat, you're on your own.

Of course, in a system like this, your external shared RAID should itself
be redundant and have controller failover abilities, and each cluster
server should have redundant connections to the two storage controllers.
With Fibre Channel, you use two switches and two HBAs on each node; with
iSCSI, you'd use two Ethernet switches and two NICs on each host.
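On Linux, those dual paths are typically merged with dm-multipath so the
OS sees a single device; just as an illustration, a minimal
/etc/multipath.conf might hold nothing but:

   defaults {
       user_friendly_names yes    # name devices mpathN instead of by WWID
   }

and 'multipath -ll' then shows each LUN with both of its paths.  The
settings that actually matter (path_grouping_policy, failback, etc.)
depend on your particular storage array.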



Re: High Availability with Postgres

From: Raymond O'Donnell

On 20/06/2010 17:34, Elior Soliman wrote:
> Hello,
>
> My company is looking for a high-availability solution for Postgres.

There's quite a bit of information in the documentation here:

  http://www.postgresql.org/docs/8.4/static/high-availability.html

HTH,

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie

Re: High Availability with Postgres

From: Yaroslav Tykhiy

Hi,

On 21/06/2010, at 3:37 AM, Raymond O'Donnell wrote:

> On 20/06/2010 17:34, Elior Soliman wrote:
>> Hello,
>>
>> My company is looking for a high-availability solution for
>> Postgres.
>
> There's quite a bit of information in the documentation here:
>
>  http://www.postgresql.org/docs/8.4/static/high-availability.html

And to keep oneself up to speed:

http://momjian.us/main/blogs/pgblog/2010.html#June_16_2010

Yar

Re: High Availability with Postgres

From: Dimitri Fontaine

John R Pierce <pierce@hogranch.com> writes:
>>> Two DB servers will use a common external storage array (with RAID).
>
> This is also one of the few Postgres HA configurations that won't lose
> /any/ committed transactions on a failure.  In most
> PITR/WAL-replication/Slony/etc. configurations, the standby storage
> runs several seconds behind real time.

I'm not clear on what error case it protects against, though. Either the
data is OK and a single PostgreSQL system will restart fine, or the data
isn't and you're hosed just the same, with or without the second system.

What's left is hardware failure that didn't compromise the data. I
haven't seen much hardware failure yet, granted, but I have yet to see a
motherboard, some RAM or a RAID controller fail in a way that leaves
behind data you can trust.

So my question would be: what case do you handle better with shared
external storage compared to shared-nothing servers with some sort of
replication (including WAL shipping)?

Regards,
--
dim

Re: High Availability with Postgres

From: John R Pierce

On 06/21/10 12:23 PM, Dimitri Fontaine wrote:
> John R Pierce <pierce@hogranch.com> writes:
>
>>>> Two DB servers will use a common external storage array (with RAID).
>>>>
>> This is also one of the few Postgres HA configurations that won't lose
>> /any/ committed transactions on a failure.  In most
>> PITR/WAL-replication/Slony/etc. configurations, the standby storage
>> runs several seconds behind real time.
>>
> I'm not clear on what error case it protects against, though. Either the
> data is OK and a single PostgreSQL system will restart fine, or the data
> isn't and you're hosed just the same, with or without the second system.
>
> What's left is hardware failure that didn't compromise the data. I
> haven't seen much hardware failure yet, granted, but I have yet to see a
> motherboard, some RAM or a RAID controller fail in a way that leaves
> behind data you can trust.
>

In most of the HA clusters I've seen, the RAID controllers are in the
SAN, not in the hosts, and they have their own failover, with shared
write cache, plus extensive use of ECC so things like double-bit memory
errors are detected and treated as a failure.  The sorts of high-end
SANs used in these kinds of systems have five-nines (99.999%)
reliability, through extensive use of redundancy: dual-ported disks,
fully redundant everything, mirrored caches, etc.

Ditto, the servers used in these sorts of clusters have ECC memory, so
memory failure should be detected rather than passed on blindly in the
form of corrupted data.  Server-grade CPUs, especially the RISC ones,
have extensive ECC internally on their caches, data buses, etc., so any
failure there is detected rather than allowed to corrupt data.  Failure
modes can include things like failing fans (which will be detected,
resulting in a server shutdown if too many fail) and power supply
failure (redundant PSUs, but I've seen the power-combining circuitry
fail).  Any of these sorts of failures will result in a failover
without corrupting the data.

And of course, intentional planned failovers to do OS maintenance:
you patch the standby system, fail over to it and verify it's good,
then patch the other system.

We had a large HA system at an overseas site fail over once due to
flooding in the primary computer room, caused by a sprinkler system
failure upstairs.  The SAN was mirrored to a SAN in the second DC
(fiber-interconnected) and the backup server was also in the second DC
across campus, so it all failed over gracefully.  That particular system
was large Sun hardware and big EMC storage, and it was running Oracle
rather than Postgres.  We've had several big UPS failures at various
sites, too, ditto HVAC, over a 15-year period.



Re: High Availability with Postgres

From: Greg Smith

John R Pierce wrote:
> The commercial cluster software vendors insist on using dedicated
> connections for the heartbeat messages between the cluster members, and
> insist on having fencing capabilities (for instance, disabling the
> fiber switch port of the formerly active server and enabling the port
> for the to-be-activated server).  With Linux-HA and Heartbeat, you're
> on your own.

This is worth highlighting.  As John points out, it's straightforward to
build a shared-storage implementation using PostgreSQL and either one of
the commercial clustering systems or Linux-HA.  And until PostgreSQL
gets fully synchronous replication, it's a viable alternative for
"must not lose a transaction" deployments when the storage used is much
more reliable than the nodes.

The hard part of shared-storage failover is always solving the "shoot
the other node in the head" problem: keeping a down node from coming
back once it's no longer the active one.  To do that well, you really
need to lock the now-unavailable node out of the storage at the
hardware level--"fencing"--with disabling its storage port being one
way to handle that.  Figure out how you're going to do that reliably,
in a way that's integrated into a proper cluster manager, and there's
no reason you can't do this with PostgreSQL.

There's a description of the fencing options for Linux-HA at
http://www.clusterlabs.org/doc/crm_fencing.html ; the cheap way to solve
this problem is to have a UPS that cuts the power going to the shot
node.  Once that's done, you can safely fail the shared storage over to
another system.  At that point you can probably even turn the power
back on, presuming that the now-rebooted system will be able to regain
access to the storage during a fresh system start.
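To make that concrete, a hypothetical Linux-HA/Pacemaker fencing setup
in crm shell syntax might look roughly like this (the IPMI plugin and
every parameter value here are examples; substitute whatever matches
your hardware):

   # fence agent that can power off node2 via its IPMI management port
   primitive st-node2 stonith:external/ipmi \
       params hostname=node2 ipaddr=10.0.0.12 userid=admin passwd=secret \
       op monitor interval=60m
   # never run a node's own fence agent on that node
   location st-node2-not-on-node2 st-node2 -inf: node2
   property stonith-enabled=true

With fencing enabled, the cluster refuses to fail resources over until
the lost node has actually been shot.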

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: High Availability with Postgres

From: John R Pierce

On 06/21/10 8:08 PM, Greg Smith wrote:
> The hard part of shared-storage failover is always solving the "shoot
> the other node in the head" problem: keeping a down node from coming
> back once it's no longer the active one.  To do that well, you really
> need to lock the now-unavailable node out of the storage at the
> hardware level--"fencing"--with disabling its storage port being one
> way to handle that.  Figure out how you're going to do that reliably,
> in a way that's integrated into a proper cluster manager, and there's
> no reason you can't do this with PostgreSQL.

In my dev-lab tests of some clusters, I used the QLogic 5600 FC switch
that connects my motley collection of servers.  I used RHCS for one
test; it supported the QLogic via telnet.  I created two zone sets in
the QLogic, one for each state, with the standby host blocked from
accessing the LUN, and the cluster manager used telnet to talk to the
switch.  I ran heartbeats over two separate Ethernet networks (one was
the lab LAN segment, the other a private switch I have all the servers
connected to for various tests and such).  The QLogic switch also had
another zone set for all sorts of other servers and storage which
wasn't affected by these clustering tests.

I don't like power-cycling servers, so I'd prefer not to use
power-switch-based fencing, although I believe my blade box's management
unit is supported as a power fencing device.



Re: High Availability with Postgres

From: Devrim GÜNDÜZ

On Mon, 2010-06-21 at 23:08 -0400, Greg Smith wrote:
>
> The hard part of shared-storage failover is always solving the "shoot
> the other node in the head" problem: keeping a down node from coming
> back once it's no longer the active one.  To do that well, you really
> need to lock the now-unavailable node out of the storage at the
> hardware level--"fencing"--with disabling its storage port being one
> way to handle that.  Figure out how you're going to do that reliably,
> in a way that's integrated into a proper cluster manager, and there's
> no reason you can't do this with PostgreSQL.

FWIW, I know of a production setup that has 4 PostgreSQL servers (on 4
different machines, I mean) running on Red Hat Cluster Suite, and it has
been running for more than 2 years without any Postgres issues. The only
issues were related to RHCS+HP hardware, and as of RHEL 5.5 they are all
resolved.
--
Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
PostgreSQL RPM Repository: http://yum.pgrpms.org
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz


Re: High Availability with Postgres

From: Dimitri Fontaine

John R Pierce <pierce@hogranch.com> writes:
> Failure modes can include things like failing fans (which will be
> detected, resulting in a server shutdown if too many fail) and power
> supply failure (redundant PSUs, but I've seen the power-combining
> circuitry fail).  Any of these sorts of failures will result in a
> failover without corrupting the data.
>
> And of course, intentional planned failovers to do OS maintenance:
> you patch the standby system, fail over to it and verify it's good,
> then patch the other system.

Ah, I see the use case much better now, thank you. And I begin to see
how expensive reaching such a goal is, too. Going from "I can lose this
many transactions" to "no data lost, ever" comes at that price, though.

Regards,
--
dim

Re: High Availability with Postgres

From: Greg Smith

John R Pierce wrote:
> I don't like power-cycling servers, so I'd prefer not to use
> power-switch-based fencing, although I believe my blade box's
> management unit is supported as a power fencing device.

I consider power-control fencing a secondary resort, for cases where you
don't have hardware where a storage-switch fence can be used.  It can be
a useful implementation in those situations, and not all shared storage
is attached via an FC switch.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: High Availability with Postgres

From: John R Pierce

On 06/22/10 1:58 AM, Dimitri Fontaine wrote:
> John R Pierce <pierce@hogranch.com> writes:
>
>> Failure modes can include things like failing fans (which will be
>> detected, resulting in a server shutdown if too many fail) and power
>> supply failure (redundant PSUs, but I've seen the power-combining
>> circuitry fail).  Any of these sorts of failures will result in a
>> failover without corrupting the data.
>>
>> And of course, intentional planned failovers to do OS maintenance:
>> you patch the standby system, fail over to it and verify it's good,
>> then patch the other system.
>>
> Ah, I see the use case much better now, thank you. And I begin to see
> how expensive reaching such a goal is, too. Going from "I can lose this
> many transactions" to "no data lost, ever" comes at that price, though.
>

Yeah.  Generally, when money is involved in the transactions, you've got
to stick to 'no committed data lost, ever'.  There are plenty of other
use cases for that too.



Re: High Availability with Postgres

From: Dimitri Fontaine

John R Pierce <pierce@hogranch.com> writes:
> Yeah.  Generally, when money is involved in the transactions, you've got
> to stick to 'no committed data lost, ever'.  There are plenty of other
> use cases for that too.

Well, it's a cost/benefit/risk evaluation you have to make. It would be
bad news if the cost of covering your risk were higher than the risk
itself, meaning there's no benefit in walking the extra mile.

Regards,
--
dim

Re: High Availability with Postgres

From: Craig Ringer

On 23/06/10 03:05, John R Pierce wrote:

> Yeah.  Generally, when money is involved in the transactions, you've got
> to stick to 'no committed data lost, ever'.  There are plenty of other
> use cases for that too.

2PC is sometimes a reasonable alternative to shared-storage failover,
though. It can be a lot slower, but it lets you maintain the machines as
completely separate systems with no shared failure points.
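For anyone who hasn't used it: the building blocks are PREPARE
TRANSACTION and COMMIT PREPARED (the transaction name is arbitrary, and
max_prepared_transactions must be set above zero):

   BEGIN;
   -- ... do the same work on each participating node ...
   PREPARE TRANSACTION 'txn_42';   -- persisted; survives a crash/restart
   -- once every node has prepared successfully:
   COMMIT PREPARED 'txn_42';
   -- if any node failed to prepare, instead:
   -- ROLLBACK PREPARED 'txn_42';

Something external has to drive these steps and resolve any in-doubt
transactions after a crash, which is where the complexity lives.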

--
Craig Ringer