Thread: High Availability PostgreSQL HOWTO article

High Availability PostgreSQL HOWTO article

From
"Eric Comeau"
Date:
Has anyone implemented the HA solution linked
<http://www.taygeta.com/ha-postgresql.html ) linked>  on the PostgreSQL
Technical Documentation Web site?

http://www.taygeta.com/ha-postgresql.html
<http://www.taygeta.com/ha-postgresql.html>

I have read through the article, but one concern I have is that I don't
see how this is any different than performing a "File system level
backup".

In the PostgreSQL 7.2 Administrator's Guide, under the Backup and
Restore Chapter (9) - they mention the restriction that the database
"must" be shut down in order to get a usable backup.

How is taking Rsync copies ever 5 or 10 minutes any different than
taking a file system backup without shutting the database down?

Eric Comeau



Re: High Availability PostgreSQL HOWTO article

From
Ragnar Kjørstad
Date:
On Fri, Aug 02, 2002 at 03:36:20PM -0400, Eric Comeau wrote:
> Has anyone implemented the HA solution linked
> <http://www.taygeta.com/ha-postgresql.html ) linked>  on the PostgreSQL
> Technical Documentation Web site?
>
> http://www.taygeta.com/ha-postgresql.html
> <http://www.taygeta.com/ha-postgresql.html>
>
> I have read through the article, but one concern I have is that I don't
> see how this is any different than performing a "File system level
> backup".

It's not different, and such a solution should not be recommended.

The procedure is broken, but there are a couple of ways to fix it:
1. Take a snapshot before running the rsync. This will ensure that
   the backup-server gets a copy of files that are "in sync" rather
   than a set of files copied at slightly different times. There was
   a long discussion about filesystem-backup, snapshots and simular
   issues on this list a few weeks ago - you might want to check it
   out.
2. Use shared storage rather than rsync. (IOW: a RAID connected
   to both servers at the same time).
3. Use rserv (the replication-service included in postgresql-contrib)
   instead of rsync.

Both 1 and 3 will fix the consistancy-problem, but the backup-server
would only be up-to-date right after you do the sync. So, you risk
loosing several minutes of data.

Shared storage would eliminate this totally. The downside is that since
you only have one storage-unit it may become a single point of failure.
This can be fixed by using a storage-system with redundant raid-controllers,
multiple fans and multiple power-supplies. You would also want to take
regular backups, just in case.

> How is taking Rsync copies ever 5 or 10 minutes any different than
> taking a file system backup without shutting the database down?

It's not.


--
Ragnar Kjørstad
Big Storage

Re: High Availability PostgreSQL HOWTO article

From
"Robert M. Meyer"
Date:
I'm not a big fan of the shared storage solution.  The added
complication of identifying who has the RAID in a failover situation can
cause unnecessary downtime.  Since disk systems are typically the
highest risk of failure items (IMHO), the single point of failure is
just too much for high availability.  Anything that happens to the
filestore such as someone whacking files will cause a catastrophic
failure.  We're using LifeKeeper on a Compaq 3500 server in an attempt
at high availability.  We have had situations where we've lost
connectivity due to a switch reboot and had the secondary system try to
take over the database RAID.  Since the actual primary is not dead, the
database is still running and bad things happen during the transfer.
Postgres isn't happy when it's data directory disappears out from under
it.

I would recommend a system that syncs the entire database periodically
(pg_dumpp/pg_restore?) and then creates a queued transaction path that
gets applied at distinct intervals with the capability to post the
transactions at the point of discovery of the failure of the primary
system.  This way, a transaction that damages the database (not
necessarily corruption but bad data) can be caught before it makes it
into the replica.  I haven't found anything in Postgres that allows me
to keep an SQL transaction log that I can pass over.

I've been thinking about the debug flags that allow me to log all of the
queries that pass through the system as a possible path to this.  It
wouldn't be hard to write a perl/sed/awk/whatever script to strip out
the 'DEBUG:' bits and removes the 'SELECT' queries, leaving only the
'UPDATE' and 'INSERT' queries.  This could be trapped by a process that
writes out the stream to a process on the replica server that would just
store them in a file.  In the event of a primary system failure, a
process would detect the loss of the primary server, force the current
transaction log into the database, turn off the receiver daemon, then
take whatever actions are necessary to become the primary.

Of course, I haven't worked out all of the details of this :-).  Does
this sound like a worthwhile endeavor?  What holes can anyone punch in
this?  I know that the pg_restore time is a period of time that failover
would be vulnerable.

Cheers!

Bob

On Tue, 2002-08-06 at 15:01, Ragnar Kjørstad wrote:
> On Fri, Aug 02, 2002 at 03:36:20PM -0400, Eric Comeau wrote:
> > Has anyone implemented the HA solution linked
> > <http://www.taygeta.com/ha-postgresql.html ) linked>  on the PostgreSQL
> > Technical Documentation Web site?
> >
> > http://www.taygeta.com/ha-postgresql.html
> > <http://www.taygeta.com/ha-postgresql.html>
> >
> > I have read through the article, but one concern I have is that I don't
> > see how this is any different than performing a "File system level
> > backup".
>
> It's not different, and such a solution should not be recommended.
>
> The procedure is broken, but there are a couple of ways to fix it:
> 1. Take a snapshot before running the rsync. This will ensure that
>    the backup-server gets a copy of files that are "in sync" rather
>    than a set of files copied at slightly different times. There was
>    a long discussion about filesystem-backup, snapshots and simular
>    issues on this list a few weeks ago - you might want to check it
>    out.
> 2. Use shared storage rather than rsync. (IOW: a RAID connected
>    to both servers at the same time).
> 3. Use rserv (the replication-service included in postgresql-contrib)
>    instead of rsync.
>
> Both 1 and 3 will fix the consistancy-problem, but the backup-server
> would only be up-to-date right after you do the sync. So, you risk
> loosing several minutes of data.
>
> Shared storage would eliminate this totally. The downside is that since
> you only have one storage-unit it may become a single point of failure.
> This can be fixed by using a storage-system with redundant raid-controllers,
> multiple fans and multiple power-supplies. You would also want to take
> regular backups, just in case.
>
> > How is taking Rsync copies ever 5 or 10 minutes any different than
> > taking a file system backup without shutting the database down?
>
> It's not.
>
>
> --
> Ragnar Kjørstad
> Big Storage
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
--
Robert M. Meyer
Sr. Network Administrator
DigiVision Satellite Services
14 Lafayette Sq, Ste 410
Buffalo, NY 14203-1904
(716)332-1451


Re: High Availability PostgreSQL HOWTO article

From
Andrew Sullivan
Date:
On Tue, Aug 06, 2002 at 03:42:19PM -0400, Robert M. Meyer wrote:

> I've been thinking about the debug flags that allow me to log all of the
> queries that pass through the system as a possible path to this.  It
> wouldn't be hard to write a perl/sed/awk/whatever script to strip out
> the 'DEBUG:' bits and removes the 'SELECT' queries, leaving only the
> 'UPDATE' and 'INSERT' queries.  This could be trapped by a process that
> writes out the stream to a process on the replica server that would just
> store them in a file.  In the event of a primary system failure, a
> process would detect the loss of the primary server, force the current
> transaction log into the database, turn off the receiver daemon, then
> take whatever actions are necessary to become the primary.

Sounds like what you want is replication via a replay of the WAL.
Maybe take the work currently being done on PITR (over on -hackers)
and try to extend it?

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110


Re: High Availability PostgreSQL HOWTO article

From
Ragnar Kjørstad
Date:
On Tue, Aug 06, 2002 at 03:42:19PM -0400, Robert M. Meyer wrote:
> I'm not a big fan of the shared storage solution.  The added
> complication of identifying who has the RAID in a failover situation can
> cause unnecessary downtime.  Since disk systems are typically the
> highest risk of failure items (IMHO), the single point of failure is
> just too much for high availability.  Anything that happens to the
> filestore such as someone whacking files will cause a catastrophic
> failure.  We're using LifeKeeper on a Compaq 3500 server in an attempt
> at high availability.  We have had situations where we've lost
> connectivity due to a switch reboot and had the secondary system try to
> take over the database RAID.  Since the actual primary is not dead, the
> database is still running and bad things happen during the transfer.
> Postgres isn't happy when it's data directory disappears out from under
> it.

The disk-system hardware shouldn't be a single point of failure, as
everything can be made with redundancy. The trashing that you describe
is supposed to be avoided by a fencing-mechanism in the
cluster-software. The most common fencing technique is something called
stonith (shoot the other node i the head) where a node must kill the
other one (typicly cut of power) before taking shared resources - to
avoid problems like this.

Lifekeeper uses a different technique; scsi reservations. Unfortenately
it doesn't work.

Also, the cluster-documentation (at least for heartbeat, but I think
it's the case for Lifekeeper as well) spesificly tells you that you need
multiple communication-channels between the nodes to avoid problems of
lost communication. One serial link + a dedicated ethernet crossover is
the most common way.

That said, I fully agree that clustering, and shared storage in
particular, adds complication. A buggy or incorrectly configured
cluster is likely to give more downtime than a single server.

I strongly feel that online documentation that recommends ha-software
(like the one starting this discussion) should emphasise on this, and
recommend that users that are not familiar with the software should get
professional services rather than try to follow a recipi.

What shared storage clusters will not protect you against is things like
filesystem or database corruption. So there is no substitute for
backups.



> I would recommend a system that syncs the entire database periodically
> (pg_dumpp/pg_restore?) and then creates a queued transaction path that
> gets applied at distinct intervals with the capability to post the
> transactions at the point of discovery of the failure of the primary
> system.  This way, a transaction that damages the database (not
> necessarily corruption but bad data) can be caught before it makes it
> into the replica.  I haven't found anything in Postgres that allows me
> to keep an SQL transaction log that I can pass over.

Check out rserv. It's somewhat simular to what you want.

> I've been thinking about the debug flags that allow me to log all of the
> queries that pass through the system as a possible path to this.  It
> wouldn't be hard to write a perl/sed/awk/whatever script to strip out
> the 'DEBUG:' bits and removes the 'SELECT' queries, leaving only the
> 'UPDATE' and 'INSERT' queries.  This could be trapped by a process that
> writes out the stream to a process on the replica server that would just
> store them in a file.  In the event of a primary system failure, a
> process would detect the loss of the primary server, force the current
> transaction log into the database, turn off the receiver daemon, then
> take whatever actions are necessary to become the primary.

Once the primary is dead you will no longer be able to get a copy of the
transaction-log from it. That means that there will be a window, however
small, of lost data. If your application can live happily with that,
then replicated storage may be your solution. If it can't live with
that, shared storage is the only answer. (short of replication with
2-phase commits inside the database-server)

> Of course, I haven't worked out all of the details of this :-).  Does
> this sound like a worthwhile endeavor?  What holes can anyone punch in
> this?  I know that the pg_restore time is a period of time that failover
> would be vulnerable.

I think it should work well if you can live with lost data in case of
failure. There are some details about how you do the syncs you should
look at to get them as fast as possible, and to make the windows of
potential lost data as short as possible. Look into things like
snapshots, rserv and maybe duplicating the db on the secondary before
updating it - so you have an older data-set to go back to if the primary
dies while you're syncing.

This is assuming your applications do far more reads than writes -
because if it's write intensive it get's much harder.

Also, be aware of all the complications. Even without shared storage the
clustering adds a dozen new ways your database can be corrupted! Think
for instance if you loose communications, and both servers think they're
primary. Then you have transactions going on on both servers - how are
you going to merge them afterwards? Basicly you can't, so you have to
avoid this situation from happening in the first place. (read: fencing)

If you truely want high availability, I would recommend you to get
professional services from either SuSE or IBM - they both have great
engineers on the heartbeat development team. We also offer supported
high-availability solutions, but only for storage - so for something as
special as databases you're probably better off with someone else.


--
Ragnar Kjørstad
Big Storage