Re: High Availability PostgreSQL HOWTO article - Mailing list pgsql-admin
From: Robert M. Meyer
Subject: Re: High Availability PostgreSQL HOWTO article
Msg-id: 1028662940.16115.55.camel@skymaster
In response to: Re: High Availability PostgreSQL HOWTO article (Ragnar Kjørstad <postgres@ragnark.vestdata.no>)
Responses: Re: High Availability PostgreSQL HOWTO article; Re: High Availability PostgreSQL HOWTO article
List: pgsql-admin
I'm not a big fan of the shared storage solution. The added complication of identifying who owns the RAID in a failover situation can cause unnecessary downtime. Since disk systems are typically the highest-risk items (IMHO), the single point of failure is just too much for high availability. Anything that happens to the filestore, such as someone whacking files, will cause a catastrophic failure.

We're using LifeKeeper on a Compaq 3500 server in an attempt at high availability. We have had situations where we've lost connectivity due to a switch reboot and had the secondary system try to take over the database RAID. Since the actual primary is not dead, the database is still running and bad things happen during the transfer. Postgres isn't happy when its data directory disappears out from under it.

I would recommend a system that syncs the entire database periodically (pg_dump/pg_restore?) and then creates a queued transaction path that gets applied at distinct intervals, with the capability to post the transactions at the point of discovery of the failure of the primary system. This way, a transaction that damages the database (not necessarily corruption, but bad data) can be caught before it makes it into the replica.

I haven't found anything in Postgres that allows me to keep an SQL transaction log that I can pass over. I've been thinking about the debug flags that allow me to log all of the queries that pass through the system as a possible path to this. It wouldn't be hard to write a perl/sed/awk/whatever script that strips out the 'DEBUG:' bits and removes the 'SELECT' queries, leaving only the 'UPDATE' and 'INSERT' queries. This could be trapped by a process that writes out the stream to a process on the replica server that would just store them in a file.
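A minimal sketch of that strip-and-filter step, assuming the backend logs one statement per line with a "DEBUG:  query: " prefix (the exact prefix depends on which debug flags are set); the sample log file and paths here are made up for illustration:

```shell
# Hypothetical filter sketch: strip the DEBUG prefix and drop SELECTs,
# keeping only data-modifying statements for replay on the replica.

# Fabricated sample of what the query log might look like:
cat > /tmp/pg_query.log <<'EOF'
DEBUG:  query: SELECT * FROM accounts;
DEBUG:  query: UPDATE accounts SET balance = 100 WHERE id = 1;
DEBUG:  query: INSERT INTO audit VALUES (1, 'balance change');
EOF

# Print only lines carrying a query, with the prefix removed,
# then filter out the read-only SELECT statements.
sed -n 's/^DEBUG:  query: //p' /tmp/pg_query.log \
  | grep -Ev '^[[:space:]]*SELECT' > /tmp/replay.sql

cat /tmp/replay.sql
```

In a real setup the output would be streamed to the receiver daemon on the replica rather than written to a local file, and you'd have to decide how to handle transaction boundaries (BEGIN/COMMIT) and multi-line statements, which this sketch ignores.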
In the event of a primary system failure, a process would detect the loss of the primary server, force the current transaction log into the database, turn off the receiver daemon, then take whatever actions are necessary to become the primary. Of course, I haven't worked out all of the details of this :-).

Does this sound like a worthwhile endeavor? What holes can anyone punch in this? I know that failover would be vulnerable during the pg_restore window.

Cheers!

Bob

On Tue, 2002-08-06 at 15:01, Ragnar Kjørstad wrote:
> On Fri, Aug 02, 2002 at 03:36:20PM -0400, Eric Comeau wrote:
> > Has anyone implemented the HA solution
> > <http://www.taygeta.com/ha-postgresql.html> linked on the PostgreSQL
> > Technical Documentation Web site?
> >
> > I have read through the article, but one concern I have is that I don't
> > see how this is any different than performing a "File system level
> > backup".
>
> It's not different, and such a solution should not be recommended.
>
> The procedure is broken, but there are a couple of ways to fix it:
> 1. Take a snapshot before running the rsync. This will ensure that
>    the backup-server gets a copy of files that are "in sync" rather
>    than a set of files copied at slightly different times. There was
>    a long discussion about filesystem-backup, snapshots and similar
>    issues on this list a few weeks ago - you might want to check it
>    out.
> 2. Use shared storage rather than rsync. (IOW: a RAID connected
>    to both servers at the same time).
> 3. Use rserv (the replication-service included in postgresql-contrib)
>    instead of rsync.
>
> Both 1 and 3 will fix the consistency problem, but the backup-server
> would only be up-to-date right after you do the sync. So, you risk
> losing several minutes of data.
>
> Shared storage would eliminate this totally.
> The downside is that since you only have one storage-unit it may become
> a single point of failure. This can be fixed by using a storage system
> with redundant RAID controllers, multiple fans and multiple power
> supplies. You would also want to take regular backups, just in case.
>
> > How is taking rsync copies every 5 or 10 minutes any different than
> > taking a file system backup without shutting the database down?
>
> It's not.
>
> --
> Ragnar Kjørstad
> Big Storage
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

--
Robert M. Meyer
Sr. Network Administrator
DigiVision Satellite Services
14 Lafayette Sq, Ste 410
Buffalo, NY 14203-1904
(716)332-1451