warm standby replication safe failover - Mailing list pgsql-novice

From Andrew Puschak
Subject warm standby replication safe failover
Date
Msg-id CALFZoBvzPu=gt-j2mmnLJrWvA8+xndXwk2LE2bn2ApK9CDn02A@mail.gmail.com
Whole thread Raw
Responses Re: warm standby replication safe failover
List pgsql-novice
Hi!

I'm new here and have a question. I'm working on a replacement PostgreSQL 8.4 setup for a small company. They now have a Primary postgres server and 2 secondaries which are replicated using Slony-I, however the consulting company that set it up is afraid to failover and keeping Slony running is taking lots of resources.

The server processes IVR phone calls so it must be quick and have high availability.

I'm setting up a replacement group of physical servers that we can migrate the data over to piece by piece. Since it is a small company, they've agreed that Slony has been too complicated and that in an emergency failover 1 to 5 minutes of lost data is acceptable and quickly having another server available is more important. I can setup 3 separate servers at 3 separate physical locations parallel to the current production servers.

I'm testing a setup with a primary server, which has archiving turned on and rsync's over ssh the WAL log files to a secondary server. The secondary server is running pg_standby and waiting for a trigger. A third server will just accept nightly dumps of the primary to keep load down on the primary using a cronjob. The third server will be up to 1 day old with the dumps, but ready to go in case of primary and secondary failure as well as 1 day historical dumps in case we need any old data.

My question is how to safely failover between Primary and Secondary servers in this warm standby setup without loosing any data? I'd like to switch back and forth periodically to ensure we can trigger a failover in an emergency. Using the trigger file on the Secondary, we could loose up to 5 minutes of data since the WAL files are only rotated every 5 minutes, even though archive_timeout = 60. I saw a blog that mentioned copying the pg_xlog files from the Primary to the Secondary in addition to pg_standby, but I'm unsure if this will make sure we have 100% of our data. I tried this, put data in at each step and recovered all the data, but I'm unsure if the process ensures that no data is lost.

Test setup: Primary was shipping logs to Secondary and secondary was running pg_standby. I stop shipping logs by setting archive_command = '' and reloading postgres on Primary. Then I create a table with data that doesn't make it to the Secondary. On Primary I stopped postgres. Secondary was still waiting for the next log, never receiving the new table. I rsync'ed just the pg_xlog files using "pg_xlog/0*" to the Secondary pg_xlog directory (not the WAL archive directory pg_standby will recover from). These are the latest WAL log files and "000000010000000000000003.00000020.backup" (should I exclude backup file?). Then I triggered the recover. The new table is there and accessible.

My guess is that the logs in the WAL archive directory have been applied to the database and then when the database starts up it also has the last logs in the pg_xlog directory so it applies those last ones as well and we don't loose any data. Can someone confirm or deny this or a better solution to any of the above?

Thank you so much for your help,
Andrew


--
Andrew Puschak
(267) 614 - 2373
apuschak@gmail.com
http://andrewpuschak.com

pgsql-novice by date:

Previous
From: anderson earley
Date:
Subject: hello!!!
Next
From: Tom Lane
Date:
Subject: Re: warm standby replication safe failover