Thread: warm standby replication safe failover

warm standby replication safe failover

From
Andrew Puschak
Date:
Hi!

I'm new here and have a question. I'm working on a replacement PostgreSQL 8.4 setup for a small company. They now have a Primary postgres server and 2 secondaries which are replicated using Slony-I, however the consulting company that set it up is afraid to failover and keeping Slony running is taking lots of resources.

The server processes IVR phone calls so it must be quick and have high availability.

I'm setting up a replacement group of physical servers that we can migrate the data over to piece by piece. Since it is a small company, they've agreed that Slony has been too complicated and that in an emergency failover 1 to 5 minutes of lost data is acceptable and quickly having another server available is more important. I can setup 3 separate servers at 3 separate physical locations parallel to the current production servers.

I'm testing a setup with a primary server, which has archiving turned on and rsync's over ssh the WAL log files to a secondary server. The secondary server is running pg_standby and waiting for a trigger. A third server will just accept nightly dumps of the primary to keep load down on the primary using a cronjob. The third server will be up to 1 day old with the dumps, but ready to go in case of primary and secondary failure as well as 1 day historical dumps in case we need any old data.

My question is how to safely failover between Primary and Secondary servers in this warm standby setup without loosing any data? I'd like to switch back and forth periodically to ensure we can trigger a failover in an emergency. Using the trigger file on the Secondary, we could loose up to 5 minutes of data since the WAL files are only rotated every 5 minutes, even though archive_timeout = 60. I saw a blog that mentioned copying the pg_xlog files from the Primary to the Secondary in addition to pg_standby, but I'm unsure if this will make sure we have 100% of our data. I tried this, put data in at each step and recovered all the data, but I'm unsure if the process ensures that no data is lost.

Test setup: Primary was shipping logs to Secondary and secondary was running pg_standby. I stop shipping logs by setting archive_command = '' and reloading postgres on Primary. Then I create a table with data that doesn't make it to the Secondary. On Primary I stopped postgres. Secondary was still waiting for the next log, never receiving the new table. I rsync'ed just the pg_xlog files using "pg_xlog/0*" to the Secondary pg_xlog directory (not the WAL archive directory pg_standby will recover from). These are the latest WAL log files and "000000010000000000000003.00000020.backup" (should I exclude backup file?). Then I triggered the recover. The new table is there and accessible.

My guess is that the logs in the WAL archive directory have been applied to the database and then when the database starts up it also has the last logs in the pg_xlog directory so it applies those last ones as well and we don't loose any data. Can someone confirm or deny this or a better solution to any of the above?

Thank you so much for your help,
Andrew


--
Andrew Puschak
(267) 614 - 2373
apuschak@gmail.com
http://andrewpuschak.com

Re: warm standby replication safe failover

From
Tom Lane
Date:
Andrew Puschak <apuschak@gmail.com> writes:
> I'm new here and have a question. I'm working on a replacement PostgreSQL
> 8.4 setup for a small company. They now have a Primary postgres server and
> 2 secondaries which are replicated using Slony-I, however the consulting
> company that set it up is afraid to failover and keeping Slony running is
> taking lots of resources.

It sounds like you're trying to keep them on 8.4.  Why not move them to
some newer release branch where you can make use of streaming
replication?  Aside from the benefits of that, 8.4 will be out of
support next year, so setting up new servers with that release series
seems pretty short-sighted.

I realize that dealing with a version upgrade might not be their highest
priority right now, but they're going to have to deal with it before too
long, and doing so now would put them in position to have a far better
replication solution than what you're describing.

            regards, tom lane


Re: warm standby replication safe failover

From
Andrew Puschak
Date:
Thanks Tom!

The PostgreSQL 8.4 is coming from the default latest CentOS 6.3 updates repository which makes it an easy install and upgrade later (yum install postgresql-server). I'll try installing/upgrading my test setup using http://yum.postgresql.org repository. I'm also going to need to find out what migrating to a newer version will affect with the current production data.
Andrew


On Sat, Feb 2, 2013 at 12:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrew Puschak <apuschak@gmail.com> writes:
> I'm new here and have a question. I'm working on a replacement PostgreSQL
> 8.4 setup for a small company. They now have a Primary postgres server and
> 2 secondaries which are replicated using Slony-I, however the consulting
> company that set it up is afraid to failover and keeping Slony running is
> taking lots of resources.

It sounds like you're trying to keep them on 8.4.  Why not move them to
some newer release branch where you can make use of streaming
replication?  Aside from the benefits of that, 8.4 will be out of
support next year, so setting up new servers with that release series
seems pretty short-sighted.

I realize that dealing with a version upgrade might not be their highest
priority right now, but they're going to have to deal with it before too
long, and doing so now would put them in position to have a far better
replication solution than what you're describing.

                        regards, tom lane



--
Andrew Puschak
(267) 614 - 2373
apuschak@gmail.com
http://andrewpuschak.com

Re: warm standby replication safe failover

From
Andrew Puschak
Date:
Hi again,

I setup PostgreSQL 9.2 to try replication and the 9.2 documentation answered my question whereas the 8.4 documentation is missing the section
25.2.2. Standby Server Operation http://www.postgresql.org/docs/9.2/static/warm-standby.html#STANDBY-SERVER-OPERATION It explains my assumption that if you copy over the last pg_xlog from the Primary, which I can do after safely shutting down postgres, recovery will first apply the WALs from the archive directory and then attempt to recover anything else in pg_xlog, but take priority over the WALs in the archive directory.

As for upgrading to the newer release branch, I'm asking the CentOS community where/how the PostgresSQL version is determined in their repositories. It would be easier if I could continue to use the default Base and Update repositories from CentOS to update postgres, but I don't know when they plan to include 9.x versions.
Andrew



On Sat, Feb 2, 2013 at 12:45 PM, Andrew Puschak <apuschak@gmail.com> wrote:
Thanks Tom!

The PostgreSQL 8.4 is coming from the default latest CentOS 6.3 updates repository which makes it an easy install and upgrade later (yum install postgresql-server). I'll try installing/upgrading my test setup using http://yum.postgresql.org repository. I'm also going to need to find out what migrating to a newer version will affect with the current production data.
Andrew


On Sat, Feb 2, 2013 at 12:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrew Puschak <apuschak@gmail.com> writes:
> I'm new here and have a question. I'm working on a replacement PostgreSQL
> 8.4 setup for a small company. They now have a Primary postgres server and
> 2 secondaries which are replicated using Slony-I, however the consulting
> company that set it up is afraid to failover and keeping Slony running is
> taking lots of resources.

It sounds like you're trying to keep them on 8.4.  Why not move them to
some newer release branch where you can make use of streaming
replication?  Aside from the benefits of that, 8.4 will be out of
support next year, so setting up new servers with that release series
seems pretty short-sighted.

I realize that dealing with a version upgrade might not be their highest
priority right now, but they're going to have to deal with it before too
long, and doing so now would put them in position to have a far better
replication solution than what you're describing.

                        regards, tom lane






--
Andrew Puschak
(267) 614 - 2373
apuschak@gmail.com
http://andrewpuschak.com