Thread: Speed up Switchover

Speed up Switchover

From

Date:

25 July 2013, 15:49:13

Hi guys,
I am looking for a way of speeding up the process of switching over of severs.
At the moment we are switching over via the trigger file, reconfiguring our applications, patching or updating the old primary and rsyncing the data over to the old primary.

I was wondering if there was an easier way to get the primary setup as a secondary without having to rsync the data as it can take up to 10 hours.

The command i am using is:
rsync -azve "ssh -c blowfish" --inplace --delete /usr/local/pgsql/data/ pgsql@old-primary:/usr/local/pgsql/data/

We are using PostgreSQL 9.2,FreeBSD 9.1, Streaming replication and WAL log archiving.
Thanks

TJ

Re: Speed up Switchover

From

Sergey Konoplev

Date:

26 July 2013, 05:00:53

On Thu, Jul 25, 2013 at 1:03 AM, TJ <tj@wallago.co.uk> wrote:
> I am looking for a way of speeding up the process of switching over of
> severs.
> At the moment we are switching over via the trigger file, reconfiguring our
> applications, patching or updating the old primary and rsyncing the data
> over to the old primary.
>
> I was wondering if there was an easier way to get the primary setup as a
> secondary without having to rsync the data as it can take up to 10 hours.

pg_rewind (https://github.com/vmware/pg_rewind) is what you need.

From its docs:

pg_rewind is a tool for synchronizing a PostgreSQL data directory with another
PostgreSQL data directory that was forked from the first one. The result is
equivalent to rsyncing the first data directory (referred to as the old cluster
from now on) with the second one (the new cluster). The advantage of pg_rewind
over rsync is that pg_rewind uses the WAL to determine changed data blocks,
and does not require reading through all files in the cluster. That makes it
a lot faster when the database is large and only a small portion of it differs
between the clusters.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com

Re: Speed up Switchover

From

Samrat Revagade

Date:

26 July 2013, 06:00:17

> secondary without having to rsync the data as it can take up to 10 hours.

pg_rewind (https://github.com/vmware/pg_rewind) is what you need.

But I think it has a problem regarding the hint bits which Robert Hass
pointed out.
You can still solve hint bit problem by enabling new checksum feature, But
you have to face consequences such as performance overhead.

You can find the discussion about that on following link:

http://www.postgresql.org/message-id/flat/CA+TgmoY4j+p7JY69ry8GpOSMMdZNYqU6dtiONPrcxaVG+SPByg@mail.gmail.com#CA+TgmoY4j+p7JY69ry8GpOSMMdZNYqU6dtiONPrcxaVG+SPByg@mail.gmail.com





-----
Greetings,
Samrat Revagade,
NTT-DATA-OSS Center (Pune)
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Speed-up-Switchover-tp5765160p5765235.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: Speed up Switchover

From

Michael Paquier

Date:

26 July 2013, 06:17:34

On Fri, Jul 26, 2013 at 3:00 PM, Samrat Revagade <revagade.samrat@gmail.com> wrote:

> secondary without having to rsync the data as it can take up to 10 hours.

pg_rewind (https://github.com/vmware/pg_rewind) is what you need.

But I think it has a problem regarding the hint bits which Robert Hass
pointed out.
You can still solve hint bit problem by enabling new checksum feature, But
you have to face consequences such as performance overhead.

You can find the discussion about that on following link:
http://www.postgresql.org/message-id/flat/CA+TgmoY4j+p7JY69ry8GpOSMMdZNYqU6dtiONPrcxaVG+SPByg@mail.gmail.com#CA+TgmoY4j+p7JY69ry8GpOSMMdZNYqU6dtiONPrcxaVG+SPByg@mail.gmail.com

Exactly, using checksums is mandatory or you could finish with corrupted data after rewinding a data directory. Any help in testing or proposing patches for pg_rewind is welcome. It is still in early development.

Thanks,

--
Michael

Re: Speed up Switchover

From

Sergey Konoplev

Date:

26 July 2013, 06:22:44

On Thu, Jul 25, 2013 at 11:00 PM, Samrat Revagade
<revagade.samrat@gmail.com> wrote:
>>> secondary without having to rsync the data as it can take up to 10 hours.
>>
>> pg_rewind (https://github.com/vmware/pg_rewind) is what you need.
>
> But I think it has a problem regarding the hint bits which Robert Hass
> pointed out.
> You can still solve hint bit problem by enabling new checksum feature, But
> you have to face consequences such as performance overhead.

Or, if I understand it correct, you can just make sure that there is
no activity on the old master. Right?

>
> You can find the discussion about that on following link:
>
http://www.postgresql.org/message-id/flat/CA+TgmoY4j+p7JY69ry8GpOSMMdZNYqU6dtiONPrcxaVG+SPByg@mail.gmail.com#CA+TgmoY4j+p7JY69ry8GpOSMMdZNYqU6dtiONPrcxaVG+SPByg@mail.gmail.com


--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com

Re: Speed up Switchover

From

Andres Freund

Date:

26 July 2013, 06:54:46

On 2013-07-25 22:00:23 -0700, Sergey Konoplev wrote:
> On Thu, Jul 25, 2013 at 1:03 AM, TJ <tj@wallago.co.uk> wrote:
> > I am looking for a way of speeding up the process of switching over of
> > severs.
> > At the moment we are switching over via the trigger file, reconfiguring our
> > applications, patching or updating the old primary and rsyncing the data
> > over to the old primary.
> >
> > I was wondering if there was an easier way to get the primary setup as a
> > secondary without having to rsync the data as it can take up to 10 hours.
>
> pg_rewind (https://github.com/vmware/pg_rewind) is what you need.

Beside the issue pointed out by Robert (via Samrat), it's also for 9.3
onwards only...

If it's a planned failover you can do better by shutting down the
servers manually...

a) shutdown master gracefully (i.e. -m fast, not immediate)
b) use pg_controldata to check for the last wal location
c) verify that the standby has received WAL up to that point, otherwise
   restart from a)
d) shutdown standby
e) remove recovery.conf on the standby, thus "silently" promoting it
f) start previous standby, now as master
g) create recovery.conf on the previous master
h) start previous master, now a standby

Note that you're deliberately circumventing security measures that way,
you need be rather careful to understand the reasoning behind those
steps and follow them carefully. But it allows to gracefully failover &
follow in large clusters with only very short outages.
From 9.3 onwards you don't even need d) and e) anymore and you can
replace it with a regular promotion which will be noticeably faster.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services