Thread: Multiple Slave Failover with PITR

Multiple Slave Failover with PITR

From: Ken Brush
Date: 27 March 2012, 17:48:05

Hello everyone,

I notice that the documentation at:
http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial

doesn't contain steps for re-establishing the other slaves in a
multiple-slave setup after one of them has become the new master.

Based on the documentation, here are the most fail-proof steps I came up with:

1. Master dies :(
2. Touch the trigger file on the most caught up slave.
3. Slave is now the new master :)
4. Use pg_basebackup or another binary replication trick (rsync, tar
over ssh, etc.) to bring the other slaves up to speed with the new
master.
5. Start the other slaves pointing to the new master.
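
Step 4 of this list could be scripted roughly as follows. This is only a
minimal sketch, assuming pg_basebackup is installed on the slave being
rebuilt. The host name, the "replication" role and the function name are
placeholders made up for illustration. It only builds the command so it can
be inspected before being run against an empty data directory.

```python
import shlex


def basebackup_command(new_master, data_dir, user="replication"):
    """Build a pg_basebackup command line that copies the new master's
    data directory into data_dir (which must be empty or not exist)."""
    return [
        "pg_basebackup",
        "-h", new_master,  # host to copy from: the promoted slave
        "-U", user,        # role allowed to replicate (assumed name)
        "-D", data_dir,    # target data directory on this slave
        "-P",              # report progress, handy for a long copy
    ]


if __name__ == "__main__":
    # Hypothetical host name; the path is the one used in this thread.
    cmd = basebackup_command("new-master.example.com", "/data/pgsql/data")
    print(" ".join(shlex.quote(part) for part in cmd))
```

The printed command can be pasted into a shell on each remaining slave once
postgres is stopped there and the old data directory has been moved aside.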

But that can take time (about 1-2 hours) with my medium-sized DB
(580GB currently).

After testing a few different ideas that I gleaned from posts on the
mailing list, I came up with this alternative method:

1. Master dies :(
2. Touch the trigger file on the most caught up slave
3. Slave is now the new master.
4. On the other slaves do the following:
5. Shut down postgres on the slave
6. Delete every file in /data/pgsql/data/pg_xlog
7. Modify the recovery.conf file to point to the new master and
include the line "recovery_target_timeline='latest'"
8. Copy the history file from the new master to the slave (it's the
most recent #.history file in the xlog directory)
9. Start up postgres on the slave and watch it sync up to the new
master (about 1-5 minutes usually)
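
Steps 7 and 8 can be sketched in a few lines of Python. This is an
illustration under stated assumptions, not a tested failover tool: the
function names, the "replication" role and the default port are
hypothetical, and only the recovery.conf parameters standby_mode,
primary_conninfo and recovery_target_timeline come from PostgreSQL itself.

```python
import re

# Timeline history files are named with 8 hex digits, e.g. 00000002.history
HISTORY_RE = re.compile(r"^([0-9A-F]{8})\.history$")


def latest_history_file(filenames):
    """Return the history file with the highest timeline ID, or None."""
    best = None
    for name in filenames:
        match = HISTORY_RE.match(name)
        if not match:
            continue  # WAL segments and other files are ignored
        if best is None or int(match.group(1), 16) > int(best[:8], 16):
            best = name
    return best


def render_recovery_conf(host, port=5432, user="replication"):
    """Return recovery.conf text pointing a slave at the new master."""
    lines = [
        "standby_mode = 'on'",
        "primary_conninfo = 'host=%s port=%d user=%s'" % (host, port, user),
        "recovery_target_timeline = 'latest'",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example listing of the new master's pg_xlog directory (made up).
    listing = ["000000010000000000000003", "00000002.history"]
    print(latest_history_file(listing))
    print(render_recovery_conf("new-master.example.com"), end="")
```

The history file name returned by latest_history_file is the one to copy
into the slave's pg_xlog directory before starting it in step 9.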

My question is this: is the alternative method adequate? I tested it a
bit and couldn't find any problems with data loss or inconsistency.

I still use the fail-proof method above to re-incorporate the old
master as a new slave.

Sincerely,
-Ken

Re: Multiple Slave Failover with PITR

From: "Albe Laurenz"
Date: 28 March 2012, 12:29:51

Ken Brush wrote:
> I notice that the documentation at:
> http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial
>
> Doesn't contain steps in a Multiple Slave setup for re-establishing
> them after a slave has become the new master.
>
> Based on the documentation, here are the most fail-proof steps I came
> up with:
>
> 1. Master dies :(
> 2. Touch the trigger file on the most caught up slave.
> 3. Slave is now the new master :)
> 4. use pg_basebackup or other binary replication trick (rsync, tar
> over ssh, etc...) to bring the other slaves up to speed with the new
> master.
> 5. start the other slaves pointing to the new master.
>
> But, that can take time (about 1-2 hours) with my medium sized DB
> (580GB currently).
>
> After testing a few different ideas that I gleaned from posts on the
> mail list, I came up with this alternative method:
>
> 1. Master dies :(
> 2. Touch the trigger file on the most caught up slave
> 3. Slave is now the new master.
> 4. On the other slaves do the following:
> 5. Shutdown postgres on the slave
> 6. Delete every file in /data/pgsql/data/pg_xlog
> 7. Modify the recovery.conf file to point to the new master and
> include the line "recovery_target_timeline='latest'"
> 8. Copy the history file from the new master to the slave (it's the
> most recent #.history file in the xlog directory)
> 9. Startup postgres on the slave and watch it sync up to the new
> master (about 1-5 minutes usually)
>
> My question is this. Is the alternative method adequate? I tested it a
> bit and couldn't find any problems with data loss or inconsistency.

That sounds like it should work fine.

Yours,
Laurenz Albe

Re: Multiple Slave Failover with PITR

From: Sergey Konoplev
Date: 11 April 2012, 16:03:58

On Wed, Mar 28, 2012 at 11:35 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>> 1. Master dies :(
>> 2. Touch the trigger file on the most caught up slave

If the master was stopped properly, will the slaves be in sync with each other?

>> 3. Slave is now the new master.
>> 4. On the other slaves do the following:
>> 5. Shutdown postgres on the slave
>> 6. Delete every file in /data/pgsql/data/pg_xlog
>> 7. Modify the recovery.conf file to point to the new master and
>> include the line "recovery_target_timeline='latest'"
>> 8. Copy the history file from the new master to the slave (it's the
>> most recent #.history file in the xlog directory)

It will work only if archive_command is present, and I will
need to sync the whole pg_xlog content if I do not have archive_command
in recovery.conf, correct?

>> 9. Startup postgres on the slave and watch it sync up to the new
>> master (about 1-5 minutes usually)

^^^

--
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com
LinkedIn: http://ru.linkedin.com/in/grayhemp
JID/GTalk: gray.ru@gmail.com Skype: gray-hemp

Re: Multiple Slave Failover with PITR

From: Ken Brush
Date: 11 April 2012, 16:12:14

On Wed, Apr 11, 2012 at 9:03 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:
> On Wed, Mar 28, 2012 at 11:35 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>>> 1. Master dies :(
>>> 2. Touch the trigger file on the most caught up slave
>
> If the master was stopped properly will the slaves be in sync to each other?

I don't think you can guarantee that. That is why you pick the most
caught-up slave: once it becomes the master, it will bring the other
slaves up to its state.
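
One way to decide which slave is the most caught up is to compare the WAL
locations each one reports, for example from
"SELECT pg_last_xlog_receive_location()". The sketch below assumes those
values have already been collected by hand; the slave names and function
names are made up for illustration.

```python
def parse_xlog_location(text):
    """Turn a location such as '2F/1A000158' into a comparable tuple.

    Both halves are hexadecimal, so they must be parsed as numbers
    rather than compared as strings.
    """
    high, low = text.strip().split("/")
    return (int(high, 16), int(low, 16))


def most_caught_up(locations):
    """locations maps slave name -> reported WAL location string.

    Returns the name of the slave that has received the most WAL.
    """
    return max(locations, key=lambda name: parse_xlog_location(locations[name]))


if __name__ == "__main__":
    reported = {
        "slave1": "0/9000000",
        "slave2": "0/A000020",
        "slave3": "0/10000",
    }
    print(most_caught_up(reported))
```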

>>> 3. Slave is now the new master.
>>> 4. On the other slaves do the following:
>>> 5. Shutdown postgres on the slave
>>> 6. Delete every file in /data/pgsql/data/pg_xlog
>>> 7. Modify the recovery.conf file to point to the new master and
>>> include the line "recovery_target_timeline='latest'"
>>> 8. Copy the history file from the new master to the slave (it's the
>>> most recent #.history file in the xlog directory)
>
> It will work in the case of archive_command presence only and I will
> need to sync the whole pg_xlog content if do not have archive_command
> in recovery.conf, correct?

The new master will sync out the WAL logs from pg_xlog that the slaves
need. The wal sender/receiver system is what I rely on for this.

Sincerely,
-Ken

Re: Multiple Slave Failover with PITR

From: Sergey Konoplev
Date: 11 April 2012, 16:50:46

On Wed, Apr 11, 2012 at 8:12 PM, Ken Brush <kbrush@gmail.com> wrote:
>>>> 8. Copy the history file from the new master to the slave (it's the
>>>> most recent #.history file in the xlog directory)
>>
>> It will work in the case of archive_command presence only and I will
>> need to sync the whole pg_xlog content if do not have archive_command
>> in recovery.conf, correct?
>
> The new master will sync out the WAL logs from pg_xlog that the slaves
> need. The wal sender/receiver system is what I rely on for this.

So you do not have archive_command in recovery.conf, do you?

--
Sergey Konoplev


Re: Multiple Slave Failover with PITR

From: Ken Brush
Date: 11 April 2012, 17:09:01

On Wed, Apr 11, 2012 at 9:50 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:
> On Wed, Apr 11, 2012 at 8:12 PM, Ken Brush <kbrush@gmail.com> wrote:
>>>>> 8. Copy the history file from the new master to the slave (it's the
>>>>> most recent #.history file in the xlog directory)
>>>
>>> It will work in the case of archive_command presence only and I will
>>> need to sync the whole pg_xlog content if do not have archive_command
>>> in recovery.conf, correct?
>>
>> The new master will sync out the WAL logs from pg_xlog that the slaves
>> need. The wal sender/receiver system is what I rely on for this.
>
> So you do not have archive_command in recovery.conf, do you?
>

Correct, I do not.

-Ken

Re: Multiple Slave Failover with PITR

From: Sergey Konoplev
Date: 11 April 2012, 17:11:08

On Wed, Apr 11, 2012 at 9:08 PM, Ken Brush <kbrush@gmail.com> wrote:
> On Wed, Apr 11, 2012 at 9:50 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:
>> On Wed, Apr 11, 2012 at 8:12 PM, Ken Brush <kbrush@gmail.com> wrote:
>>>>>> 8. Copy the history file from the new master to the slave (it's the
>>>>>> most recent #.history file in the xlog directory)
>>>>
>>>> It will work in the case of archive_command presence only and I will
>>>> need to sync the whole pg_xlog content if do not have archive_command
>>>> in recovery.conf, correct?
>>>
>>> The new master will sync out the WAL logs from pg_xlog that the slaves
>>> need. The wal sender/receiver system is what I rely on for this.
>>
>> So you do not have archive_command in recovery.conf, do you?
>>
>
> Correct, I do not.

Okay, thank you. Now things are much clearer.


--
Sergey Konoplev
