Thread: Streaming replication failover process - Pgsql 9.2

Streaming replication failover process - Pgsql 9.2

From
Patrick B
Date:
Hi all,

There will be a network maintenance at the company where my servers are...

I've got one master and one slave server, running PostgreSQL 9.2.

As the network will be down, the internet won't be working as well as the intranet. Both servers won't be able to communicate each other. Not by streaming replication and wal_files too.

That should take at most 10 minutes.

1. When the connection comes back, will the master and slave work as expected? The streaming replication should be ok?

2. As the master will be down, I don't wanna slave turns into a master by Failover. The trigger_file line on recovery.conf on the slave server is commented, so should be ok here right? The slave will be still a slave once master is down....

Thanks!
Patrick

Re: Streaming replication failover process - Pgsql 9.2

From
John R Pierce
Date:
On 7/10/2016 2:19 PM, Patrick B wrote:
> 1. When the connection comes back, will the master and slave work as
> expected? The streaming replication should be ok?

as long as you have sufficient WAL available it should recover fine.
you might have to restart the slave to get it to reconnect.

>
> 2. As the master will be down, I don't wanna slave turns into a master
> by Failover. The trigger_file line on recovery.conf on the slave
> server is commented, so should be ok here right? The slave will be
> still a slave once master is down....
>

failover is a function of whatever cluster management software you use,
postgres won't failover on its own.


--
john r pierce, recycling bits in santa cruz



Re: Streaming replication failover process - Pgsql 9.2

From
Adrian Klaver
Date:
On 07/10/2016 02:19 PM, Patrick B wrote:
> Hi all,
>
> There will be a network maintenance at the company where my servers are...
>
> I've got one master and one slave server, running PostgreSQL 9.2.
>
> As the network will be down, the internet won't be working as well as
> the intranet. Both servers won't be able to communicate each other. Not
> by streaming replication and wal_files too.
>
> That should take at most 10 minutes.
>
> 1. When the connection comes back, will the master and slave work as
> expected? The streaming replication should be ok?

Assuming you have wal_keep_segments set high enough to keep the master
from recycling the WAL files before the network comes up again.

>
> 2. As the master will be down, I don't wanna slave turns into a master
> by Failover. The trigger_file line on recovery.conf on the slave server
> is commented, so should be ok here right? The slave will be still a
> slave once master is down....
>
> Thanks!
> Patrick


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Streaming replication failover process - Pgsql 9.2

From
Andreas Kretschmer
Date:

Am 10.07.2016 um 23:19 schrieb Patrick B:
> Hi all,
>
> There will be a network maintenance at the company where my servers are...
>
> I've got one master and one slave server, running PostgreSQL 9.2.
>
> As the network will be down, the internet won't be working as well as
> the intranet. Both servers won't be able to communicate each other.
> Not by streaming replication and wal_files too.
>
> That should take at most 10 minutes.
>
> 1. When the connection comes back, will the master and slave work as
> expected? The streaming replication should be ok?
if the master holds all needed WAL's there should be no problem.

You can ensure that with wal_keep_segments, or, in newer version, via
replication slots.




Re: Streaming replication failover process - Pgsql 9.2

From
John R Pierce
Date:
On 7/10/2016 2:42 PM, Andreas Kretschmer wrote:
1. When the connection comes back, will the master and slave work as expected? The streaming replication should be ok?
if the master holds all needed WAL's there should be no problem.

You can ensure that with wal_keep_segments, or, in newer version, via replication slots.

or via an external WAL archive, that the master writes, and the slave has access to.

-- 
john r pierce, recycling bits in santa cruz

Re: Streaming replication failover process - Pgsql 9.2

From
Patrick B
Date:
thanks guys.. thanks for all the comments...

I'm not shipping the wal_files into master, I actually ship them into slave and another backup server as well.

So I'll have to change my archive_command then :)

Thanks!

Re: Streaming replication failover process - Pgsql 9.2

From
Patrick B
Date:

archive_command = 'cp %p /var/lib/pgsql/archive/%f'


That would be ok right guys?

I will also setup wal_keep_segments to 512

Re: Streaming replication failover process - Pgsql 9.2

From
Adrian Klaver
Date:
On 07/10/2016 04:28 PM, Patrick B wrote:
> archive_command = 'cp %p /var/lib/pgsql/archive/%f'

This would be where?

And does the corresponding restore_command point to the same place?

>
>
> That would be ok right guys?
>
> I will also setup wal_keep_segments to 512
>

The WAL segments kept would be dependent the number of WAL segments your
database generates on average over the worse case downtime interval plus
a CYA buffer.


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Streaming replication failover process - Pgsql 9.2

From
Patrick B
Date:


2016-07-11 12:18 GMT+12:00 Adrian Klaver <adrian.klaver@aklaver.com>:
On 07/10/2016 04:28 PM, Patrick B wrote:
archive_command = 'cp %p /var/lib/pgsql/archive/%f'

This would be where?

master server
 

And does the corresponding restore_command point to the same place?

yes.. the slaves have the restore_command pointing to the same place.
As they won't be able to communicate with master, once all the services are up again, will have to copy them manually
 




That would be ok right guys?

I will also setup wal_keep_segments to 512


The WAL segments kept would be dependent the number of WAL segments your database generates on average over the worse case downtime interval plus a CYA buffer.


Re: Streaming replication failover process - Pgsql 9.2

From
Patrick B
Date:
If the master server can't send the wal_files through the slaves, shouldn't the wal_files be in "background" waiting to be delivered?

Otherwise what's the purpose of them? If a network fails I'd loose those files?


Re: Streaming replication failover process - Pgsql 9.2

From
Patrick B
Date:
oh ok.. got it..


wal_keep_segments = To prevent the primary server from removing the WAL segments required for the standby server before shipping them, set the minimum number of segments retained in the pg_xlog directory

so it would be ok just by increasing that parameter, right? Once the servers are back online, the shipping should start normally.

Is that right guys?
cheers

Re: Streaming replication failover process - Pgsql 9.2

From
John R Pierce
Date:
On 7/10/2016 4:28 PM, Patrick B wrote:
>
> archive_command = 'cp %p /var/lib/pgsql/archive/%f'
>
>
> That would be ok right guys?
>


normally, you want to ship your WAL archives to a NFS server or
something similar, which the master and all the slaves can read.

--
john r pierce, recycling bits in santa cruz



Re: Streaming replication failover process - Pgsql 9.2

From
Patrick B
Date:


2016-07-11 15:48 GMT+12:00 John R Pierce <pierce@hogranch.com>:
On 7/10/2016 4:28 PM, Patrick B wrote:

archive_command = 'cp %p /var/lib/pgsql/archive/%f'


That would be ok right guys?



normally, you want to ship your WAL archives to a NFS server or something similar, which the master and all the slaves can read.

what if the network goes down?

Re: Streaming replication failover process - Pgsql 9.2

From
John R Pierce
Date:
On 7/10/2016 8:51 PM, Patrick B wrote:
> what if the network goes down?

that WAL server could be located in the same data center as the master
database server.  if your local area network goes down, well, you're
probably in a world of hurt.

if the wide area network is mission critical, it would be smart to have
multiple resilient network paths



--
john r pierce, recycling bits in santa cruz



Re: Streaming replication failover process - Pgsql 9.2

From
Adrian Klaver
Date:
On 07/10/2016 07:17 PM, Patrick B wrote:
> If the master server can't send the wal_files through the slaves,
> shouldn't the wal_files be in "background" waiting to be delivered?

Short version, yes, assuming you are talking about archiving the WAL
files somewhere  and assuming there is sufficient space for the quantity
of WAL files stored on what ever 'device' you are storing them on.

Long version and, I would say, required reading:

https://www.postgresql.org/docs/9.2/static/high-availability.html

>
> Otherwise what's the purpose of them? If a network fails I'd loose those
> files?
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Streaming replication failover process - Pgsql 9.2

From
Patrick B
Date:
Hi guys,

I'm setting up a new slave server, using Postgres 9.2. This new slave server I'll call: New_slave.

I ran this command, from the new_slave server. It will connects to my other slave and copy the DB.

ssh postgres@slave05 'pg_basebackup --pgdata=- --format=tar --label=new_slave --progress --host=localhost --port=5432 --username=replicator --xlog' | tar -x --no-same-owner


That took me 10 days, as I'm copying +2TB. Note that I'm not using --xlog-method=stream, but I was copying the wal_files manually to the new_slave server. By the time the command above (pg_basebackup) finished, there was 1TB of wal_files.

I started to recovery the Database, setting the recovery.conf.recovery_command.

and then I got this error:
WAL segment `../wal_archive/000000020000171B000000D9` not found

So.. shit.. I don't know what happened, but I checked the master server and actually that file wasn't shipped into the new_server. I don't know why.



Questions:
1 - Do you have any recommendations? I'll have to perform that again.

2 - If I use pg_basebackup with -xlog-stream, do I have to have wal_shippment to the new_slave? Or the basebackup command would take care of them?

Using the pg_basebackup with stream should be?
ssh postgres@slave05 'pg_basebackup --pgdata=- --format=tar --label=new_slave --xlog-method=stream --progress --host=localhost --port=5432 --username=replicator --xlog' | tar -x --no-same-owner

3 - Instead of using pg_basebackup like that, should I dump into local disk(on the slave05 server) and then copy to the new_slave?

Cheers

Re: Streaming replication failover process - Pgsql 9.2

From
Patrick B
Date:
sorry... wrong email. Will create a new topic.