Thread: Pausing log shipping for streaming replication
On Mon, Dec 15, 2014 at 9:12 AM, Joseph Kregloh <jkregloh@sproutloud.com> wrote:
> Hello,
>
> I have a master multi slave streaming replication setup. One master and two
> slaves. I need to do some maintenance on one of the slaves as one of the
> drives died however there is some other weird things going on in that array
> that I would need to investigate. So I am expecting the machine to be down
> at least two hours.
>
> I remember reading that if a master cannot connect to the slave it would
> hold the log file from shipping. Is there any other way to hold the file
> until the slave comes back online? Would it affect both slaves not getting
> their files shipped over?
>
> The good thing is that the slave in question is not serving any connections.
>
> From what I remember emptying out the archive_command would pause log
> shipping. Can the same be done by issuing a pg_stop_backup()?
>
> Thanks,
> -Joseph Kregloh
I think you will need to change your archive_command so it saves the
WALs to a location reachable by both slaves and the master, and have
both slaves pull from the same location. I don't think
pg_stop_backup() is useful in this situation.
The master will hold the WAL files as long as archive_command fails [1].
So if your archive_command involves connecting to the slave, then yes,
Postgres will hold the WAL segments while the slave is down.
There are (at least) two reasons that saving the archives to some
other location is useful:
1) You don't risk running out of disk on the master due to batched up
WALs if a slave goes down.
2) The backup of logs can be used to aid in point-in-time recovery.
[1] http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
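As a concrete illustration of that approach (all names and paths here are hypothetical, not from the thread), an archive_command helper might look like the sketch below. A non-zero exit makes the master keep the segment and retry, so nothing is lost if the shared location is briefly unavailable:

```shell
# Hypothetical sketch of an archive_command helper that copies each
# completed WAL segment to a shared archive directory reachable by the
# master and both slaves. PostgreSQL treats a non-zero exit status as
# "archive failed, keep the segment and retry later".
archive_wal() {
  archive=$1   # shared archive directory
  p=$2         # %p: path of the WAL segment on the master
  f=$3         # %f: bare file name of the segment
  # Refuse to overwrite an already-archived segment, as the docs advise.
  test ! -f "$archive/$f" || return 1
  cp "$p" "$archive/$f"
}

# Throwaway demo: archive a fake segment into a temp directory.
tmp=$(mktemp -d)
mkdir "$tmp/archive"
echo demo > "$tmp/0000000100000000000000A1"
archive_wal "$tmp/archive" "$tmp/0000000100000000000000A1" 0000000100000000000000A1 \
  && echo archived
```

The postgresql.conf entry would then be something along the lines of archive_command = '/usr/local/bin/archive_wal.sh "%p" "%f"', with the function body as the script.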
On 12/15/2014 11:12 AM, Joseph Kregloh wrote:
> Hello,
>
> I have a master multi slave streaming replication setup. One master and
> two slaves. I need to do some maintenance on one of the slaves as one of
> the drives died however there is some other weird things going on in
> that array that I would need to investigate. So I am expecting the
> machine to be down at least two hours.
>
> I remember reading that if a master cannot connect to the slave it would
> hold the log file from shipping. Is there any other way to hold the file
> until the slave comes back online? Would it affect both slaves not
> getting their files shipped over?
>
> The good thing is that the slave in question is not serving any connections.
>
> From what I remember emptying out the archive_command would pause log
> shipping. Can the same be done by issuing a pg_stop_backup()?
>
> Thanks,
> -Joseph Kregloh

I kinda turn mine around, so to speak. My master (web1) PG has this:

archive_command = '/usr/local/bin/pg_arch.sh "%p" "%f"'

/usr/local/bin/pg_arch.sh:
---------------
#!/bin/bash
# pg_arch.sh "%p" "%f"

archive='/pub/pgarchive'

set -e
if [ ! -f $archive/$2 ]
then
    /usr/bin/cp $1 $archive/webserv/$2
    /usr/bin/ln $archive/webserv/$2 $archive/web2/$2
fi
exit 0
---------------

I have one master (web1) and two slaves (web2 and webserv). This always
copies, and always returns 0. (Note the use of ln so extra disk space
isn't wasted.) At this point I only collect WAL; this script never
removes it.

One slave is very close, so it gets updated quickly. The other is very
far, and only updates at night when I can copy for less $$. It doesn't
really matter how I get the two slaves updated (the close one, actually,
uses streaming; the far one rsync, but that's beside the point).

The cleanup happens in reverse. I have a perl cron job that runs every
half hour. It connects to the master (web1) and each slave, and runs
something like this ($db is a slave, $master is the master):

$q = $db->prepare("SELECT pg_last_xlog_replay_location()");
$q->execute();
my ($webrp) = $q->fetchrow_array();
$q = undef;
$db->disconnect();

$q = $master->prepare("select file_name from pg_xlogfile_name_offset( '$webrp' )");
$q->execute();
my ($web) = $q->fetchrow_array();
$q = undef;

system("/usr/bin/sudo -u postgres /usr/local/pgsql/bin/pg_archivecleanup /pub/pgarchive/web2 $web");

What we do is have the master query the slave's replay location and then
run pg_archivecleanup. This way, if we lose communication I won't clean
up files, and I won't clean up WAL until the slave has actually applied
it. Each slave is independent, so I can take one down and the master
will just keep collecting. As soon as I bring the slave back up, we get
a response and start cleaning again.

-Andy
On Mon, Dec 15, 2014 at 10:29 AM, Joseph Kregloh
<jkregloh@sproutloud.com> wrote:
>
>
> On Mon, Dec 15, 2014 at 12:59 PM, Patrick Krecker <patrick@judicata.com>
> wrote:
>>
>> On Mon, Dec 15, 2014 at 9:12 AM, Joseph Kregloh <jkregloh@sproutloud.com>
>> wrote:
>> > Hello,
>> >
>> > I have a master multi slave streaming replication setup. One master and
>> > two
>> > slaves. I need to do some maintenance on one of the slaves as one of the
>> > drives died however there is some other weird things going on in that
>> > array
>> > that I would need to investigate. So I am expecting the machine to be
>> > down
>> > at least two hours.
>> >
>> > I remember reading that if a master cannot connect to the slave it would
>> > hold the log file from shipping. Is there any other way to hold the file
>> > until the slave comes back online? Would it affect both slaves not
>> > getting
>> > their files shipped over?
>> >
>> > The good thing is that the slave in question is not serving any
>> > connections.
>> >
>> > From what I remember emptying out the archive_command would pause log
>> > shipping. Can the same be done by issuing a pg_stop_backup()?
>> >
>> > Thanks,
>> > -Joseph Kregloh
>>
>> I think you will need to change your archive_command so it saves the
>> WALs to a location reachable by both slaves and the master, and have
>> both slaves pull from the same location. I don't think
>> pg_stop_backup() is useful in this situation.
>>
>
> Currently my archive_command is to a sh script which internally does an
> rsync. It actually rsyncs to both slaves and then a Barman location. If I
> fail the archive_command, then i'll have a problem because my primary slave
> serves read only queries, so it might start serving out stale data.
>
> What I was thinking is shipping the log files that would go to the second
> slave to another machine or location on the master. Then once I am done with
> the maintenance I'll move those files over to the incoming folder. That
> would hopefully contain all the WAL files for the slave to catch up.
> Any thoughts against this?
Seems OK as long as you have the disk space to support the
accumulation of WALs (allowing for the possibility that the downtime
is much longer than anticipated).
When you say "i'll move those files over to the incoming folder," what
do you mean? I think that restore_command should be used on the slave
to retrieve the WALs from the archive location. Once the secondary has
caught up, you can change the configuration back to the old setup and
remove the accumulated WALs from the temporary location.
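For illustration only (these paths are hypothetical, not from the thread), the secondary's recovery.conf could point its restore_command at the temporary archive for the duration:

```
# recovery.conf on the secondary -- hypothetical paths
standby_mode = 'on'
restore_command = 'cp /var/lib/pgsql/wal_staging/%f "%p"'
# optional: prune segments the standby no longer needs
archive_cleanup_command = 'pg_archivecleanup /var/lib/pgsql/wal_staging %r'
```

Once the standby has replayed everything, the configuration can be switched back and the staging directory removed.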
>
>>
>> The master will hold the logs as long as archive_command fails [1]. To
>> the extent that archive_command involves connecting to the slave, then
>> yes, Postgres will hold the WAL archives while the slave is down.
>> There are (at least) two reasons that saving the archives to some
>> other location is useful:
>>
>> 1) You don't risk running out of disk on the master due to batched up
>> WALs if a slave goes down.
>> 2) The backup of logs can be used to aid in point-in-time recovery.
>>
>> [1] http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
>
>
On 16 Dec 2014 01:13, "Joseph Kregloh" <jkregloh@sproutloud.com> wrote:
>
> Hello,
>
> I have a master multi slave streaming replication setup. One master and two slaves. I need to do some maintenance on one of the slaves as one of the drives died however there is some other weird things going on in that array that I would need to investigate. So I am expecting the machine to be down at least two hours.
>
> I remember reading that if a master cannot connect to the slave it would hold the log file from shipping. Is there any other way to hold the file until the slave comes back online? Would it affect both slaves not getting their files shipped over?
>
> The good thing is that the slave in question is not serving any connections.
>
> From what I remember emptying out the archive_command would pause log shipping. Can the same be done by issuing a pg_stop_backup()?
Are you using streaming replication or log shipping? They are different. Can you share your archive_command and recovery.conf content?
>
> Thanks,
> -Joseph Kregloh
archive_command:
archive_command = '/usr/local/pgsql/data/log_shipper.sh "%p" "%f"'

log_shipper.sh:
#!/usr/local/bin/bash
rsync -a $1 pgprod@prod-db-slave:archive/$2 < /dev/null;
rsync -a $1 pgprod@prod-db-slave:p3_wal_files/$2 < /dev/null; # Temp storage for WAL files

recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=prod-db port=5434 user=USER password=PW'
trigger_file = '/tmp/pgsql_prod_db.trigger'
restore_command = 'cp -f /usr/local/pgsql/archive/%f %p < /dev/null'
archive_cleanup_command = 'pg_archivecleanup /usr/local/pgsql/archive/ %r'
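A sketch of the staging idea discussed earlier in the thread (hypothetical paths; plain cp stands in for the rsync call so the sketch is self-contained): try to ship the segment to the slave, and if that fails, park it locally on the master to be moved into the slave's archive after maintenance:

```shell
# Hypothetical sketch: ship a WAL segment to the slave if possible,
# otherwise stage it locally on the master. Return non-zero only when
# the fallback also fails, so the master never loses the segment but
# also never blocks on a slave that is down for maintenance.
ship_or_stage() {
  seg=$1       # %p: path of the WAL segment
  name=$2      # %f: file name of the segment
  dest=$3      # slave's archive directory (rsync target in the real script)
  staging=$4   # local staging directory on the master
  cp "$seg" "$dest/$name" 2>/dev/null && return 0   # normal shipping path
  cp "$seg" "$staging/$name"                        # slave down: stage locally
}
```

After maintenance, the staged segments would be moved (e.g. with rsync) into the slave's archive directory so its restore_command can replay them.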
Best Regards,
Sameer Kumar | Database Consultant
ASHNIK PTE. LTD.