Thread: Pausing log shipping for streaming replication
On Mon, Dec 15, 2014 at 9:12 AM, Joseph Kregloh <jkregloh@sproutloud.com> wrote:
> Hello,
>
> I have a master multi slave streaming replication setup. One master and two
> slaves. I need to do some maintenance on one of the slaves as one of the
> drives died however there is some other weird things going on in that array
> that I would need to investigate. So I am expecting the machine to be down
> at least two hours.
>
> I remember reading that if a master cannot connect to the slave it would
> hold the log file from shipping. Is there any other way to hold the file
> until the slave comes back online? Would it affect both slaves not getting
> their files shipped over?
>
> The good thing is that the slave in question is not serving any connections.
>
> From what I remember emptying out the archive_command would pause log
> shipping. Can the same be done by issuing a pg_stop_backup()?
>
> Thanks,
> -Joseph Kregloh
I think you will need to change your archive_command so it saves the
WALs to a location reachable by both slaves and the master, and have
both slaves pull from the same location. I don't think
pg_stop_backup() is useful in this situation.
The master will hold the WAL files as long as archive_command fails [1].
So if your archive_command involves connecting to the slave, then yes,
Postgres will hold the WAL segments while the slave is down.
There are (at least) two reasons that saving the archives to some
other location is useful:
1) You don't risk running out of disk on the master due to batched up
WALs if a slave goes down.
2) The backup of logs can be used to aid in point-in-time recovery.
[1] http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
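As a concrete illustration of that approach (all names and paths here are hypothetical, not from the thread), an archive_command helper might look like the sketch below. A non-zero exit makes the master keep the segment and retry, so nothing is lost if the shared location is briefly unavailable:

```shell
# Hypothetical sketch of an archive_command helper that copies each
# completed WAL segment to a shared archive directory reachable by the
# master and both slaves. PostgreSQL treats a non-zero exit status as
# "archive failed, keep the segment and retry later".
archive_wal() {
  archive=$1   # shared archive directory
  p=$2         # %p: path of the WAL segment on the master
  f=$3         # %f: bare file name of the segment
  # Refuse to overwrite an already-archived segment, as the docs advise.
  test ! -f "$archive/$f" || return 1
  cp "$p" "$archive/$f"
}

# Throwaway demo: archive a fake segment into a temp directory.
tmp=$(mktemp -d)
mkdir "$tmp/archive"
echo demo > "$tmp/0000000100000000000000A1"
archive_wal "$tmp/archive" "$tmp/0000000100000000000000A1" 0000000100000000000000A1 \
  && echo archived
```

The postgresql.conf entry would then be something along the lines of archive_command = '/usr/local/bin/archive_wal.sh "%p" "%f"', with the function body as the script.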
On 12/15/2014 11:12 AM, Joseph Kregloh wrote:
> Hello,
>
> I have a master multi slave streaming replication setup. One master and
> two slaves. I need to do some maintenance on one of the slaves as one of
> the drives died however there is some other weird things going on in
> that array that I would need to investigate. So I am expecting the
> machine to be down at least two hours.
>
> I remember reading that if a master cannot connect to the slave it would
> hold the log file from shipping. Is there any other way to hold the file
> until the slave comes back online? Would it affect both slaves not
> getting their files shipped over?
>
> The good thing is that the slave in question is not serving any connections.
>
> From what I remember emptying out the archive_command would pause log
> shipping. Can the same be done by issuing a pg_stop_backup()?
>
> Thanks,
> -Joseph Kregloh

I kinda turn mine around, so to speak. My master (web1) PG has this:

archive_command = '/usr/local/bin/pg_arch.sh "%p" "%f"'

/usr/local/bin/pg_arch.sh:
---------------
#!/bin/bash
# pg_arch.sh "%p" "%f"

archive='/pub/pgarchive'

set -e
if [ ! -f $archive/$2 ]
then
    /usr/bin/cp $1 $archive/webserv/$2
    /usr/bin/ln $archive/webserv/$2 $archive/web2/$2
fi
exit 0
---------------

I have one master (web1) and two slaves (web2 and webserv). This always
copies, and always returns 0. (Note the use of ln so extra disk space
isn't wasted.) At this point I only collect WAL; this script never
removes it.

One slave is very close, so it gets updated quickly. The other is very
far, and only updates at night when I can copy for less $$. It doesn't
really matter how I get the two slaves updated (the close one, actually,
uses streaming; the far one rsync, but that's beside the point).

The cleanup happens in reverse. I have a perl cron job that runs every
half hour. It connects to the master (web1) and each slave, and runs
something like this ($db is a slave, $master is the master):

$q = $db->prepare("SELECT pg_last_xlog_replay_location()");
$q->execute();
my ($webrp) = $q->fetchrow_array();
$q = undef;
$db->disconnect();

$q = $master->prepare("select file_name from pg_xlogfile_name_offset( '$webrp' )");
$q->execute();
my ($web) = $q->fetchrow_array();
$q = undef;

system("/usr/bin/sudo -u postgres /usr/local/pgsql/bin/pg_archivecleanup /pub/pgarchive/web2 $web");

What we do is have the master query the slave's replay location and then
run pg_archivecleanup. This way, if we lose communication I won't clean
up files, and I won't clean up WAL until the slave has actually applied
it. Each slave is independent, so I can take one down and the master
will just keep collecting. As soon as I bring the slave back up, we get
a response and start cleaning again.

-Andy
On Mon, Dec 15, 2014 at 10:29 AM, Joseph Kregloh
<jkregloh@sproutloud.com> wrote:
>
>
> On Mon, Dec 15, 2014 at 12:59 PM, Patrick Krecker <patrick@judicata.com>
> wrote:
>>
>> On Mon, Dec 15, 2014 at 9:12 AM, Joseph Kregloh <jkregloh@sproutloud.com>
>> wrote:
>> > Hello,
>> >
>> > I have a master multi slave streaming replication setup. One master and
>> > two
>> > slaves. I need to do some maintenance on one of the slaves as one of the
>> > drives died however there is some other weird things going on in that
>> > array
>> > that I would need to investigate. So I am expecting the machine to be
>> > down
>> > at least two hours.
>> >
>> > I remember reading that if a master cannot connect to the slave it would
>> > hold the log file from shipping. Is there any other way to hold the file
>> > until the slave comes back online? Would it affect both slaves not
>> > getting
>> > their files shipped over?
>> >
>> > The good thing is that the slave in question is not serving any
>> > connections.
>> >
>> > From what I remember emptying out the archive_command would pause log
>> > shipping. Can the same be done by issuing a pg_stop_backup()?
>> >
>> > Thanks,
>> > -Joseph Kregloh
>>
>> I think you will need to change your archive_command so it saves the
>> WALs to a location reachable by both slaves and the master, and have
>> both slaves pull from the same location. I don't think
>> pg_stop_backup() is useful in this situation.
>>
>
> Currently my archive_command is to a sh script which internally does an
> rsync. It actually rsyncs to both slaves and then a Barman location. If I
> fail the archive_command, then i'll have a problem because my primary slave
> serves read only queries, so it might start serving out stale data.
>
> What I was thinking is shipping the log files that would go to the second
> slave to another machine or location on the master. Then once I am done with
> the maintenance I'll move those files over to the incoming folder. That
> would hopefully contain all the WAL files for the slave to catch up.
> Any thoughts against this?
Seems OK as long as you have the disk space to support the
accumulation of WALs (allowing for the possibility that the downtime
is much longer than anticipated).
When you say "i'll move those files over to the incoming folder," what
do you mean? I think that restore_command should be used on the slave
to retrieve the WALs from the archive location. Once the secondary has
caught up, you can change the configuration back to the old setup and
remove the accumulated WALs from the temporary location.
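For illustration only (these paths are hypothetical, not from the thread), the secondary's recovery.conf could point its restore_command at the temporary archive for the duration:

```
# recovery.conf on the secondary -- hypothetical paths
standby_mode = 'on'
restore_command = 'cp /var/lib/pgsql/wal_staging/%f "%p"'
# optional: prune segments the standby no longer needs
archive_cleanup_command = 'pg_archivecleanup /var/lib/pgsql/wal_staging %r'
```

Once the standby has replayed everything, the configuration can be switched back and the staging directory removed.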
>
>>
>> The master will hold the logs as long as archive_command fails [1]. To
>> the extent that archive_command involves connecting to the slave, then
>> yes, Postgres will hold the WAL archives while the slave is down.
>> There are (at least) two reasons that saving the archives to some
>> other location is useful:
>>
>> 1) You don't risk running out of disk on the master due to batched up
>> WALs if a slave goes down.
>> 2) The backup of logs can be used to aid in point-in-time recovery.
>>
>> [1] http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
>
>
On 16 Dec 2014 01:13, "Joseph Kregloh" <jkregloh@sproutloud.com> wrote:
>
> Hello,
>
> I have a master multi slave streaming replication setup. One master and two slaves. I need to do some maintenance on one of the slaves as one of the drives died however there is some other weird things going on in that array that I would need to investigate. So I am expecting the machine to be down at least two hours.
>
> I remember reading that if a master cannot connect to the slave it would hold the log file from shipping. Is there any other way to hold the file until the slave comes back online? Would it affect both slaves not getting their files shipped over?
>
> The good thing is that the slave in question is not serving any connections.
>
> From what I remember emptying out the archive_command would pause log shipping. Can the same be done by issuing a pg_stop_backup()?
Are you using streaming replication or log shipping? They are different. Can you share your archive_command and recovery.conf content?
>
> Thanks,
> -Joseph Kregloh
archive_command:
archive_command = '/usr/local/pgsql/data/log_shipper.sh "%p" "%f"'

log_shipper.sh:
#!/usr/local/bin/bash
rsync -a $1 pgprod@prod-db-slave:archive/$2 < /dev/null;
rsync -a $1 pgprod@prod-db-slave:p3_wal_files/$2 < /dev/null; # Temp storage for WAL files

recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=prod-db port=5434 user=USER password=PW'
trigger_file = '/tmp/pgsql_prod_db.trigger'
restore_command = 'cp -f /usr/local/pgsql/archive/%f %p < /dev/null'
archive_cleanup_command = 'pg_archivecleanup /usr/local/pgsql/archive/ %r'
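A sketch of the staging idea discussed earlier in the thread (hypothetical paths; plain cp stands in for the rsync call so the sketch is self-contained): try to ship the segment to the slave, and if that fails, park it locally on the master to be moved into the slave's archive after maintenance:

```shell
# Hypothetical sketch: ship a WAL segment to the slave if possible,
# otherwise stage it locally on the master. Return non-zero only when
# the fallback also fails, so the master never loses the segment but
# also never blocks on a slave that is down for maintenance.
ship_or_stage() {
  seg=$1       # %p: path of the WAL segment
  name=$2      # %f: file name of the segment
  dest=$3      # slave's archive directory (rsync target in the real script)
  staging=$4   # local staging directory on the master
  cp "$seg" "$dest/$name" 2>/dev/null && return 0   # normal shipping path
  cp "$seg" "$staging/$name"                        # slave down: stage locally
}
```

After maintenance, the staged segments would be moved (e.g. with rsync) into the slave's archive directory so its restore_command can replay them.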
Best Regards,
Sameer Kumar | Database Consultant
ASHNIK PTE. LTD.