Thread: Archiver not picking up changes to archive_command

Archiver not picking up changes to archive_command

From
bricklen
Date:
Hi,

I'm stumped by an issue we are experiencing at the moment. We have
been successfully archiving logs to two standby sites for many months
now using the following command:

rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync --bwlimit=1250 -az %p postgres@14.121.70.98:/WAL_Archive/

Due to some heavy processing today, we have been falling behind on
shipping log files (by about 1000 logs or so), so we wanted to raise our
bwlimit like so:

rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/


The db is showing the change.
SHOW archive_command:
rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/


Yet, the running processes never get above the original bwlimit of
1250. Have I missed a step? Would "kill -HUP <archiver pid>" help?
(I'm leery of trying that untested though)

ps aux | grep rsync
postgres 27704  0.0  0.0  63820  1068 ?        S    16:55   0:00 sh -c rsync -a pg_xlog/000000010000071700000070 postgres@192.168.80.174:/WAL_Archive/ && rsync --bwlimit=1250 -az pg_xlog/000000010000071700000070 postgres@14.121.70.98:/WAL_Archive/
postgres 27714 37.2  0.0  68716  1612 ?        S    16:55   0:01 rsync --bwlimit=1250 -az pg_xlog/000000010000071700000070 postgres@14.121.70.98:/WAL_Archive/
postgres 27715  3.0  0.0  60764  5648 ?        S    16:55   0:00 ssh -l postgres 14.121.70.98 rsync --server -logDtprz --bwlimit=1250 . /WAL_Archive/


Thanks,

bricklen

Re: Archiver not picking up changes to archive_command

From
bricklen
Date:
Sorry, I forgot to include the version: PostgreSQL 8.4.2 on
x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20071124
(Red Hat 4.1.2-42), 64-bit


On Mon, May 10, 2010 at 5:01 PM, bricklen <bricklen@gmail.com> wrote:
> [...]

Re: Archiver not picking up changes to archive_command

From
Tom Lane
Date:
bricklen <bricklen@gmail.com> writes:
> Due to some heavy processing today, we have been falling behind on
> shipping log files (by about 1000 logs or so), so we wanted to raise our
> bwlimit like so:

> rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync
> --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/

> The db is showing the change.
> SHOW archive_command:
> rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync
> --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/

> Yet, the running processes never get above the original bwlimit of
> 1250. Have I missed a step? Would "kill -HUP <archiver pid>" help?
> (I'm leery of trying that untested though)

A look at the code shows that the archiver only notices SIGHUP once
per outer loop, so the change would only take effect once you catch up,
which is not going to help much in this case.  Possibly we should change
it to check for SIGHUP after each archive_command execution.

If you kill -9 the archiver process, the postmaster will just start
a new one, but realize that this would result in two concurrent
rsyncs.  It might work OK to kill -9 the archiver and the current
rsync in the same command.

            regards, tom lane
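
The once-per-outer-loop SIGHUP check described above can be illustrated with a toy model. This is only a sketch in plain shell, not PostgreSQL source: the function name simulate_archiver, the segment names, and the two-cycle backlog are all invented for illustration. It assumes a reload request arrives while the first segment is being copied, and compares checking for it only between passes over the backlog ("outer") with checking after every archive_command execution ("each").

```shell
#!/bin/sh
# Toy model only: every name here is hypothetical, nothing is taken from
# the PostgreSQL source tree.
simulate_archiver() {
    check_mode="$1"          # "outer" = check between passes, "each" = after every segment
    active_bwlimit=1250      # setting the archiver currently has loaded
    pending_bwlimit=1875     # new setting, waiting for the reload to be noticed
    got_sighup=0

    for cycle in 1 2; do
        # Outer loop: the only place the reload is noticed in "outer" mode.
        if [ "$got_sighup" -eq 1 ]; then
            active_bwlimit=$pending_bwlimit
            got_sighup=0
        fi

        # Assumed backlog: three segments in the first pass, one in the second.
        if [ "$cycle" -eq 1 ]; then
            backlog="seg1 seg2 seg3"
        else
            backlog="seg4"
        fi

        # Inner loop: archive everything that is ready.
        for seg in $backlog; do
            echo "$seg bwlimit=$active_bwlimit"

            # Assumption: the reload request arrives during the first copy.
            if [ "$seg" = "seg1" ]; then
                got_sighup=1
            fi

            # The suggested change: also check after each execution.
            if [ "$check_mode" = "each" ] && [ "$got_sighup" -eq 1 ]; then
                active_bwlimit=$pending_bwlimit
                got_sighup=0
            fi
        done
    done
}

echo "check once per outer loop:"
simulate_archiver outer
echo "check after each archive_command:"
simulate_archiver each
```

In "outer" mode the model keeps using 1250 for the whole first backlog and switches to 1875 only on the next pass; in "each" mode it switches from the second segment onward.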

Re: Archiver not picking up changes to archive_command

From
Greg Smith
Date:
Tom Lane wrote:
> A look at the code shows that the archiver only notices SIGHUP once
> per outer loop, so the change would only take effect once you catch up,
> which is not going to help much in this case.  Possibly we should change
> it to check for SIGHUP after each archive_command execution.
>

I never considered this a really important issue to sort out because I
tell everybody it's unwise to put something complicated directly into
archive_command.  It is much better to call a script that gets passed
%f/%p and let that script do all the work; then you don't even have to
touch the server config if you need to fix something later.  The lack of
error checking that you get when just writing some shell commands directly
in the archive_command itself horrifies me in a production environment.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
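
A minimal sketch of the wrapper-script approach suggested above, assuming the two destinations and rsync flags from the original archive_command. The function name archive_wal, the variable names, and the RSYNC override are hypothetical choices made for illustration, not anything from this thread. Because the shell reads the script afresh on each invocation, editing REMOTE_BWLIMIT takes effect for the next segment without any server reload.

```shell
#!/bin/sh
# Hypothetical wrapper for archive_command; all names are illustrative.
# Settings live here, so they can change without touching postgresql.conf.
RSYNC="${RSYNC:-rsync}"        # overridable, which also makes testing easier
LOCAL_DEST="postgres@192.168.80.174:/WAL_Archive/"
REMOTE_DEST="postgres@14.121.70.98:/WAL_Archive/"
REMOTE_BWLIMIT=1875            # picked up on the next invocation after an edit

archive_wal() {
    wal_path="$1"              # %p: path of the WAL segment to archive

    # Refuse to report success for a missing argument or file.
    if [ -z "$wal_path" ] || [ ! -f "$wal_path" ]; then
        echo "archive_wal: missing or unreadable WAL file: '$wal_path'" >&2
        return 1
    fi

    # First standby: same flags as the original command.
    if ! $RSYNC -a "$wal_path" "$LOCAL_DEST"; then
        echo "archive_wal: copy to $LOCAL_DEST failed" >&2
        return 1
    fi

    # Second standby: bandwidth-limited and compressed.
    if ! $RSYNC --bwlimit="$REMOTE_BWLIMIT" -az "$wal_path" "$REMOTE_DEST"; then
        echo "archive_wal: copy to $REMOTE_DEST failed" >&2
        return 1
    fi

    return 0
}

# When saved as a standalone script, end the file with this call so the
# script's exit status is the function's status:
#   archive_wal "$1"
```

With the script installed at some path of your choosing (say /usr/local/bin/archive_wal.sh, an assumed location), archive_command would be set once to '/usr/local/bin/archive_wal.sh %p'. A nonzero exit status tells the archiver that the segment was not archived, so it will retry the same segment later.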


Re: Archiver not picking up changes to archive_command

From
bricklen
Date:
On Mon, May 10, 2010 at 5:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> A look at the code shows that the archiver only notices SIGHUP once
> per outer loop, so the change would only take effect once you catch up,
> which is not going to help much in this case.  Possibly we should change
> it to check for SIGHUP after each archive_command execution.
>
> If you kill -9 the archiver process, the postmaster will just start
> a new one, but realize that this would result in two concurrent
> rsyncs.  It might work OK to kill -9 the archiver and the current
> rsync in the same command.
>
>                        regards, tom lane
>

I think I'll just wait until archiving catches up, then send a SIGHUP.

Thanks for looking into this.

Re: Archiver not picking up changes to archive_command

From
bricklen
Date:
On Mon, May 10, 2010 at 6:12 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Tom Lane wrote:
>>
>> A look at the code shows that the archiver only notices SIGHUP once
>> per outer loop, so the change would only take effect once you catch up,
>> which is not going to help much in this case.  Possibly we should change
>> it to check for SIGHUP after each archive_command execution.
>>
>
> I never considered this a really important issue to sort out because I tell
> everybody it's unwise to put something complicated directly into
> archive_command.  Much better to call a script that gets passed %f/%p, then
> let that script do all the work; don't even have to touch the server config
> if you need to fix something then.  The lack of error checking that you get
> when just writing some shell commands directly in the archive_command itself
> horrifies me in a production environment.
>
> --
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> greg@2ndQuadrant.com   www.2ndQuadrant.us

Thanks Greg, that's a good idea. I'll revise that series of commands
into a script, and add some error handling as you suggest.


Cheers,

Bricklen

Re: Archiver not picking up changes to archive_command

From
Fujii Masao
Date:
On Tue, May 11, 2010 at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> bricklen <bricklen@gmail.com> writes:
>> Due to some heavy processing today, we have been falling behind on
>> shipping log files (by about 1000 logs or so), so we wanted to raise our
>> bwlimit like so:
>
>> rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync
>> --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/
>
>> The db is showing the change.
>> SHOW archive_command:
>> rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync
>> --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/
>
>> Yet, the running processes never get above the original bwlimit of
>> 1250. Have I missed a step? Would "kill -HUP <archiver pid>" help?
>> (I'm leery of trying that untested though)
>
> A look at the code shows that the archiver only notices SIGHUP once
> per outer loop, so the change would only take effect once you catch up,
> which is not going to help much in this case.  Possibly we should change
> it to check for SIGHUP after each archive_command execution.

+1

Here is a simple patch to do so.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center