Thread: Archiver not picking up changes to archive_command
Hi,

I'm stumped by an issue we are experiencing at the moment. We have been successfully archiving logs to two standby sites for many months now using the following command:

rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync --bwlimit=1250 -az %p postgres@14.121.70.98:/WAL_Archive/

Due to some heavy processing today, we have been falling behind on shipping log files (by about a 1000 logs or so), so wanted to up our bwlimit like so:

rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/

The db is showing the change.
SHOW archive_command:
rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/

Yet, the running processes never get above the original bwlimit of 1250. Have I missed a step? Would "kill -HUP <archiver pid>" help? (I'm leery of trying that untested though)

ps aux | grep rsync
postgres 27704 0.0 0.0 63820 1068 ? S 16:55 0:00 sh -c rsync -a pg_xlog/000000010000071700000070 postgres@192.168.80.174:/WAL_Archive/ && rsync --bwlimit=1250 -az pg_xlog/000000010000071700000070 postgres@14.121.70.98:/WAL_Archive/
postgres 27714 37.2 0.0 68716 1612 ? S 16:55 0:01 rsync --bwlimit=1250 -az pg_xlog/000000010000071700000070 postgres@14.121.70.98:/WAL_Archive/
postgres 27715 3.0 0.0 60764 5648 ? S 16:55 0:00 ssh -l postgres 14.121.70.98 rsync --server -logDtprz --bwlimit=1250 . /WAL_Archive/

Thanks,

bricklen
Sorry, version:
PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit

On Mon, May 10, 2010 at 5:01 PM, bricklen <bricklen@gmail.com> wrote:
> Hi,
>
> I'm stumped by an issue we are experiencing at the moment.
> [...]
bricklen <bricklen@gmail.com> writes:
> Due to some heavy processing today, we have been falling behind on
> shipping log files (by about a 1000 logs or so), so wanted to up our
> bwlimit like so:

> rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync
> --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/

> The db is showing the change.
> SHOW archive_command:
> rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync
> --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/

> Yet, the running processes never get above the original bwlimit of
> 1250. Have I missed a step? Would "kill -HUP <archiver pid>" help?
> (I'm leery of trying that untested though)

A look at the code shows that the archiver only notices SIGHUP once per outer loop, so the change would only take effect once you catch up, which is not going to help much in this case. Possibly we should change it to check for SIGHUP after each archive_command execution.

If you kill -9 the archiver process, the postmaster will just start a new one, but realize that that would result in two concurrent rsync's. It might work ok to kill -9 the archiver and the current rsync in the same command.

regards, tom lane
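The difference between the two checking strategies described above can be illustrated with a toy shell model. This is not PostgreSQL source code: the function name `run_backlog`, its arguments, and the variables are all made up for illustration, and the real archiver loop is written in C. The model only shows which setting each segment in a backlog would be shipped with when the reload request arrives partway through.

```shell
#!/bin/sh
# Toy model only (NOT the PostgreSQL archiver): every name here is hypothetical.
# run_backlog MODE TOTAL HUP_AT
#   MODE   "outer": a reload request is only noticed once the backlog is drained
#          "each" : a reload request is also noticed after every archived segment
#   TOTAL  number of WAL segments waiting in the backlog
#   HUP_AT segment number during which the reload request (SIGHUP) arrives
run_backlog() {
    mode="$1"; total="$2"; hup_at="$3"
    setting=1250        # bwlimit the archiver loaded before the backlog built up
    pending=""          # new bwlimit waiting to be noticed, if any
    used=""             # bwlimit actually used for each segment, in order
    i=1
    while [ "$i" -le "$total" ]; do
        used="$used $setting"                 # "archive" segment i
        if [ "$i" -eq "$hup_at" ]; then
            pending=1875                      # the reload request arrives now
        fi
        if [ "$mode" = "each" ] && [ -n "$pending" ]; then
            setting="$pending"                # noticed right after this segment
            pending=""
        fi
        i=$((i + 1))
    done
    # In "outer" mode the pending value would only be applied here,
    # after the whole backlog has already been shipped with the old setting.
    echo $used
}

run_backlog outer 4 2
run_backlog each 4 2
```

In "outer" mode every segment in the backlog is shipped with the old limit, which matches the symptom in the original post; in "each" mode the new limit applies from the segment after the reload request.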
Tom Lane wrote:
> A look at the code shows that the archiver only notices SIGHUP once
> per outer loop, so the change would only take effect once you catch up,
> which is not going to help much in this case. Possibly we should change
> it to check for SIGHUP after each archive_command execution.
>

I never considered this a really important issue to sort out because I tell everybody it's unwise to put something complicated directly into archive_command. Much better to call a script that gets passed %f/%p, then let that script do all the work; don't even have to touch the server config if you need to fix something then. The lack of error checking that you get when just writing some shell commands directly in the archive_command itself horrifies me in a production environment.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
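A minimal sketch of the kind of wrapper script suggested above, under stated assumptions: the script name `archive_wal.sh`, the function name `archive_wal`, the `BWLIMIT` and `RSYNC` variables, and the specific exit codes are all hypothetical. Only the two destinations and the rsync options come from the original post. Because the script is started afresh for every segment, editing the limit in the script (or its environment) would apply to the next segment without a server reload.

```shell
#!/bin/sh
# archive_wal.sh: hypothetical wrapper, intended to be called with %p and %f.
# Destinations and rsync options are copied from the original post;
# everything else (names, defaults, exit codes) is an illustrative assumption.

BWLIMIT="${BWLIMIT:-1875}"   # bandwidth limit for the remote site
RSYNC="${RSYNC:-rsync}"      # overridable so the script can be dry-run tested

archive_wal() {
    wal_path="$1"            # %p: path of the WAL file to archive
    wal_name="$2"            # %f: file name only, used in messages

    if [ ! -f "$wal_path" ]; then
        echo "archive_wal: $wal_name: $wal_path does not exist" >&2
        return 1
    fi

    # First standby: unthrottled copy.
    if ! $RSYNC -a "$wal_path" "postgres@192.168.80.174:/WAL_Archive/"; then
        echo "archive_wal: $wal_name: copy to 192.168.80.174 failed" >&2
        return 2
    fi

    # Second standby: compressed and bandwidth-limited copy.
    if ! $RSYNC --bwlimit="$BWLIMIT" -az "$wal_path" \
            "postgres@14.121.70.98:/WAL_Archive/"; then
        echo "archive_wal: $wal_name: copy to 14.121.70.98 failed" >&2
        return 3
    fi

    return 0
}

# Run only when invoked with both arguments, as archive_command would do.
if [ "$#" -eq 2 ]; then
    archive_wal "$1" "$2"
    exit $?
fi
```

With the script installed at some path of your choosing (the path here is an assumption), the server setting would become something like archive_command = '/path/to/archive_wal.sh %p %f'. The script returns zero only when both copies succeed, so a failed transfer is reported to the server as a failed archive attempt and the segment is retried.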
On Mon, May 10, 2010 at 5:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> A look at the code shows that the archiver only notices SIGHUP once
> per outer loop, so the change would only take effect once you catch up,
> which is not going to help much in this case. Possibly we should change
> it to check for SIGHUP after each archive_command execution.
>
> If you kill -9 the archiver process, the postmaster will just start
> a new one, but realize that that would result in two concurrent
> rsync's. It might work ok to kill -9 the archiver and the current
> rsync in the same command.
>
> regards, tom lane
>

I think I'll just wait it out, then sighup.

Thanks for looking into this.
On Mon, May 10, 2010 at 6:12 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Tom Lane wrote:
>>
>> A look at the code shows that the archiver only notices SIGHUP once
>> per outer loop, so the change would only take effect once you catch up,
>> which is not going to help much in this case. Possibly we should change
>> it to check for SIGHUP after each archive_command execution.
>>
>
> I never considered this a really important issue to sort out because I tell
> everybody it's unwise to put something complicated directly into
> archive_command. Much better to call a script that gets passed %f/%p, then
> let that script do all the work; don't even have to touch the server config
> if you need to fix something then. The lack of error checking that you get
> when just writing some shell commands directly in the archive_command itself
> horrifies me in a production environment.
>
> --
> Greg Smith 2ndQuadrant US Baltimore, MD
> PostgreSQL Training, Services and Support
> greg@2ndQuadrant.com www.2ndQuadrant.us

Thanks Greg, that's a good idea. I'll revise that series of commands into a script, and add some error handling as you suggest.

Cheers,

Bricklen
On Tue, May 11, 2010 at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> bricklen <bricklen@gmail.com> writes:
>> Due to some heavy processing today, we have been falling behind on
>> shipping log files (by about a 1000 logs or so), so wanted to up our
>> bwlimit like so:
>
>> rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync
>> --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/
>
>> The db is showing the change.
>> SHOW archive_command:
>> rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync
>> --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/
>
>> Yet, the running processes never get above the original bwlimit of
>> 1250. Have I missed a step? Would "kill -HUP <archiver pid>" help?
>> (I'm leery of trying that untested though)
>
> A look at the code shows that the archiver only notices SIGHUP once
> per outer loop, so the change would only take effect once you catch up,
> which is not going to help much in this case. Possibly we should change
> it to check for SIGHUP after each archive_command execution.

+1

Here is the simple patch to do so.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center