Thread: Re: [ADMIN] does wal archiving block the current client connection?
Simon Riggs <simon@2ndquadrant.com> writes:
> OK, I'm on it.

What solution have you got in mind? I was thinking about an fcntl lock
to ensure only one archiver is active in a given data directory. That
would fix the problem without affecting anything outside the archiver.
Not sure what's the most portable way to do it though.

regards, tom lane
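For illustration, the fcntl-based interlock suggested here can be sketched in a few lines. This is written in Python rather than the backend's C, and the lock file name `archiver.lock` and the helper `try_acquire_archiver_lock` are hypothetical; neither comes from any posted patch. It only shows the general shape of a non-blocking, exclusive, per-directory lock.

```python
import fcntl
import os


def try_acquire_archiver_lock(data_dir):
    """Try to become the only archiver for data_dir.

    Returns an open file object that holds the lock, or None if another
    process already holds it. The caller must keep the returned object
    open for as long as it wants to hold the lock.
    """
    # Hypothetical lock file name, chosen only for this sketch.
    lock_path = os.path.join(data_dir, "archiver.lock")
    lock_file = open(lock_path, "a+")  # creates the file if it is missing
    try:
        # Exclusive, non-blocking POSIX record lock on the whole file.
        fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Another process holds the lock (EAGAIN or EACCES).
        lock_file.close()
        return None
    return lock_file
```

Unlike a pid file, a lock of this kind is released by the kernel when the holding process exits, so nothing stale is left behind after a crash. Two caveats apply to the sketch: Python's `fcntl` module exists only on Unix-like systems, which mirrors the portability concern above, and POSIX record locks are dropped as soon as the process closes any descriptor for the locked file.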
On Fri, 2006-05-19 at 12:03 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > OK, I'm on it.
>
> What solution have you got in mind? I was thinking about an fcntl lock
> to ensure only one archiver is active in a given data directory. That
> would fix the problem without affecting anything outside the archiver.
> Not sure what's the most portable way to do it though.

I was trying to think of a better way than using an archiver.pid file in
pg_xlog/archive_status...

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
On Fri, 2006-05-19 at 17:27 +0100, Simon Riggs wrote:
> On Fri, 2006-05-19 at 12:03 -0400, Tom Lane wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > > OK, I'm on it.
> >
> > What solution have you got in mind? I was thinking about an fcntl lock
> > to ensure only one archiver is active in a given data directory. That
> > would fix the problem without affecting anything outside the archiver.
> > Not sure what's the most portable way to do it though.
>
> I was trying to think of a better way than using an archiver.pid file in
> pg_xlog/archive_status...

Yesterday I posted to -patches with a new archiver.pid interlock
mechanism. This will prevent server startup when the archiver is first
activated, but once running will clean up and restart again.

This doesn't quite get to the nub of the problem: archiver is designed
to keep archiving files, even in the event that the postmaster explodes.
It will keep archiving until they're all gone.

My recent patch will prevent server startup, so if you do a fast restart
to bounce the server and change parameters you'll have to keep the
server down while the archiver completes (or you kill it).

The archiver's Spartan diligence is great if postmaster does fail, but
archiver can't tell the difference between a normal shutdown and a
postmaster crash.

If the postmaster sent a SIGUSR2 on normal shutdown, we would be able to
interrupt the outer loop and shut down much faster. A starting
postmaster might then reasonably wait a little while for the old
archiver to quit before starting the new one.

What do you think?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
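The SIGUSR2 idea in this message can be sketched roughly as follows. This is an illustrative Python sketch, not the archiver's actual C code: the `Archiver` class, its method names, and the `archive_one` callback are all hypothetical, and the single loop here stands in for the archiver's real loop structure.

```python
import signal


class Archiver:
    """Toy archiver whose loop can be interrupted by SIGUSR2."""

    def __init__(self, archive_one):
        # archive_one is a hypothetical callback that archives one file.
        self.archive_one = archive_one
        self.shutdown_requested = False

    def handle_sigusr2(self, signum, frame):
        # Keep the handler trivial: only record that shutdown was requested.
        self.shutdown_requested = True

    def install(self):
        # Must be called from the main thread of the process.
        signal.signal(signal.SIGUSR2, self.handle_sigusr2)

    def run(self, pending):
        """Archive files from pending until it is empty or shutdown is requested.

        Files are removed from pending as they are archived; the list of
        archived names is returned.
        """
        done = []
        while pending and not self.shutdown_requested:
            name = pending.pop(0)
            self.archive_one(name)
            done.append(name)
        return done
```

Because the flag is checked only between files, a file that is already being archived is always finished before the loop exits. Whatever is left in `pending` would then be picked up by the next archiver that starts.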
Simon Riggs <simon@2ndquadrant.com> writes:
> This doesn't quite get to the nub of the problem: archiver is designed
> to keep archiving files, even in the event that the postmaster explodes.
> It will keep archiving until they're all gone.

I think we just need a PostmasterIsAlive check in the per-file loop.

regards, tom lane
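As a rough illustration of a liveness check inside the per-file loop, here is a Python sketch. It assumes the check is done by probing the postmaster's pid with signal 0; that is an assumption made for this example, not a statement about how PostmasterIsAlive is implemented. The function names and the `archive_one` and `is_alive` callbacks are hypothetical.

```python
import os


def postmaster_is_alive(postmaster_pid):
    """Return True if a process with this pid still exists.

    Signal 0 delivers nothing; the kernel only performs the existence
    and permission checks.
    """
    try:
        os.kill(postmaster_pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def archive_files(pending, archive_one, is_alive):
    """Archive files in order, stopping as soon as is_alive() returns False.

    Returns the names that were archived before the loop stopped.
    """
    done = []
    for name in pending:
        if not is_alive():
            break
        archive_one(name)
        done.append(name)
    return done
```

A caller would pass something like `lambda: postmaster_is_alive(pm_pid)` as `is_alive`. With the check placed before each file, the archiver stops within one file's worth of work after the postmaster goes away, instead of draining the whole queue first.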
On Tue, 2006-05-23 at 10:53 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > This doesn't quite get to the nub of the problem: archiver is designed
> > to keep archiving files, even in the event that the postmaster explodes.
> > It will keep archiving until they're all gone.
>
> I think we just need a PostmasterIsAlive check in the per-file loop.

...which would mean the archiver would not outlive postmaster in the
event it crashes...which is exactly the time you want it to keep going.

Granted, that's an easy change.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Simon Riggs <simon@2ndquadrant.com> writes:
> On Tue, 2006-05-23 at 10:53 -0400, Tom Lane wrote:
>> I think we just need a PostmasterIsAlive check in the per-file loop.

> ...which would mean the archiver would not outlive postmaster in the
> event it crashes...which is exactly the time you want it to keep going.

Postmaster crashes are not a problem in practice; we've been careful to
keep the postmaster doing so little that there's no material risk of it
failing. If the postmaster dies it's almost certainly because someone
killed it, and you really want the child processes to close up shop too.

(If we did want the archiver to keep running, it shouldn't have any
PostmasterIsAlive check at all; I can't see a reason why completing
one iteration of the outer loop is a better time to stop than any
other time.)

regards, tom lane
On Tue, 2006-05-23 at 11:09 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Tue, 2006-05-23 at 10:53 -0400, Tom Lane wrote:
> >> I think we just need a PostmasterIsAlive check in the per-file loop.
>
> > ...which would mean the archiver would not outlive postmaster in the
> > event it crashes...which is exactly the time you want it to keep going.
>
> Postmaster crashes are not a problem in practice; we've been careful to
> keep the postmaster doing so little that there's no material risk of it
> failing. If the postmaster dies it's almost certainly because someone
> killed it, and you really want the child processes to close up shop too.
>
> (If we did want the archiver to keep running, it shouldn't have any
> PostmasterIsAlive check at all; I can't see a reason why completing
> one iteration of the outer loop is a better time to stop than any
> other time.)

This does at least solve the fast restart problem, so look on -patches
in a few minutes.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Simon Riggs <simon@2ndquadrant.com> writes:
> My recent patch will prevent server startup, so if you do a fast restart
> to bounce the server and change parameters you'll have to keep the
> server down while the archiver completes (or you kill it).

BTW, I was not planning on having it do that. The archiver subprocess
should fail to start (and the PM keep trying to start it). Not take
down the entire database.

regards, tom lane