Re: pg_stop_backup does not complete - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: pg_stop_backup does not complete |
Date | |
Msg-id | 1266951502.3752.4294.camel@ebony |
In response to | pg_stop_backup does not complete (Josh Berkus <josh@agliodbs.com>) |
Responses | Re: pg_stop_backup does not complete |
List | pgsql-hackers |
On Tue, 2010-02-23 at 09:45 -0800, Josh Berkus wrote:

> 1) Set up a brand new master with an archive_command and archive_mode=on.
>
> 2) Start the master
>
> 3) Do a pg_start_backup()
>
> 4) Realize, based on log error messages, that I've misconfigured the
> archive_command.
> 5) Attempt to shut down the master. Master tells me that pg_stop_backup
> must be run in order to shut down.
>
> 6) Execute pg_stop_backup.
>
> 7) pg_stop_backup waits forever without ever stopping backup. Every 60
> seconds, it gives me a helpful "still waiting" message, but at least in
> the amount of time I was willing to wait (5 minutes), it never completed.
>
> 8) do an immediate shutdown, as it's the only way I can get the database
> unstuck.
>
> With some experimentation, the problem seems to occur when you have a
> failing archive_command and a master which currently has no database
> traffic; for example, if I did some database write activity (a createdb)
> then pg_stop_backup would complete after about 60 seconds (which, btw,
> is extremely annoying, but at least tolerable).
>
> This issue is 100% reproducible.

IMHO there is no problem with that behaviour. If somebody requests a backup then we should wait for it to complete. Kevin's suggestion of pg_fail_backup() is the only sensible conclusion there because it gives an explicit way out of deadlock.

ISTM the problem is that you didn't test. Steps 3 and 4 should have been reversed. Perhaps we should put something in the docs to say "and test". The correct resolution is to put in an archive_command that works.

We can put in an extra step to prevent a pg_start_backup() if there are a significant number of outstanding files to be archived. Doing that seems like closing the door after the horse has bolted, since we just introduced streaming replication that doesn't rely on archived files.

In any case, I don't see many people working on a production system hitting a problem on an archive_command and then deciding to shut down.
So I don't see this as something that needs fixing for 9.0. There is already too much non-essential code there, all of which needs to be tested. I don't think adding in new corner cases to "help" people makes any sense until we have automated testing that allows us to rerun the regression tests to check all this stuff still works.

-- 
Simon Riggs www.2ndQuadrant.com
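The "and test" advice above, and the idea of checking for outstanding files before pg_start_backup(), could be sketched roughly as follows. This is only an illustration, not anything from the thread or from PostgreSQL itself: the function names are made up, the script runs the command with an absolute path to a scratch file rather than a real WAL segment, and it assumes the archive_status directory (pg_xlog/archive_status in releases of this era) is readable by whoever runs it. The server itself runs archive_command as its own OS user from the data directory, so a pass here is a hint rather than a guarantee.

```python
import re
import subprocess
import tempfile
from pathlib import Path


def expand_placeholders(archive_command, path, filename):
    """Substitute %p, %f and %% the way archive_command templates expect."""
    values = {"p": path, "f": filename, "%": "%"}
    return re.sub(r"%([pf%])", lambda m: values[m.group(1)], archive_command)


def archive_command_works(archive_command, timeout=30):
    """Run the command once on a throwaway file; True if it exits with status 0.

    Note: a working command really copies the scratch file into the archive,
    so the file named below has to be removed from the archive afterwards.
    """
    with tempfile.TemporaryDirectory() as scratch:
        name = "archive_command_test"  # hypothetical scratch file name
        path = Path(scratch) / name
        path.write_bytes(b"test")
        cmd = expand_placeholders(archive_command, str(path), name)
        try:
            result = subprocess.run(cmd, shell=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0


def count_ready_files(archive_status_dir):
    """Count segments still waiting to be archived (*.ready marker files)."""
    return sum(1 for _ in Path(archive_status_dir).glob("*.ready"))
```

Something like archive_command_works("cp %p /mnt/archive/%f") (the path is just an example) would be run before step 3, and a count_ready_files() result that keeps growing would suggest the archiver is failing before a backup is even requested.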