Thread: Minor changes to Recovery related code
I'd like to make the following changes to recovery related code over the next few days/weeks. If anybody insists I do this by freeze or not at all, then I'll submit patches for 1,3,4,5,10 before Saturday night. I'd rather take a bit more time and do this in one drop and there are some code dependencies between these changes and other patches from Koichi-san and myself. 1. Current xlog should be archived at shutdown for smart shutdown - check that archiver is active prior to shutdown checkpoint request - if (shutdown checkpoint && XLogArchivingActive()) RequestXLogSwitch() - for smart shutdown, have archiver complete its work before exiting 2. pg_stop_backup() should wait until all archive files are safely archived before returning 3. Need a %r parameter for restore_command, to allow the restore command be passed the name of the file containing the last restartpoint. This will allow the restore command to clean down old archive files more safely/cleanly in Warm Standby operation. - change to pg_standby to accept the parameter and use %r rather than -k parameter 4. Add an option to pg_standby to have it complete all outstanding archive files after it has been triggered, minimising data loss at the slight expense of cut-over time 5. log_restartpoint option in recovery.conf LOG each restorepoint, so can understand whether restartable recovery will be effective or not 6. refactor recovery.conf so that it uses a GUC-like parser 7. refactor all xlog _desc routines into one module, so these can be more easily used by xlogviewer utility 8. get xlogviewer utility a freshen-up so it can be part of main release, possibly including further refactoring of xlog.c 9. Another round of doc updates to highlight the use of pg_standby and Koichi-san's work. I think Doug Knight might have some additional code examples to include as well, from previous discussions. 10. Changes to ensure WAL-avoiding operations and hot backups cannot be executed simultaneously. One of these two options, ISTM: a) Following a change to archive_command while server is running. Log the xid of the WAL-avoiding operations when they start and have pg_start_backup() wait for those xids to complete before continuing. b) Introduce a new parameter, archive_mode = on | off that can only be set at server start. If archive_mode = on then XLogArchivingActive(); archiving only takes place when archive_command is not ''. This allows archive_command to be changed while server running, yet without any danger from WAL-avoiding operations. [7 & 8 would be complete by about 5-6 weeks from now. Others much earlier] Comments? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > I'd like to make the following changes to recovery related code over the > next few days/weeks. If anybody insists I do this by freeze or not at > all, then I'll submit patches for 1,3,4,5,10 before Saturday night. I'd > rather take a bit more time and do this in one drop and there are some > code dependencies between these changes and other patches from > Koichi-san and myself. Well, I've got a proposal for a pg_proc change that I haven't even started coding yet, so personally I won't hold you to having the patch submitted as long as the design is agreed to before feature freeze. However: > 2. pg_stop_backup() should wait until all archive files are safely > archived before returning Not sure I agree with that one. If it fails, you can't tell whether the action is done and it failed while waiting for the archiver, or if you need to redo it. > 6. refactor recovery.conf so that it uses a GUC-like parser I would suggest that after feature freeze is not the time for code beautification efforts like this. (I rather doubt that it's worth doing at all actually, but definitely not now.) > 7. refactor all xlog _desc routines into one module, so these can be > more easily used by xlogviewer utility Even more so. > 8. get xlogviewer utility a freshen-up so it can be part of main > release, possibly including further refactoring of xlog.c This is not happening for 8.3, either. > 10. Changes to ensure WAL-avoiding operations and hot backups cannot be > executed simultaneously. One of these two options, ISTM: > b) Introduce a new parameter, archive_mode = on | off that can only be > set at server start. If archive_mode = on then XLogArchivingActive(); > archiving only takes place when archive_command is not ''. This allows > archive_command to be changed while server running, yet without any > danger from WAL-avoiding operations. I think I'd go with (b) since a lot of people felt it should've been like that from the beginning, and found the magic "empty string" behavior confusing. > [7 & 8 would be complete by about 5-6 weeks from now. Others much > earlier] We are hoping to go beta in less time than that. While I'm willing to cut a little slack, anything you can't submit before about mid-April is not going to make it into 8.3. regards, tom lane
On Fri, 2007-03-30 at 16:34 -0400, Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > I'd like to make the following changes to recovery related code over the > > next few days/weeks. If anybody insists I do this by freeze or not at > > all, then I'll submit patches for 1,3,4,5,10 before Saturday night. I'd > > rather take a bit more time and do this in one drop and there are some > > code dependencies between these changes and other patches from > > Koichi-san and myself. > > Well, I've got a proposal for a pg_proc change that I haven't even started > coding yet, so personally I won't hold you to having the patch submitted > as long as the design is agreed to before feature freeze. However: Cool > > 2. pg_stop_backup() should wait until all archive files are safely > > archived before returning > > Not sure I agree with that one. If it fails, you can't tell whether the > action is done and it failed while waiting for the archiver, or if you > need to redo it. There's a slight delay between pg_stop_backup() completing and the archiver doing its stuff. Currently if somebody does a -m fast straight after the pg_stop_backup() the backup may be unusable. We need a way to plug that small hole. I suggest that pg_stop_backup() polls once per second until pg_xlog/archive_status/LOG.ready disappears, in which case it ends successfully. If it does this for more than 60 seconds it ends successfully but produces a WARNING. > > 6. refactor recovery.conf so that it uses a GUC-like parser > > I would suggest that after feature freeze is not the time for code > beautification efforts like this. (I rather doubt that it's worth doing > at all actually, but definitely not now.) OK > > 7. refactor all xlog _desc routines into one module, so these can be > > more easily used by xlogviewer utility > > Even more so. OK > > 8. get xlogviewer utility a freshen-up so it can be part of main > > release, possibly including further refactoring of xlog.c > > This is not happening for 8.3, either. OK > > 10. Changes to ensure WAL-avoiding operations and hot backups cannot be > > executed simultaneously. One of these two options, ISTM: > > b) Introduce a new parameter, archive_mode = on | off that can only be > > set at server start. If archive_mode = on then XLogArchivingActive(); > > archiving only takes place when archive_command is not ''. This allows > > archive_command to be changed while server running, yet without any > > danger from WAL-avoiding operations. > > I think I'd go with (b) since a lot of people felt it should've been > like that from the beginning, and found the magic "empty string" > behavior confusing. OK > We are hoping to go beta in less time than that. While I'm willing to > cut a little slack, anything you can't submit before about mid-April > is not going to make it into 8.3. No probs, you struck a line through the more time consuming items. :-) -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote: > On Fri, 2007-03-30 at 16:34 -0400, Tom Lane wrote: >> "Simon Riggs" <simon@2ndquadrant.com> writes: >>> 2. pg_stop_backup() should wait until all archive files are safely >>> archived before returning >> Not sure I agree with that one. If it fails, you can't tell whether the >> action is done and it failed while waiting for the archiver, or if you >> need to redo it. > > There's a slight delay between pg_stop_backup() completing and the > archiver doing its stuff. Currently if somebody does a -m fast straight > after the pg_stop_backup() the backup may be unusable. > > We need a way to plug that small hole. > > I suggest that pg_stop_backup() polls once per second until > pg_xlog/archive_status/LOG.ready disappears, in which case it ends > successfully. If it does this for more than 60 seconds it ends > successfully but produces a WARNING. I fear that ending sucessfully despite having not archived all wals will make this feature less worthwile. If a dba knows what he is doing, he can code a perfectly safe backup script using 8.2 too. He'll just have to check the current wal position after pg_stop_backup(), (There is a function for that, right?), and wait until the corresponding wal was archived. In realitly, however, I feare that most people will just create a script that does 'echo "select pg_stop_backup | psql"' or something similar. If they're a bit more carefull, they will enable ON_ERROR_STOP, and check the return value of pgsql. I believe that those are the people who would really benefit from a pg_stop_backup() that waits for archiving to complete. But they probably won't check for WARNINGs. Maybe doing it the other way round would be an option? pg_stop_backup() could wait for the archiver to complete forever, but spit out a warning every 60 seconds or so "WARNING: Still waiting for wal archiving of wal ??? to complete". If someone really wants a 60-second timeout, he can just use statement_timeout. Anyway, just my 0.02 eurocents, maybe I'm totally mistaken about the postgresql dba's out there... greetings, Florian Pflug
On Mar 30, 2007, at 5:51 PM, Florian G. Pflug wrote: > In realitly, however, I feare that most people will just create a > script > that does 'echo "select pg_stop_backup | psql"' or something similar. > If they're a bit more carefull, they will enable ON_ERROR_STOP, and > check > the return value of pgsql. I believe that those are the people who > would > really benefit from a pg_stop_backup() that waits for archiving to > complete. > But they probably won't check for WARNINGs. > > Maybe doing it the other way round would be an option? > pg_stop_backup() could wait for the archiver to complete forever, but > spit out a warning every 60 seconds or so "WARNING: Still waiting > for wal archiving of wal ??? to complete". If someone really wants > a 60-second timeout, he can just use statement_timeout. I agree; people are far more likely to take the lazy way out, so it'd be good to protect against that. Esp. since statement_timeout provides a means to get the other behavior. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Sat, 2007-03-31 at 00:51 +0200, Florian G. Pflug wrote: > Simon Riggs wrote: > > On Fri, 2007-03-30 at 16:34 -0400, Tom Lane wrote: > >> "Simon Riggs" <simon@2ndquadrant.com> writes: > >>> 2. pg_stop_backup() should wait until all archive files are safely > >>> archived before returning > >> Not sure I agree with that one. If it fails, you can't tell whether the > >> action is done and it failed while waiting for the archiver, or if you > >> need to redo it. > > > > There's a slight delay between pg_stop_backup() completing and the > > archiver doing its stuff. Currently if somebody does a -m fast straight > > after the pg_stop_backup() the backup may be unusable. > > > > We need a way to plug that small hole. > > > > I suggest that pg_stop_backup() polls once per second until > > pg_xlog/archive_status/LOG.ready disappears, in which case it ends > > successfully. If it does this for more than 60 seconds it ends > > successfully but produces a WARNING. > > I fear that ending sucessfully despite having not archived all wals > will make this feature less worthwile. If a dba knows what he is > doing, he can code a perfectly safe backup script using 8.2 too. > He'll just have to check the current wal position after pg_stop_backup(), > (There is a function for that, right?), and wait until the corresponding > wal was archived. > > In realitly, however, I feare that most people will just create a script > that does 'echo "select pg_stop_backup | psql"' or something similar. > If they're a bit more carefull, they will enable ON_ERROR_STOP, and check > the return value of pgsql. I believe that those are the people who would > really benefit from a pg_stop_backup() that waits for archiving to complete. > But they probably won't check for WARNINGs. > > Maybe doing it the other way round would be an option? > pg_stop_backup() could wait for the archiver to complete forever, but > spit out a warning every 60 seconds or so "WARNING: Still waiting > for wal archiving of wal ??? to complete". If someone really wants > a 60-second timeout, he can just use statement_timeout. I've just come up against this problem again, so I think it is a must fix for this release. Other problems exist also, mentioned on separate threads. We have a number of problems surrounding pg_stop_backup/shutdown: 1. pg_stop_backup() currently returns before the WAL file containing the last change is correctly archived. That is a small hole, but one that is exposed when people write test scripts that immediately shutdown the database after issuing pg_stop_backup(). It doesn't make much sense to shutdown immediately after a hot backup, but it should still work sensibly. 2. We've also had problems caused by making the archiver wait until all WAL files are archived. If there is a backlog for some reason and the DBA issues a restart (i.e. stop and immediate restart) then making the archiver loop while it tries (and possibly fails) to archive all files would cause an outage. Avoiding this is why we do the current get-out-fast approach. There are some sub scenarios: a) there is a backlog of WAL files, but no error has occurred on the *last* file (we might have just fixed a problem). b) there is a backlog of WAL files, but an error is causing a retry of the last file. My proposal is for us to record somewhere other than the logs that a failure to archive has occurred and is being retried. Failure to archive will be recorded in the archive_status directory as an additional file called archive_error, which will be deleted in the case of archive success and created in the case of archive error. This maintains archiver's lack of attachment to shared memory and general simplicity of design. - pg_stop_backup() will wait until the WAL file that ends the backup is safely archived, even if a failure to archive occurs. This is a change to current behaviour, but since it implements the originally *expected* behaviour IMHO it should be the default. - new function: pg_stop_backup_nowait() return immediately without waiting for archive, the same as the current pg_stop_backup() - new function: pg_stop_backup_wait(int seconds) wait until either an archival fails or the ending WAL file is archived, with a max wait as specified. wait=0 means wait until archive errors are resolved. Alternatives? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
On Thu, 2007-06-07 at 21:53 +0100, Simon Riggs wrote: > We have a number of problems surrounding pg_stop_backup/shutdown: I'll start coding up my proposals, in the absence of alternate views. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --------------------------------------------------------------------------- Simon Riggs wrote: > On Sat, 2007-03-31 at 00:51 +0200, Florian G. Pflug wrote: > > Simon Riggs wrote: > > > On Fri, 2007-03-30 at 16:34 -0400, Tom Lane wrote: > > >> "Simon Riggs" <simon@2ndquadrant.com> writes: > > >>> 2. pg_stop_backup() should wait until all archive files are safely > > >>> archived before returning > > >> Not sure I agree with that one. If it fails, you can't tell whether the > > >> action is done and it failed while waiting for the archiver, or if you > > >> need to redo it. > > > > > > There's a slight delay between pg_stop_backup() completing and the > > > archiver doing its stuff. Currently if somebody does a -m fast straight > > > after the pg_stop_backup() the backup may be unusable. > > > > > > We need a way to plug that small hole. > > > > > > I suggest that pg_stop_backup() polls once per second until > > > pg_xlog/archive_status/LOG.ready disappears, in which case it ends > > > successfully. If it does this for more than 60 seconds it ends > > > successfully but produces a WARNING. > > > > I fear that ending sucessfully despite having not archived all wals > > will make this feature less worthwile. If a dba knows what he is > > doing, he can code a perfectly safe backup script using 8.2 too. > > He'll just have to check the current wal position after pg_stop_backup(), > > (There is a function for that, right?), and wait until the corresponding > > wal was archived. > > > > In realitly, however, I feare that most people will just create a script > > that does 'echo "select pg_stop_backup | psql"' or something similar. > > If they're a bit more carefull, they will enable ON_ERROR_STOP, and check > > the return value of pgsql. I believe that those are the people who would > > really benefit from a pg_stop_backup() that waits for archiving to complete. > > But they probably won't check for WARNINGs. > > > > Maybe doing it the other way round would be an option? > > pg_stop_backup() could wait for the archiver to complete forever, but > > spit out a warning every 60 seconds or so "WARNING: Still waiting > > for wal archiving of wal ??? to complete". If someone really wants > > a 60-second timeout, he can just use statement_timeout. > > I've just come up against this problem again, so I think it is a must > fix for this release. Other problems exist also, mentioned on separate > threads. > > We have a number of problems surrounding pg_stop_backup/shutdown: > > 1. pg_stop_backup() currently returns before the WAL file containing the > last change is correctly archived. That is a small hole, but one that is > exposed when people write test scripts that immediately shutdown the > database after issuing pg_stop_backup(). It doesn't make much sense to > shutdown immediately after a hot backup, but it should still work > sensibly. > > 2. We've also had problems caused by making the archiver wait until all > WAL files are archived. If there is a backlog for some reason and the > DBA issues a restart (i.e. stop and immediate restart) then making the > archiver loop while it tries (and possibly fails) to archive all files > would cause an outage. Avoiding this is why we do the current > get-out-fast approach. > There are some sub scenarios: > a) there is a backlog of WAL files, but no error has occurred on the > *last* file (we might have just fixed a problem). > b) there is a backlog of WAL files, but an error is causing a retry of > the last file. > > My proposal is for us to record somewhere other than the logs that a > failure to archive has occurred and is being retried. Failure to archive > will be recorded in the archive_status directory as an additional file > called archive_error, which will be deleted in the case of archive > success and created in the case of archive error. This maintains > archiver's lack of attachment to shared memory and general simplicity of > design. > > - pg_stop_backup() will wait until the WAL file that ends the backup is > safely archived, even if a failure to archive occurs. This is a change > to current behaviour, but since it implements the originally *expected* > behaviour IMHO it should be the default. > > - new function: pg_stop_backup_nowait() return immediately without > waiting for archive, the same as the current pg_stop_backup() > > - new function: pg_stop_backup_wait(int seconds) wait until either an > archival fails or the ending WAL file is archived, with a max wait as > specified. wait=0 means wait until archive errors are resolved. > > Alternatives? > > -- > Simon Riggs > EnterpriseDB http://www.enterprisedb.com > > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Follow-up during March 2008 CommitFest On Thu, 2007-06-07 at 21:53 +0100, Simon Riggs wrote: > On Sat, 2007-03-31 at 00:51 +0200, Florian G. Pflug wrote: > > Simon Riggs wrote: > > > On Fri, 2007-03-30 at 16:34 -0400, Tom Lane wrote: > > >> "Simon Riggs" <simon@2ndquadrant.com> writes: > > >>> 2. pg_stop_backup() should wait until all archive files are safely > > >>> archived before returning > > >> Not sure I agree with that one. If it fails, you can't tell whether the > > >> action is done and it failed while waiting for the archiver, or if you > > >> need to redo it. > > > > > > There's a slight delay between pg_stop_backup() completing and the > > > archiver doing its stuff. Currently if somebody does a -m fast straight > > > after the pg_stop_backup() the backup may be unusable. > > > > > > We need a way to plug that small hole. > > > > > > I suggest that pg_stop_backup() polls once per second until > > > pg_xlog/archive_status/LOG.ready disappears, in which case it ends > > > successfully. If it does this for more than 60 seconds it ends > > > successfully but produces a WARNING. > > > > I fear that ending sucessfully despite having not archived all wals > > will make this feature less worthwile. If a dba knows what he is > > doing, he can code a perfectly safe backup script using 8.2 too. > > He'll just have to check the current wal position after pg_stop_backup(), > > (There is a function for that, right?), and wait until the corresponding > > wal was archived. > > > > In realitly, however, I feare that most people will just create a script > > that does 'echo "select pg_stop_backup | psql"' or something similar. > > If they're a bit more carefull, they will enable ON_ERROR_STOP, and check > > the return value of pgsql. I believe that those are the people who would > > really benefit from a pg_stop_backup() that waits for archiving to complete. > > But they probably won't check for WARNINGs. > > > > Maybe doing it the other way round would be an option? > > pg_stop_backup() could wait for the archiver to complete forever, but > > spit out a warning every 60 seconds or so "WARNING: Still waiting > > for wal archiving of wal ??? to complete". If someone really wants > > a 60-second timeout, he can just use statement_timeout. > > I've just come up against this problem again, so I think it is a must > fix for this release. Other problems exist also, mentioned on separate > threads. > > We have a number of problems surrounding pg_stop_backup/shutdown: > > 1. pg_stop_backup() currently returns before the WAL file containing the > last change is correctly archived. That is a small hole, but one that is > exposed when people write test scripts that immediately shutdown the > database after issuing pg_stop_backup(). It doesn't make much sense to > shutdown immediately after a hot backup, but it should still work > sensibly. > > 2. We've also had problems caused by making the archiver wait until all > WAL files are archived. If there is a backlog for some reason and the > DBA issues a restart (i.e. stop and immediate restart) then making the > archiver loop while it tries (and possibly fails) to archive all files > would cause an outage. Avoiding this is why we do the current > get-out-fast approach. > There are some sub scenarios: > a) there is a backlog of WAL files, but no error has occurred on the > *last* file (we might have just fixed a problem). > b) there is a backlog of WAL files, but an error is causing a retry of > the last file. > > My proposal is for us to record somewhere other than the logs that a > failure to archive has occurred and is being retried. Failure to archive > will be recorded in the archive_status directory as an additional file > called archive_error, which will be deleted in the case of archive > success and created in the case of archive error. This maintains > archiver's lack of attachment to shared memory and general simplicity of > design. > > - pg_stop_backup() will wait until the WAL file that ends the backup is > safely archived, even if a failure to archive occurs. This is a change > to current behaviour, but since it implements the originally *expected* > behaviour IMHO it should be the default. The most straightforward thing is to make pg_stop_backup() wait as described above. If people want it to timeout, they can use a statement_timeout as suggested by Florian. This can be implemented by having the function poll in an infinite loop for archive_status/LOG.done. Also, as Florian suggests, we should have it output a WARNING message every 60 seconds. I'll write a patch now. > - new function: pg_stop_backup_nowait() return immediately without > waiting for archive, the same as the current pg_stop_backup() > > - new function: pg_stop_backup_wait(int seconds) wait until either an > archival fails or the ending WAL file is archived, with a max wait as > specified. wait=0 means wait until archive errors are resolved. So I won't do these. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk