Using PG 8.2, I have a database in recovery mode using pg_standby to handle the WAL restores.
Is it allowable to have a backup database in recovery mode, then stop recovery (in this case, by putting the trigger file in place to stop pg_standby), check out that the backup db appears up to date, stop the now active backup db, and then restart it in recover mode again to have it resume its backup role?
I have had some success doing this, with the restart in recovery showing:
LOG: starting archive recovery LOG: restore_command = "~/postgresql/bin/pg_standby -l -d -s 2 -k 20 -t ~/postgresql/restoreWALs/STOP_RESTORE ~/postgresql/restoreWALs %f %p 2>> ~/pg_standby.log" LOG: restored log file "000000010000000500000018" from archive LOG: invalid xl_info in primary checkpoint record LOG: using previous checkpoint record at 5/18000020 LOG: redo record is at 5/18000020; undo record is at 0/0; shutdown FALSE LOG: next transaction ID: 0/1535389; next OID: 53990 LOG: next MultiXactId: 1; next MultiXactOffset: 0 LOG: automatic recovery in progress LOG: redo starts at 5/18000068
But there are times when I do this that it cannot. Is this because the steps are an issue (after all, I did stop recovery and go active briefly, though I didn't update the db during that time, just did \d and select queries to see that DDL and row data were updated on the backup), or is it related to not keeping enough WAL files around (pg_standby -k 20 was chosen, but it's not clear how to select this value, and it sounds like 8.3 gets rid of that issue entirely) to find the 'secondary checkpoint record'.
Here's the sort of error I get when it doesn't allow me to restart:
LOG: database system was shut down at 2008-01-11 11:40:05 PST LOG: starting archive recovery LOG: restore_command = "~/postgresql/bin/pg_standby -l -d -s 2 -k 20 -t ~/postgresql/restoreWALs/STOP_RESTORE ~/postgresql/restoreWALs %f %p 2>> ~/pg_standby.log" LOG: restored log file "00000001000000050000001D" from archive LOG: invalid record length at 5/1D000068 LOG: invalid primary checkpoint record LOG: restored log file "00000001000000050000001D" from archive LOG: invalid resource manager ID in secondary checkpoint record PANIC: could not locate a valid checkpoint record LOG: startup process (PID 9219) was terminated by signal 6 LOG: aborting startup due to startup process failure LOG: logger shutting down