Thread: PITR and WAL archiving
I have been ramming my head against the wall for the past two evenings, trying to get PITR working and it's gotten to the point where I need to ask someone for help. I'm basically trying to follow the instructions in the manual in chapter 23.3. Here's how I'm starting out: =================== Contents of 'data': PG_VERSION base global pg_clog pg_hba.conf pg_ident.conf pg_multixact pg_subtrans pg_tblspc pg_twophase pg_xlog postgresql.conf postmaster.opts data/pg_xlog/archive_status is empty. I have the following line in postgresql.conf: archive_command = 'cp -i %p /mnt/server/archive/wal/%f < /dev/null' /mnt/server/archive/wal is empty. Here's my procedure: =============== 1. pg_ctl start -l logfile 2. SELECT pg_start_backup('label'); 3. tar -zcf /mnt/server/archive/file/backup.tar.gz . (the contents of 'data') 4. SELECT pg_stop_backup(); 5. pg_ctl stop 6. I now delete the contents of 'data' (making a copy before hand) 7. In 'data': tar xzvf /mnt/server/archive/file/backup.tar.gz 8. echo \"restore_command = 'cp /mnt/server/archive/wal/%f %p'\" > recovery.conf 9. pg_ctl start -l logfile And here are the logs: ================ LOG: database system was shut down at 2007-03-13 23:07:33 CET LOG: checkpoint record is at 0/1E000020 LOG: redo record is at 0/1E000020; undo record is at 0/0; shutdown TRUE LOG: next transaction ID: 0/161203; next OID: 17156 LOG: next MultiXactId: 1; next MultiXactOffset: 0 LOG: database system is ready LOG: received smart shutdown request LOG: shutting down LOG: database system is shut down LOG: database system was shut down at 2007-03-13 23:09:21 CET LOG: checkpoint record is at 0/1E000068 LOG: redo record is at 0/1E000068; undo record is at 0/0; shutdown TRUE LOG: next transaction ID: 0/161203; next OID: 17156 LOG: next MultiXactId: 1; next MultiXactOffset: 0 LOG: database system is ready LOG: database system was interrupted at 2007-03-13 23:11:21 CET LOG: starting archive recovery LOG: restore_command = "cp /mnt/server/archive/wal/%f %p" cp: /mnt/server/archive/wal/00000001.history: No such file or directory LOG: restored log file "00000001000000000000001E.000000B0.backup" from archive LOG: restored log file "00000001000000000000001E" from archive LOG: checkpoint record is at 0/1E0000B0 LOG: redo record is at 0/1E0000B0; undo record is at 0/0; shutdown FALSE LOG: next transaction ID: 0/161205; next OID: 17156 LOG: next MultiXactId: 1; next MultiXactOffset: 0 LOG: automatic recovery in progress LOG: redo starts at 0/1E0000F8 cp: /mnt/server/archive/wal/00000001000000000000001F: No such file or directory LOG: could not open file "pg_xlog/00000001000000000000001F" (log file 0, segment 31): No such file or directory LOG: redo done at 0/1E0000F8 LOG: restored log file "00000001000000000000001E" from archive LOG: archive recovery complete LOG: database system is ready LOG: received smart shutdown request LOG: shutting down LOG: database system is shut down LOG: database system was shut down at 2007-03-13 23:17:32 CET LOG: checkpoint record is at 0/1F000068 LOG: redo record is at 0/1F000068; undo record is at 0/0; shutdown TRUE LOG: next transaction ID: 0/161207; next OID: 17156 LOG: next MultiXactId: 1; next MultiXactOffset: 0 LOG: database system is ready overwrite /mnt/server/archive/wal/00000001000000000000001E? (y/n [n]) not overwritten LOG: archive command "cp -i pg_xlog/00000001000000000000001E /mnt/ server/archive/wal/00000001000000000000001E < /dev/null" failed: return code 256 Where oh where might I be going wrong? I'd be grateful for any suggestions. Regards Adam
Adam Groves wrote: > I have been ramming my head against the wall for the past two > evenings, trying to get PITR working and it's gotten to the point > where I need to ask someone for help. > > I'm basically trying to follow the instructions in the manual in > chapter 23.3. Which postgres version are you using? > 1. pg_ctl start -l logfile > 2. SELECT pg_start_backup('label'); > 3. tar -zcf /mnt/server/archive/file/backup.tar.gz . (the contents of > 'data') > 4. SELECT pg_stop_backup(); > 5. pg_ctl stop > 6. I now delete the contents of 'data' (making a copy before hand) > 7. In 'data': tar xzvf /mnt/server/archive/file/backup.tar.gz > 8. echo \"restore_command = 'cp /mnt/server/archive/wal/%f %p'\" > > recovery.conf > 9. pg_ctl start -l logfile Are you doing this all in one row, without any other queries (especially updates) hitting the database? The reason that I ask is that postgres 8.1 will *only* archive a wal segment (a wal file) if it's completly filled. Since every WAL segment in 8.1 is *exactly* 16MB large, archive_command will only get called after 16MB worth of wal traffic has happened. In 8.1, pg_stop_backup() won't trigger archive_command either - it can't, because postgres 8.1 can't close a wal segment early - it *has* to write 16MB first. Therefore, the part of the wal that is bring your filesystem backup into a consistent state lives in the currently open wal after pg_stop_backup is called. Now, you remove your datadir, and restore your filesystem backup. Postgres realizes that it needs the wal segement that was current when pg_start_backup() was called, and requests this from restore_command. But since this segement was the current segment up to the point where you removed your datadir, it got never archived :-(. Postgres 8.2 improves the situation. It can now switch wal segments early, and will do so if you call pg_stop_backup(). greetings, Florian Pflug