Dear Sir/Madam,
I am not sure whether it is appropriate to post my problem here. If not, please forgive my ignorance and tell me where I should post it.
Recently, I have been trying point-in-time recovery on Debian with postgres 8.3 (for various reasons I cannot upgrade), but I have run into a problem. I followed the instructions in the official documentation step by step.
1. First, I modified postgresql.conf to enable WAL archiving and restarted postgres.
2. Then I simply tarred the whole cluster data directory, ${PG_DATA}, to serve as the base backup. I called pg_start_backup('label') before the tar step and pg_stop_backup() after it.
3. After that, I inserted some data into the database.
4. Next, I simulated a corrupted database that needs to be recovered:
4.0. stopped postgres
4.1. moved the WALs from ${PG_DATA}/pg_xlog to another directory
4.2. untarred the base backup and moved the data into ${PG_DATA} (overwriting its contents)
4.3. created recovery.conf with the following configuration (note that I store the WALs on a remote host):
restore_command = 'rsync -a host_user@host_ip:/path/to/remote/host/wal/%f %p'
recovery_target_time = 'YYYY-mm-dd HH:MM:SS'
recovery_target_timeline = 'value'
4.4. restarted postgres
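For reference, step 2 amounts roughly to the sequence sketched below. This is only a minimal sketch, not my exact commands: the function names, the psql invocation as the postgres user, and the archive path are illustrative assumptions.

```python
import subprocess


def base_backup_commands(pgdata, archive_path, label="label", db_user="postgres"):
    """Return the three commands of a tar-based base backup, in order.

    The names and the psql options are illustrative assumptions.
    """
    psql = ["psql", "-U", db_user, "-c"]
    return [
        # Tell the server that a base backup is starting.
        psql + ["SELECT pg_start_backup('%s');" % label],
        # Archive the whole cluster data directory.
        ["tar", "-czf", archive_path, "-C", pgdata, "."],
        # Tell the server that the base backup is finished.
        psql + ["SELECT pg_stop_backup();"],
    ]


def run_base_backup(pgdata, archive_path, label="label"):
    """Run the three commands, always calling pg_stop_backup() at the end."""
    start, tar, stop = base_backup_commands(pgdata, archive_path, label)
    subprocess.check_call(start)
    try:
        # GNU tar 1.16 and later exits with 1 when a file changed while
        # it was being read, which is normal on a running cluster.
        status = subprocess.call(tar)
        if status not in (0, 1):
            raise RuntimeError("tar failed with exit status %d" % status)
    finally:
        subprocess.check_call(stop)
```

Calling run_base_backup with the data directory and a target archive path (both placeholders here) would perform the backup against a running server.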
At first, everything was fine and the recovery succeeded: I could see from the log that postgres restored the WALs, and the data I inserted in step 3 was present in the database. But when I repeated the recovery (that is, repeated steps 4.0 to 4.4), postgres failed to recover most of the time. Here is the error message:
2015-06-18 20:22:02 GMT+8 LOG: restored log file "00000001000000000000002E.00000020.backup" from archive
2015-06-18 20:22:03 GMT+8 LOG: restored log file "00000001000000000000002E" from archive
2015-06-18 20:22:03 GMT+8 LOG: unexpected pageaddr 0/2A000000 in log file 0, segment 46, offset 0
2015-06-18 20:22:03 GMT+8 LOG: invalid checkpoint record
2015-06-18 20:22:03 GMT+8 FATAL: could not locate required checkpoint record
2015-06-18 20:22:03 GMT+8 HINT: If you are not restoring from a backup, try removing the file "/home/genie/db_mount_point/backup_label".
2015-06-18 20:22:03 GMT+8 LOG: startup process (PID 658) exited with exit code 1
2015-06-18 20:22:03 GMT+8 LOG: aborting startup due to startup process failure
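One thing that can be read from this log is that the segment named in the file name and the segment implied by the unexpected page address are different. The sketch below decodes both. It assumes the default 16 MB WAL segment size, and the helper names are made up for illustration.

```python
# Assumption: the server uses the default WAL segment size of 16 MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_wal_filename(name):
    """Split a WAL file name into (timeline, log file, segment).

    The name is 24 hex digits: 8 for the timeline, 8 for the log file
    and 8 for the segment. A suffix such as ".00000020.backup" is ignored.
    """
    base = name.split(".")[0]
    if len(base) != 24:
        raise ValueError("not a WAL segment file name: %r" % name)
    return int(base[0:8], 16), int(base[8:16], 16), int(base[16:24], 16)


def segment_of_pageaddr(pageaddr):
    """Return (log file, segment) for a page address such as '0/2A000000'."""
    log_hex, offset_hex = pageaddr.split("/")
    return int(log_hex, 16), int(offset_hex, 16) // WAL_SEGMENT_SIZE


if __name__ == "__main__":
    print(parse_wal_filename("00000001000000000000002E"))  # (1, 0, 46)
    print(segment_of_pageaddr("0/2A000000"))               # (0, 42)
```

With the values from the log, the file name maps to segment 46 and the page address to segment 42, so the restored file appears to start with a page belonging to an older segment. Whether that means the archived copy was taken from a recycled or incompletely written file is only a guess on my part.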
I do not know exactly what caused the problem. Did it happen because I performed the recovery repeatedly? Any suggestions would be appreciated.
Yours faithfully