Thread: WAL recovery question - 0000001.history
I've developed and am now testing a new "rolling-WAL" script, and have noticed something a little peculiar with Postgres 8.1.3.
Basically, I've taken a dump of my live database directory (between the pg_start_backup and pg_stop_backup calls), shipped it to my standby, and set up a recovery.conf file that calls the rolling-WAL script. The script is designed to wait until the requested log file arrives, or until a flag file is set, in which case it returns code 1 to Postgres.
However, as soon as recovery starts, the script waits for a file called 00000001.history, which will never arrive because it was never generated on the live box since the base backup was taken.
So I touched this file (so that it existed, but was empty), at which point my script recovered it. PG then ignored it and asked for the correct backup files, as shown in the log below.
Note that I intend to make this script public once testing has been carried out. If anyone is interested in testing, please let me know. The "DEBUG" lines were generated by my script, not PostgreSQL.
--- START LOG OUTPUT ---
2006-04-28 16:07:33 BST LOG: database system was interrupted at 2006-04-19 10:48:50 BST
2006-04-28 16:07:33 BST LOG: starting archive recovery
2006-04-28 16:07:33 BST LOG: restore_command = "/mndata/scripts/wal_log_recovery.sh /mndata/archive/xlog_transfer/%f %p"
DEBUG: Recovering /mndata/archive/xlog_transfer/00000001.history to pg_xlog/RECOVERYHISTORY
DEBUG: WAL log /mndata/archive/xlog_transfer/00000001.history does not exist
DEBUG: Checking for flag file at /tmp/recoverdb.flag
DEBUG: Flag file does not exist
DEBUG: 30s to wait before next check
--- > I touched the 00000001.history file here < ---
DEBUG: Source file /mndata/archive/xlog_transfer/00000001.history exists
DEBUG: copy command returned: 0
DEBUG: Returning code 0 to PostgreSQL
2006-04-28 16:10:03 BST LOG: restored log file "00000001.history" from archive
DEBUG: Recovering /mndata/archive/xlog_transfer/000000010000000900000009.009FF34C.backup to pg_xlog/RECOVERYHISTORY
DEBUG: WAL log /mndata/archive/xlog_transfer/000000010000000900000009.009FF34C.backup does not exist
DEBUG: Checking for flag file at /tmp/recoverdb.flag
DEBUG: Flag file does not exist
DEBUG: 30s to wait before next check
--- END LOG OUTPUT ---
There was an error recovering the file 000000010000000900000009.009FF34C.backup (for some reason copy returns 0 even when the file has failed to be moved, so I'll have to build this check in myself), so PG carried on and looked for 000000010000000900000009 instead.
Does Postgres not use the *.history and *.backup files during recovery? In other words, is it harmless to the recovery if these files are empty?
Thanks
Andy
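For readers following along, the wait-or-flag behaviour described in this message, plus the copy check Andy says he still has to build in, could be sketched roughly as below. This is a minimal illustration in Python, not the actual wal_log_recovery.sh. The function names `wait_for_wal` and `copy_and_verify` are hypothetical, and the size comparison after the copy is an assumed way of catching a copy that reports success without delivering the file.

```python
import os
import shutil
import time


def wait_for_wal(source, flag_file, interval=30, sleep=time.sleep):
    """Poll until `source` exists (return 0) or `flag_file` exists (return 1).

    The 0/1 return codes mirror the ones mentioned in the post; the
    function itself is a hypothetical stand-in for the real script.
    """
    while True:
        if os.path.exists(source):
            return 0
        if os.path.exists(flag_file):
            return 1
        sleep(interval)


def copy_and_verify(source, dest):
    """Copy `source` to `dest` and confirm the copy really happened.

    Returns 0 only if the destination exists afterwards with the same
    size as the source (an assumed check); returns 1 otherwise.
    """
    try:
        shutil.copyfile(source, dest)
        same_size = os.path.getsize(dest) == os.path.getsize(source)
    except OSError:
        return 1
    return 0 if same_size else 1
```

The `sleep` parameter is only there so the loop can be exercised without actually waiting 30 seconds.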
"Andy Shellam" <andy.shellam@mailnetwork.co.uk> writes:
> Basically I've taken a dump of my live database directory (between
> pg_start_backup and pg_stop_backup) calls - shipped this to my standby, set
> up a recovery.conf file (which calls the rolling-WAL script). This script
> is designed to wait until the log file arrives, or a flag file is set to
> return code 1 to postgres.
>
> However as soon as the recovery starts, the script is waiting for a file
> called 00000001.history, which will never arrive because it was never
> generated on the live box since the base backup was taken.
This is not a workable approach: your recovery script *will* be asked for files that do not exist, and waiting for them to be supplied is not always the right answer. Supplying an empty file instead is definitely the wrong answer.
You might be able to make it work by conditionalizing the behavior on the format of the name being asked for.
regards, tom lane
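One way to read the suggestion about conditionalizing on the name format is sketched below in Python. The three patterns match the file names that appear in the log above (a timeline history file, a backup history file, and an ordinary 24-hex-digit WAL segment). The policy in `should_wait`, which waits only for ordinary segments and fails fast for everything else, is an assumption made for illustration; the thread does not spell out which names are safe to wait for. All function names here are hypothetical.

```python
import re

# Name formats seen in the log: 00000001.history,
# 000000010000000900000009.009FF34C.backup, 000000010000000900000009
TIMELINE_HISTORY = re.compile(r"^[0-9A-F]{8}\.history$")
BACKUP_HISTORY = re.compile(r"^[0-9A-F]{24}\.[0-9A-F]{8}\.backup$")
WAL_SEGMENT = re.compile(r"^[0-9A-F]{24}$")


def classify(name):
    """Return a label describing what kind of file is being requested."""
    if TIMELINE_HISTORY.match(name):
        return "timeline_history"
    if BACKUP_HISTORY.match(name):
        return "backup_history"
    if WAL_SEGMENT.match(name):
        return "segment"
    return "unknown"


def should_wait(name):
    """Assumed policy: block only for ordinary WAL segments.

    For any other name the caller would check once and return a
    non-zero code if the file is absent, rather than waiting forever
    or supplying an empty file.
    """
    return classify(name) == "segment"
```

With this in place, a request for 00000001.history that is not in the archive would be answered with a failure code immediately, while a request for 000000010000000900000009 would enter the wait loop.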
On Apr 28, 2006, at 10:23 AM, Andy Shellam wrote:
> I've developed and am now testing a new "rolling-WAL" script, and have
> noticed something a little peculiar with Postgres 8.1.3.
<snip>
> Note I intend to make this script public once testing has been carried out -
> if anyone is interested in testing, please let me know. The "DEBUG" lines
> have been generated by my script, not Postgresql.
Actually, it sounds like the script in pgFoundry will already do what you want. Search for 'pitr'.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Hi Tom,
Thanks for this. Do you know why Postgres was asking for 00000001.history? Surely it would know from the presence of the backup_label file, and its contents, that it needs to go straight for the xxxxxx.backup file, which should exist?
What other files is Postgres likely to ask for during a PITR restore? Obviously log files that don't exist yet do need to be waited for, hence the need for this script. If you want PG to come back up without the log it's waiting on, you just set the flag file and it'll come up at the next interval.
So far the only problematic file I can see is 00000001.history. I don't know what it is, and it will never exist because it was never generated on the live box.
The other thing I'm building in is a file-size check for when logs are shipped directly from another server (e.g. live to standby). When the script finds a log, it will read the file size, wait 5 seconds, then read it again. If the two values differ, the log is still being transferred, so the script will wait for another interval and check again before doing anything. This stops it from pinching the log while it's still being transferred and causing errors when PG tries to read it.
This will also remove as much human intervention as possible from the whole process.
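The file-size check described above could look something like the following Python sketch. It is an illustration of the idea only, not the script's actual code; `is_transfer_complete` and its parameters are hypothetical names, and the 5-second settle time is the figure given in the message.

```python
import os
import time


def is_transfer_complete(path, settle_seconds=5, sleep=time.sleep):
    """Return True if the size of `path` is unchanged after a short pause.

    A changing size means the file is still being written, so the
    caller should wait another interval before copying it.
    """
    first_size = os.path.getsize(path)
    sleep(settle_seconds)
    return os.path.getsize(path) == first_size
```

Note that a transfer which stalls for longer than the settle time would look complete to this check, so it narrows the window rather than closing it.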
Andy
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Andy Shellam" <andy.shellam@mailnetwork.co.uk> writes:
> > Basically I've taken a dump of my live database directory (between
> > pg_start_backup and pg_stop_backup) calls - shipped this to my standby, set
> > up a recovery.conf file (which calls the rolling-WAL script). This script
> > is designed to wait until the log file arrives, or a flag file is set to
> > return code 1 to postgres.
>
> > However as soon as the recovery starts, the script is waiting for a file
> > called 00000001.history, which will never arrive because it was never
> > generated on the live box since the base backup was taken.
>
> This is not a workable approach: your recovery script *will* be asked
> for files that do not exist, and waiting for them to be supplied is not
> always the right answer. Supplying an empty file instead is definitely
> the wrong answer.
>
> You might be able to make it work by conditionalizing the behavior on
> the format of the name being asked for.
>
> regards, tom lane
andy@andycc.net writes:
> Hi Tom, Thanks for this. Do you know why Postgres was asking for
> 00000001.history? Surely it would know by the presence of the
> backup_label file, and its contents that it needs to go straight for the
> xxxxxx.backup file, which should exist?
It still needs to worry about timelines. Feel free to study the logic in xlog.c ...
regards, tom lane
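As a small aid to the timeline remark: the first 8 hex digits of a WAL segment name are the timeline ID, and a timeline's history file is named after that same ID. The Python sketch below only decodes the names that appear in this thread; the helper names are hypothetical, and it says nothing about the recovery logic in xlog.c itself.

```python
def split_wal_name(name):
    """Split a 24-hex-digit WAL segment name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log, segment


def history_file_for(timeline):
    """Return the history file name that belongs to a timeline ID."""
    return "%08X.history" % timeline
```

For the segment in the log, `split_wal_name("000000010000000900000009")` gives timeline 1, and `history_file_for(1)` gives 00000001.history, the file the restore script was asked for first.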