Thread: WAL recovery question - 0000001.history

WAL recovery question - 0000001.history

From
"Andy Shellam"
Date:
I've developed and am now testing a new "rolling-WAL" script, and have
noticed something a little peculiar with Postgres 8.1.3.

Basically, I've taken a copy of my live database directory (between the
pg_start_backup and pg_stop_backup calls), shipped it to my standby, and set
up a recovery.conf file whose restore_command calls the rolling-WAL script.
This script is designed to wait until the requested log file arrives, or
until a flag file is created, in which case it returns code 1 to Postgres.
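Here is a minimal sketch of the wait-or-give-up loop described above. It is written in Python rather than as the actual shell script, which isn't shown. The function name wait_and_restore is hypothetical, and the flag path and 30-second interval are taken from the DEBUG output in this thread. The function copies the file and returns 0 as soon as the file appears, and returns 1 once the flag file exists:

```python
import os
import shutil
import time

# Defaults taken from the DEBUG output in this thread; adjust as needed.
FLAG_FILE = "/tmp/recoverdb.flag"
POLL_INTERVAL = 30  # seconds between checks


def wait_and_restore(src, dst, flag_file=FLAG_FILE, interval=POLL_INTERVAL):
    """Wait for src to appear in the archive.

    Returns 0 after copying src to dst, or 1 (without copying) once the
    flag file exists, telling PostgreSQL the file is unavailable so it
    can finish recovery.
    """
    while True:
        # Check for the requested file first, then the flag, as in the log.
        if os.path.exists(src):
            shutil.copyfile(src, dst)
            return 0
        if os.path.exists(flag_file):
            return 1
        time.sleep(interval)
```

A wrapper script would call sys.exit(wait_and_restore(sys.argv[1], sys.argv[2])) with the two arguments restore_command supplies: the archive path built from %f, and %p. As the replies in this thread point out, waiting unconditionally is only appropriate for some of the files PostgreSQL requests.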

However, as soon as recovery starts, the script waits for a file called
00000001.history. That file will never arrive, because no such file has been
generated on the live box since the base backup was taken.

So I touched this file (i.e. created it, empty). My script then recovered
it, and PG ignored it and asked for the correct backup files, as shown in
the log below.

Note: I intend to make this script public once testing is complete, so if
anyone is interested in testing it, please let me know. The "DEBUG" lines
below are generated by my script, not by PostgreSQL.

--- START LOG OUTPUT ---

2006-04-28 16:07:33 BST LOG:  database system was interrupted at 2006-04-19 10:48:50 BST
2006-04-28 16:07:33 BST LOG:  starting archive recovery
2006-04-28 16:07:33 BST LOG:  restore_command = "/mndata/scripts/wal_log_recovery.sh /mndata/archive/xlog_transfer/%f %p"
DEBUG: Recovering /mndata/archive/xlog_transfer/00000001.history to pg_xlog/RECOVERYHISTORY
DEBUG: WAL log /mndata/archive/xlog_transfer/00000001.history does not exist
DEBUG: Checking for flag file at /tmp/recoverdb.flag
DEBUG: Flag file does not exist
DEBUG: 30s to wait before next check

--- > I touched the 00000001.history file here < ---

DEBUG: Source file /mndata/archive/xlog_transfer/00000001.history exists
DEBUG: copy command returned: 0
DEBUG: Returning code 0 to PostgreSQL
2006-04-28 16:10:03 BST LOG:  restored log file "00000001.history" from archive
DEBUG: Recovering /mndata/archive/xlog_transfer/000000010000000900000009.009FF34C.backup to pg_xlog/RECOVERYHISTORY
DEBUG: WAL log /mndata/archive/xlog_transfer/000000010000000900000009.009FF34C.backup does not exist
DEBUG: Checking for flag file at /tmp/recoverdb.flag
DEBUG: Flag file does not exist
DEBUG: 30s to wait before next check

--- END LOG OUTPUT ---

There was an error recovering this file:
000000010000000900000009.009FF34C.backup. (For some reason copy returns 0
even when the file has failed to be moved; I'll have to build this check in
myself.) As a result, PG carried on and looked for 000000010000000900000009
instead.
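One way to build that check in is sketched below in Python, using a hypothetical copy_verified helper. Instead of trusting the copy's exit status, the helper confirms that the destination exists and matches the source size before reporting success:

```python
import os
import shutil


def copy_verified(src, dst):
    """Copy src to dst; return 0 only if dst ends up the same size as src.

    Any OSError (missing source, permissions, full disk) or a size
    mismatch yields 1, so PostgreSQL is never told that a missing or
    truncated file was restored.
    """
    try:
        shutil.copyfile(src, dst)
        # Don't rely on the copy alone: confirm dst is complete.
        if os.path.getsize(dst) != os.path.getsize(src):
            return 1
    except OSError:
        return 1
    return 0
```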

Does Postgres not use the *.history and *.backup files during recovery? In
other words, is it harmless to the recovery if these files are empty?

Thanks

Andy





Re: WAL recovery question - 0000001.history

From
Tom Lane
Date:
"Andy Shellam" <andy.shellam@mailnetwork.co.uk> writes:
> Basically I've taken a dump of my live database directory (between
> pg_start_backup and pg_stop_backup) calls - shipped this to my standby, set
> up a recovery.conf file (which calls the rolling-WAL script).  This script
> is designed to wait until the log file arrives, or a flag file is set to
> return code 1 to postgres.

> However as soon as the recovery starts, the script is waiting for a file
> called 00000001.history, which will never arrive because it was never
> generated on the live box since the base backup was taken.

This is not a workable approach: your recovery script *will* be asked
for files that do not exist, and waiting for them to be supplied is not
always the right answer.  Supplying an empty file instead is definitely
the wrong answer.

You might be able to make it work by conditionalizing the behavior on
the format of the name being asked for.

            regards, tom lane
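Tom's suggestion of conditionalizing on the name format could look roughly like the sketch below. The patterns follow PostgreSQL's file naming:

- 24 hex digits for a WAL segment.
- 8 hex digits plus .history for a timeline history file.
- A segment name plus an 8-hex-digit offset and .backup for a backup history file.

The should_wait name and its policy of waiting only for plain segments are assumptions, not something stated in the thread:

```python
import re

# WAL segment: 8 hex digits each of timeline ID, log and segment number.
SEGMENT_RE = re.compile(r"[0-9A-F]{24}")
# Timeline history file, e.g. 00000001.history
HISTORY_RE = re.compile(r"[0-9A-F]{8}\.history")
# Backup history file, e.g. 000000010000000900000009.009FF34C.backup
BACKUP_RE = re.compile(r"[0-9A-F]{24}\.[0-9A-F]{8}\.backup")


def should_wait(filename):
    """Return True if a missing file is worth waiting for, False if the
    script should report it as absent (non-zero exit) straight away."""
    if SEGMENT_RE.fullmatch(filename):
        return True   # ordinary WAL segment: may just not have arrived yet
    if HISTORY_RE.fullmatch(filename) or BACKUP_RE.fullmatch(filename):
        return False  # may legitimately never exist: fail fast
    return False      # unrecognised name: don't block recovery on it
```

A restore script would apply this to the %f name. When should_wait returns False and the file isn't in the archive, the script exits non-zero immediately instead of polling.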

Re: WAL recovery question - 0000001.history

From
Jim Nasby
Date:
On Apr 28, 2006, at 10:23 AM, Andy Shellam wrote:

> I've developed and am now testing a new "rolling-WAL" script, and have
> noticed something a little peculiar with Postgres 8.1.3.

<snip>

> Note I intend to make this script public once testing has been carried out -
> if anyone is interested in testing, please let me know.  The "DEBUG" lines
> have been generated by my script, not Postgresql.

Actually, it sounds like the script in pgFoundry will already do what
you want. Search for 'pitr'.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461



Re: WAL recovery question - 0000001.history

From
andy@andycc.net
Date:
Hi Tom,

Thanks for this.  Do you know why Postgres was asking for 00000001.history?  Surely it would know, from the presence and contents of the backup_label file, that it needs to go straight for the xxxxxx.backup file, which should exist?

What other files is Postgres likely to ask for during a PITR restore?  Obviously log files that don't exist yet do need to be waited for, hence the need for this script.  If you want PG to come back up without the log it's waiting on, you just create the flag file and it'll come up at the next interval.
So far the only problem I can see is this 00000001.history file. I don't know what it is, and it will never exist because it was never generated on the live server.

The other thing I'm building in is a file-size check for when logs are shipped directly from another server (e.g. live to standby). When the script finds a log, it will read the file size, wait 5 seconds, then read it again. If the two values differ, the log is still being transferred, so the script will wait for another interval and check again before doing anything. This stops it from grabbing a log while it's still being transferred, which would cause errors when PG tries to read it.
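Here is the size-stability check described above as a minimal Python sketch. The name is_stable is hypothetical, and the 5-second default comes from the description:

```python
import os
import time


def is_stable(path, settle=5.0):
    """Return True if path's size is unchanged after `settle` seconds,
    i.e. the file does not appear to be still arriving."""
    before = os.path.getsize(path)
    time.sleep(settle)
    after = os.path.getsize(path)
    return before == after
```

This is a heuristic, because a stalled transfer would still look stable. WAL segments normally have a fixed size (16 MB by default), so there are sturdier alternatives:

- Compare the file against the expected segment size.
- Have the sender copy to a temporary name and rename the file once it is complete.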

This will also remove as much human intervention as possible from the whole process.

Andy


Re: WAL recovery question - 0000001.history

From
Tom Lane
Date:
andy@andycc.net writes:
> Hi Tom, Thanks for this.  Do you know why Postgres was asking for
> 00000001.history?  Surely it would know by the presence of the
> backup_label file, and its contents that it needs to go straight for the
> xxxxxx.backup file, which should exist?

It still needs to worry about timelines.  Feel free to study the logic
in xlog.c ...
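The timeline link is visible in the file names themselves:

- The first 8 hex digits of every WAL segment name are the timeline ID.
- A timeline history file is named after its timeline, so 00000001.history belongs to timeline 1.
- Timeline 1 is the original timeline, so a server that has never been through a point-in-time recovery has no history file for it.

The sketch below decodes a segment name from the log above. The helper names are hypothetical:

```python
def parse_segment_name(name):
    """Split a 24-hex-digit WAL segment name into
    (timeline ID, log number, segment number)."""
    if len(name) != 24:
        raise ValueError("not a WAL segment name: %r" % name)
    tli = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return tli, log, seg


def history_file_for(tli):
    """Name of the timeline history file for timeline `tli`."""
    return "%08X.history" % tli


tli, log, seg = parse_segment_name("000000010000000900000009")
print(tli, log, seg, history_file_for(tli))
```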

            regards, tom lane