Re: BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog

From Heikki Linnakangas
Subject Re: BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog
Msg-id 515FDBBF.8040207@vmware.com
In response to Re: BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog  (Jeff Janes <jeff.janes@gmail.com>)
Re: BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog  (Jeff Bohmer <bohmer@visionlink.org>)
List pgsql-bugs
On 06.04.2013 01:02, Jeff Janes wrote:
> On Fri, Apr 5, 2013 at 12:27 PM,<bohmer@visionlink.org>  wrote:
>> I use a custom base backup script to call pg_start/stop_backup() and make
>> the backup with rsync.
>>
>> The restore_command in recovery.conf is never called by PG 9.2.4 during
>> startup. I confirmed this by adding a "touch /tmp/restore_command.`date
>> +%H:%M:%S`" line at the beginning of the shell script I use for my
>> restore_command. No such files are created when starting PG 9.2.4.
>>
>> After downgrading back to 9.2.3, archive recovery works using the very same
>> base backup, recovery.conf file, and restore_command. The log indicates
>> that PG 9.2.3 begins recovery by pulling WAL files from the archive
>> instead of pg_xlog:
>
> I can reproduce the behavior you report only if I remove the "backup_label"
> file from the restored data directory before I begin recovery.  Of course,
> doing that renders the backup invalid, as without it recovery is very
> likely to begin from the wrong WAL recovery location.

Yeah, if you use pg_start/stop_backup(), there definitely should be a
backup_label present.

But there is a valid point here if you use an atomic filesystem snapshot
instead of pg_start/stop_backup(), or just a plain copy of the data
directory taken while the system is shut down. The problem in that case
is that if pg_xlog is empty, we have no idea how far we need to recover
before the system is consistent. Actually, if the system was shut down,
then it is consistent immediately and we could allow that, but the
problem still remains for an online backup using an atomic filesystem
snapshot.

I don't think there's much we can do about that case. We could start up
and recover all the WAL from the archive before we declare consistency,
but that gets pretty complicated, and it would still not work if you
tried to do that in a standby that uses streaming replication without a
restore_command.

So, I think what we need to do is to update the documentation to make it
clear that you must not zap pg_xlog if you take a backup without
pg_start/stop_backup(). The documentation that talks about filesystem
snapshots and offline backups doesn't actually say that you can zap
pg_xlog - that is only mentioned in the section on
pg_start/stop_backup(). But perhaps that could be made more explicit.
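
As an illustration of the rule above, here is a minimal sketch in Python
of a pre-flight check one could run on a restored data directory before
starting the server. It is not part of PostgreSQL; the function name and
the three result strings are made up for this example. The only thing it
assumes about the server is that WAL segment files in pg_xlog have
24-hex-digit names.

```python
import re
from pathlib import Path

# WAL segment files have 24-hex-digit names, e.g. 000000010000000000000003.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def classify_restored_datadir(datadir):
    """Hypothetical helper: return 'label', 'wal-only' or 'unsafe'.

    'label'    - backup_label is present (pg_start/stop_backup() backup).
    'wal-only' - no backup_label, but pg_xlog still holds WAL segments;
                 acceptable only for an atomic snapshot or offline copy.
    'unsafe'   - no backup_label and no WAL in pg_xlog.
    """
    root = Path(datadir)
    if (root / "backup_label").is_file():
        return "label"
    xlog = root / "pg_xlog"
    has_wal = xlog.is_dir() and any(
        p.is_file() and WAL_SEGMENT_RE.match(p.name) for p in xlog.iterdir()
    )
    return "wal-only" if has_wal else "unsafe"
```

A result of 'unsafe' corresponds to the case described above: there is
no backup_label and pg_xlog is empty, so nothing tells us how far
recovery has to go before the data is consistent.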

>> Or, must I now include pg_xlog files when taking base backups with 9.2.4,
>> contrary to the documentation?
>
> You do not need to include pg_xlog, but you do need to include
> backup_label.  And you always did need to include it--if you were not
> including it in the past, then you were playing with fire and it is only
> due to luck that your database survived.

Incidentally, I bumped into another custom backup script just a few
weeks back that also excluded backup_label. I don't know what the author
was thinking when he wrote that, but it seems to be a surprisingly
common mistake. Maybe it's the "label" in the filename that makes people
think it's not important. Perhaps we should improve the documentation to
make it more explicit that backup_label must be included in the backup.
The docs already say that, though, so I suspect that people making this
mistake have not read the docs very carefully anyway.
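
For rsync-based scripts like the one in the original report, a small
lint step could catch this mistake before the backup is ever taken. The
sketch below is only an illustration: the function name is hypothetical,
and it approximates exclude patterns with shell-style globbing on the
bare file name, which is much simpler than rsync's real filter rules.

```python
from fnmatch import fnmatchcase


def patterns_dropping_backup_label(exclude_patterns):
    """Return the exclude patterns that would match 'backup_label'.

    Approximation: shell-style globbing against the bare file name,
    ignoring a leading '/'. Real rsync filter rules are richer than this.
    """
    return [
        p for p in exclude_patterns
        if fnmatchcase("backup_label", p.lstrip("/"))
    ]
```

A backup script could refuse to run when this returns a non-empty list,
for example for an exclude list that contains "/backup_label" or a broad
glob such as "*_label".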

Perhaps a comment in the beginning of backup_label would help:

# NOTE: This file MUST be included in the backup. Otherwise, the backup
# is inconsistent, and restoring it may result in a corrupt database.

Jeff B., assuming that you excluded backup_label from the backup for
some reason, do you have any thoughts on what would have helped you
avoid that mistake? Would a comment like the one above have helped? Did
you look inside backup_label at any point?

- Heikki
