Thread: Having trouble restoring our backups

From
Bryan Murphy
Date:
Hey guys, I'm having difficulty restoring some of our backups.  Luckily, I'm only trying to do this to bring up a copy of our database for testing purposes, but this still has me freaked out because it means we currently have no valid backups and are only running with a single warm spare.

Our primary database is on a RAID-10 that can't take snapshots and is very overworked, so we ship our wal files to a warm standby server.  Every day or two I log in to the warm standby and run the following commands:

1. xfs_freeze -f /srv   (this is where the entire postgres tree is mounted, no funny business with symlinks)
2. * take file system snapshot, wait about 30 seconds for snapshot to start running *
3. xfs_freeze -u /srv
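
The three steps above can be wrapped so that the unfreeze always runs, even if the snapshot command fails. This is only a sketch: `snapshot_frozen` is a made-up helper name, and the snapshot command is passed in by the caller because the original post does not say which tool starts the EBS snapshot.

```shell
#!/bin/sh
# Sketch only: freeze the filesystem, run the caller-supplied snapshot
# command, then unfreeze no matter how the snapshot command exited.
# snapshot_frozen is a hypothetical helper name, not an existing tool.

snapshot_frozen() {
    mount_point=$1
    shift

    # Step 1: block writes so the on-disk state is consistent.
    xfs_freeze -f "$mount_point" || return 1

    # Step 2: run whatever command starts the snapshot (remaining arguments).
    status=0
    "$@" || status=$?

    # Step 3: always unfreeze, even when the snapshot command failed.
    xfs_freeze -u "$mount_point" || return 1

    return "$status"
}

# Example call, assuming start_snapshot is your own (hypothetical) command:
#   snapshot_frozen /srv start_snapshot
```

The function returns the snapshot command's exit status, so a calling script can tell a failed snapshot apart from a successful one while still leaving /srv unfrozen.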

I don't exactly know how the snapshotting works (it's an Amazon EBS volume), so I don't know if I should wait until the snapshotting is 100% complete before I unfreeze the volume.  This whole process can easily take 30 minutes to an hour, so I am also concerned that if I wait that long to unfreeze the volume I may cause an excessive backlog of wal files that are not getting applied to the warm spare.

Now, when I try to restore one of these snapshots, I do the following:

1. create new share from snapshot
2. mount new share in new Linux instance
3. start postgres, verify that it's running and is still in recovery mode
4. touch my go live file and bring the database up

I've done this successfully in the past.  Today, however, I'm running into this problem when I try to run some queries:

ERROR:  could not access status of transaction 237546265
DETAIL:  Could not open file "pg_clog/00E2": No such file or directory.
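
For anyone wondering how the transaction ID in the error maps to the file name: each pg_clog segment covers a fixed range of transaction IDs. A rough sketch of the arithmetic, assuming the default 8 kB block size, 2 status bits per transaction and 32 pages per segment (`clog_segment_for_xid` is just an illustrative name):

```shell
#!/bin/sh
# Sketch: compute which pg_clog segment file holds the status of a given
# transaction ID. Assumes a default build: 8192-byte pages, 4 transactions
# per byte (2 bits each), 32 pages per segment file.

clog_segment_for_xid() {
    xacts_per_segment=$((4 * 8192 * 32))   # 1048576 transactions per file
    # Segment files are named with four uppercase hex digits.
    printf '%04X\n' $(( $1 / xacts_per_segment ))
}

clog_segment_for_xid 237546265   # prints 00E2
```

So the error is consistent: transaction 237546265 falls in segment 00E2, which is the file the server could not open.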

I tried creating the missing files last night using dd, and I was able to get the database to a point where I could run queries against it; however, it was missing data that should have been there.  I tried again this morning with a different snapshot and ran into the same problem.

What am I doing wrong?  FYI, we're running 8.3.7.

Thanks,
Bryan


Re: Having trouble restoring our backups

From
Alan Hodgson
Date:
On Friday 12 June 2009, Bryan Murphy <bmurphy1976@gmail.com> wrote:
> What am I doing wrong?  FYI, we're running 8.3.7.

See the documentation on PITR backups for how to do this correctly.

--
WARNING:  Do not look into laser with remaining eye.

Re: Having trouble restoring our backups

From
Bryan Murphy
Date:
On Fri, Jun 12, 2009 at 10:48 AM, Alan Hodgson <ahodgson@simkin.ca> wrote:
> On Friday 12 June 2009, Bryan Murphy <bmurphy1976@gmail.com> wrote:
> > What am I doing wrong?  FYI, we're running 8.3.7.
>
> See the documentation on PITR backups for how to do this correctly.

I've read through the PITR documentation many times.  I do not see anything that sheds light on what I'm doing wrong, and I've restored older backups successfully many times in the past few months using this technique.  I have no explanation for why all of a sudden my last few backups are not restoring properly and we've not changed anything on our database setup recently.

I'm currently creating a full backup of our primary database and will build a second warm spare with that, but the additional pressure this puts on our system is not acceptable as a long term backup solution.

Bryan

Re: Having trouble restoring our backups

From
Bryan Murphy
Date:
On Fri, Jun 12, 2009 at 11:08 AM, Bryan Murphy <bmurphy1976@gmail.com> wrote:
> I've read through the PITR documentation many times.  I do not see anything that sheds light on what I'm doing wrong, and I've restored older backups successfully many times in the past few months using this technique.  I have no explanation for why all of a sudden my last few backups are not restoring properly and we've not changed anything on our database setup recently.
>
> I'm currently creating a full backup of our primary database and will build a second warm spare with that, but the additional pressure this puts on our system is not acceptable as a long term backup solution.

FYI, for future reference for anybody else who runs into this problem, it appears we somehow lost the pg_clog files the last time we took a full snapshot of our primary database.  Our PITR spare was happily recovering WAL files, but when I tried to bring it up it was missing the pg_clog files, and it had literally been weeks since I last tried to do this (stupid on my part).

We appear to have repaired our PITR based backup by copying the missing pg_clog files from our production database which thankfully still had them.  I do not know how they got dropped from the last snapshot we took, but we'll be looking into our hot-spare building process to see what we can do to prevent this from happening again.
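
A check along these lines might catch the problem before a restore is actually needed. It is only a sketch: it lists segment files that exist in a reference pg_clog directory (for example, a copy of production's) but are absent from the restored one. The function name and both directory arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: print the names of files present in a reference pg_clog directory
# but missing from a restored pg_clog directory. Both paths are supplied by
# the caller; missing_clog_segments is a hypothetical helper name.

missing_clog_segments() {
    reference_dir=$1
    restored_dir=$2

    for path in "$reference_dir"/*; do
        # If the reference directory is empty the glob stays unexpanded.
        [ -e "$path" ] || continue
        name=${path##*/}
        # Report any segment the restored copy does not have.
        [ -e "$restored_dir/$name" ] || echo "$name"
    done
}

# Example call with hypothetical paths:
#   missing_clog_segments /mnt/prod_copy/pg_clog /srv/postgres/pg_clog
```

Empty output means every segment in the reference directory is also in the restored copy. It says nothing about whether the file contents match.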

Thanks,
Bryan