Thread: PITR bad restore possibility?

PITR bad restore possibility?

From
Rod Taylor
Date:
What happens if, for reasons of broken tape, disk, etc., you lose some of
your WAL logs, and they happen to correspond to the middle of the snapshot
backup?

The equivalent would be to:

1) Start the snapshot backup (tar)
2) Stop logging usable WAL logs (say a tape jammed or disk is corrupted)
3) Snapshot portion of the backup successfully completes

Later on you try to replay the above snapshot backup while cursing the
partial data loss due to the tape jam or partially corrupt disk.

I believe the restored data directory would contain data that is newer
than the point where WAL replay stopped, possibly including partial
transactions whose WAL records were never replayed.

Would this be a usable database? And what happens when it eventually
reuses transaction IDs that already appear in the data files, since WAL
replay never got far enough to account for them?

If not, is there a way to find out the last WAL segment required for the
snapshot backup to be usable?
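
One possible way to check this, sketched under assumptions: pg_stop_backup() writes a backup history file into the WAL archive area, and that file records the start and stop WAL locations along with the segment file names. The Python below parses such a file and lists the segments in that range that are absent from an archive listing. The sample text, the function names, and the `segments_per_log` parameter are hypothetical and only for illustration; the default of 256 is an assumption, and releases that skip the last segment of each log file would need 255 instead.

```python
import re

# Hypothetical backup history file contents, for illustration only.
SAMPLE_HISTORY = """\
START WAL LOCATION: 0/3000020 (file 000000010000000000000003)
STOP WAL LOCATION: 0/6000100 (file 000000010000000000000006)
"""

# Matches the START/STOP lines and captures the 24-hex-digit segment name.
_LINE = re.compile(
    r"^(START|STOP) WAL LOCATION: \S+ \(file ([0-9A-F]{24})\)", re.M
)


def required_range(history_text):
    """Return (first_segment, last_segment) named in the history file."""
    found = {kind: name for kind, name in _LINE.findall(history_text)}
    return found["START"], found["STOP"]


def _split(name):
    """Split a segment name into (timeline, log number, segment number)."""
    return name[:8], int(name[8:16], 16), int(name[16:24], 16)


def missing_segments(history_text, archived, segments_per_log=256):
    """List required segment names that are not in the archive listing.

    segments_per_log is an assumption about how segment numbers roll over
    into the log number; adjust it for the release in use.
    """
    start, stop = required_range(history_text)
    timeline, start_log, start_seg = _split(start)
    stop_timeline, stop_log, stop_seg = _split(stop)
    if timeline != stop_timeline:
        raise ValueError("backup spans more than one timeline")

    first = start_log * segments_per_log + start_seg
    last = stop_log * segments_per_log + stop_seg
    have = set(archived)

    missing = []
    for n in range(first, last + 1):
        log, seg = divmod(n, segments_per_log)
        name = "%s%08X%08X" % (timeline, log, seg)
        if name not in have:
            missing.append(name)
    return missing
```

If `missing_segments` returns anything for the range the backup needs, the snapshot should probably not be trusted, whatever the restore itself reports.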


Re: PITR bad restore possibility?

From
Tom Lane
Date:
Rod Taylor <pg@rbt.ca> writes:
> What happens if for reasons of broken tape, disk, etc. you lose some of
> your WAL logs which happen to correspond to the middle of the snapshot
> backup?

You're screwed ... just like if you lost part of the snapshot itself.

If you're really lucky the missing WAL logs don't contain any data
that's not in the snapshot, but I sure wouldn't trust it.
        regards, tom lane


Re: PITR bad restore possibility?

From
Rod Taylor
Date:
On Wed, 2005-04-27 at 20:14 -0400, Tom Lane wrote:
> Rod Taylor <pg@rbt.ca> writes:
> > What happens if for reasons of broken tape, disk, etc. you lose some of
> > your WAL logs which happen to correspond to the middle of the snapshot
> > backup?
> 
> You're screwed ... just like if you lost part of the snapshot itself.
> 
> If you're really lucky the missing WAL logs don't contain any data
> that's not in the snapshot, but I sure wouldn't trust it.

I realize that data would be lost from the point the logs cut off, but
is there any chance of database corruption because the snapshot contains
transaction IDs in tables but possibly not a complete pg_clog or
pg_subtrans directory or vice-versa (pg_clog is complete, but heap files
wouldn't be)?

Suppose WAL cuts off at transaction 10 while the heap files contain rows
up to transaction 20. WAL replay makes sure pg_clog is updated through
10, but would a later commit that reuses a transaction ID from 11 through
20 make that heap data visible again, even though it was never fully
restored from WAL?
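
The concern can be written down as a toy model. This is only a sketch of the scenario being asked about, not PostgreSQL's actual visibility rules or on-disk structures: the dictionary standing in for pg_clog, the set of row xmin values, and the assumption that the transaction counter resumes right after the last replayed ID are all hypothetical simplifications.

```python
def visible_xmins(heap_xmins, clog):
    """Toy rule: a row is visible only if its creating xid is marked committed."""
    return sorted(x for x in heap_xmins if clog.get(x) == "committed")


# The tar snapshot copied heap pages holding rows created by xids 1..20.
heap_xmins = set(range(1, 21))

# WAL replay stopped early, so the commit log only covers xids 1..10.
clog = {xid: "committed" for xid in range(1, 11)}

# Right after the restore, rows from xids 11..20 are present but not visible.
before = visible_xmins(heap_xmins, clog)

# Assumption: the xid counter resumes right after the last replayed xid,
# so the next new transaction reuses xid 11 and then commits.
next_xid = max(clog) + 1
clog[next_xid] = "committed"

# In this model the stale row written by the original xid 11 now shows up.
after = visible_xmins(heap_xmins, clog)
```

In this model the old row stamped with ID 11 becomes visible as soon as an unrelated new transaction with the same ID commits, which is the corruption being asked about. Whether the real server behaves this way is exactly the open question.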

During the restore process, PostgreSQL doesn't seem to complain if the
WAL records cut off before the point in time at which pg_stop_backup()
was called. I'm not entirely sure whether that is a corrupt restore (so
we need to go back to an older snapshot and replay from there up to the
point where the WAL records cut off) or a usable restore ready for
production use.
