Re: [HACKERS] Point in Time Recovery - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: [HACKERS] Point in Time Recovery
Date
Msg-id 200407192024.i6JKOrO16297@candle.pha.pa.us
In response to Re: [HACKERS] Point in Time Recovery  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Point in Time Recovery  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-patches
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> It should certainly go to /share as a .sample file.  I was thinking that
> >> initdb should perhaps copy it into $PGDATA (still as .sample, not as
> >> .conf!) so it'd be right there when you need it.
>
> > I think /share is best.
>
> Okay, we agree on that part at least; I'll take care of it.  If anyone
> wants to argue for further copying during initdb, that can be added
> later.
>
> > I am confused.  Aren't we always doing a restore from a backup?
>
> No.  This code serves two purposes: recovery from archived WAL and
> point-in-time recovery.  You might want to do a PITR run at a time
> where not all your WAL segments have been pushed to archive.  Indeed
> the latest one can never be so pushed, since it's unfinished.  Suppose
> you are trying to do PITR recovery to a time just a few minutes ago
> that is still in the latest WAL segment --- there is simply not any
> legal way to have that come from the archive.
>
> So we can't simply zero out pg_xlog at the start of a PITR run, even
> if there weren't a don't-destroy-data argument against it.

If we had some code that checks pg_xlog on recovery startup, it could
rename each pg_xlog file and then recover the file from the archive.  If
the archive copy doesn't exist or is truncated, discard it.  If it is the
right size, we need to check to see which one has a WAL end-of-segment
marker (we have one of those, right?).  This would seem to catch all the
cases:

    o  file brought back by tar, but complete file in archive
    o  archive in process of writing during crash
    o  partially full file in pg_xlog
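
A rough sketch of that startup check, in Python for brevity.  Everything
here is an assumption for illustration: the function name, the return
values, the reading that a missing or short archive copy is the one that
gets discarded, and the has_end_marker callback (whether such a marker
exists is exactly the open question above).  The final fallback to the
archive copy is also just a guess at a sensible default.

```python
from typing import Callable, Optional

# 16 MB is the usual WAL segment size; passed as a parameter so it can vary.
SEGMENT_SIZE = 16 * 1024 * 1024


def choose_segment(local: Optional[bytes],
                   archived: Optional[bytes],
                   has_end_marker: Callable[[bytes], bool],
                   segment_size: int = SEGMENT_SIZE) -> str:
    """Pick which copy of a WAL segment to replay: 'archive', 'local', or 'none'.

    local          -- contents of the renamed pg_xlog file, or None if absent
    archived       -- contents fetched from the archive, or None if absent
    has_end_marker -- hypothetical test for an end-of-segment marker
    """
    # A missing or truncated archive copy is discarded; use the local file.
    archive_ok = archived is not None and len(archived) == segment_size
    if not archive_ok:
        return "local" if local is not None else "none"

    # Nothing in pg_xlog to compare against.
    if local is None:
        return "archive"

    # Both copies exist and the archive copy is full size:
    # prefer whichever one carries the end-of-segment marker.
    if has_end_marker(archived):
        return "archive"
    if has_end_marker(local):
        return "local"

    # Neither has a marker; defaulting to the archive is an assumption.
    return "archive"
```

Note that this only looks at size and a trailing marker, so it cannot
detect a full-size file with a hole in the middle.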

What it doesn't cover are cases where tar gets a partial copy of a
pg_xlog file but the file never made it to archive yet, and a new
pg_xlog file was created and we get some of that file too.  In fact, the
backup could get holes in the pg_xlog file where the backup has zeros
but the real file had data added to it after the zeros:

    in tar    XXXXX  00000  XXXXX

    real      XXXXX  XXXXX  XXXXX

This could happen when the file has this:

    XXXXX  00000 00000

backup reads this:

    XXXXX  00000

database writes this:

    XXXXX  XXXXX XXXXX

backup reads the remainder of the file:

    XXXXX  00000 XXXXX

In this case the end-of-segment marker doesn't even help us, and there
might not be an archive copy of this because it didn't happen yet.
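
The sequence above can be replayed as a toy simulation.  This is only a
model: the three strings stand in for blocks of one pg_xlog file, and
torn_backup_read is a made-up helper that reads the first few blocks
before the database writes and the rest afterwards.

```python
def torn_backup_read(before, after, blocks_read_first):
    """Model a backup that reads part of a file, gets overtaken by a
    database write, and then reads the remainder of the file.

    before            -- file blocks when the backup starts reading
    after             -- file blocks once the database has written
    blocks_read_first -- how many blocks the backup read before the write
    """
    return before[:blocks_read_first] + after[blocks_read_first:]


file_at_start = ["XXXXX", "00000", "00000"]
file_after_write = ["XXXXX", "XXXXX", "XXXXX"]

# The backup reads two blocks, the database fills the file, and the backup
# then reads the last block.
in_tar = torn_backup_read(file_at_start, file_after_write, 2)
print(in_tar)  # ['XXXXX', '00000', 'XXXXX']
```

The copy in the tar file ends with valid data, so a check of the file
size or of the last block would accept it despite the hole.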

I think I see a solution.  We are going to create a file during backup so
we know the WAL offsets and xids.  If we see that file, we know either
we have a restore of a backup or they are currently running a backup.  If
we tell them not to restore while a backup is running (seems pretty
obvious) we can then delete pg_xlog when the backup WAL offset file
exists.  In other cases, we know the WAL files are valid to use.
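
As a sketch of that rule, again in Python and with invented names: the
marker file name "backup_offsets" and the function discard_restored_wal
are hypothetical, and the sketch assumes nobody is running a backup at
the moment recovery starts.

```python
import os
import tempfile

# Hypothetical name for the file written during backup (WAL offsets, xids).
BACKUP_MARKER = "backup_offsets"


def discard_restored_wal(data_dir, marker_name=BACKUP_MARKER):
    """If the backup marker exists, remove the files in pg_xlog and return
    their names; otherwise leave pg_xlog alone and return an empty list."""
    marker = os.path.join(data_dir, marker_name)
    xlog_dir = os.path.join(data_dir, "pg_xlog")

    # No marker: this is not a restored backup, so the WAL files are usable.
    if not os.path.exists(marker) or not os.path.isdir(xlog_dir):
        return []

    removed = []
    for name in sorted(os.listdir(xlog_dir)):
        path = os.path.join(xlog_dir, name)
        if os.path.isfile(path):  # leave any subdirectories in place
            os.remove(path)
            removed.append(name)
    return removed


# Small demonstration in a scratch directory.
with tempfile.TemporaryDirectory() as data_dir:
    os.mkdir(os.path.join(data_dir, "pg_xlog"))
    segment = os.path.join(data_dir, "pg_xlog", "000000010000000000000001")
    open(segment, "wb").close()

    demo_without_marker = discard_restored_wal(data_dir)  # nothing removed
    open(os.path.join(data_dir, BACKUP_MARKER), "w").close()
    demo_with_marker = discard_restored_wal(data_dir)     # segment removed
```

Any segments needed after that would then have to come from the archive.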

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
