Re: Point in Time Recovery - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Point in Time Recovery
Date
Msg-id 1089069104.17493.132.camel@stromboli
Whole thread Raw
In response to Re: Point in Time Recovery  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Point in Time Recovery
List pgsql-hackers
On Mon, 2004-07-05 at 22:46, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Should we use a different datatype than time_t for the commit timestamp,
> > one that offers more fine grained differentiation between checkpoints?
> 
> Pretty much everybody supports gettimeofday() (time_t and separate
> integer microseconds); you might as well use that.  Note that the actual
> resolution is not necessarily microseconds, and it'd still not be
> certain that successive commits have distinct timestamps --- so maybe
> this refinement would be pointless.  You'll still have to design a user
> interface that allows selection without the assumption of distinct
> timestamps.

Well, I agree, though without the desired-for UI now, I think some finer
grained mechanism would be good. This means extending the xlog commit
record by a couple of bytes...OK, lets live a little.

> > - when we stop, keep reading records until EOF, just don't apply them.
> > When we write a checkpoint at end of recovery, the unapplied
> > transactions are buried alive, never to return.
> > - stop where we stop, then force zeros to EOF, so that no possible
> > record remains of previous transactions.
> 
> Go with plan B; it's best not to destroy data (what if you chose the
> wrong restart point the first time)?
> 

eh? Which way round? The second plan was the one where I would destroy
data by overwriting it, thats exactly why I preferred the first.

Actually, the files are always copied from archive, so re-recovery is
always an available option in the design thats been implemented.

No matter...

> Actually this now reminds me of a discussion I had with Patrick
> Macdonald some time ago.  The DB2 practice in this connection is that
> you *never* overwrite existing logfile data when recovering.  Instead
> you start a brand new xlog segment file, 

Now thats a much better plan...I suppose I just have to rack up the
recovery pointer to the first record on the first page of a new xlog
file, similar to first plan, but just fast-forwarding rather than
forwarding.

My only issue was to do with the secondary Checkpoint marker, which is
always reset to the place you just restored FROM, when you complete a
recovery. That could lead to a situation where you recover, then before
next checkpoint, fail and lose last checkpoint marker, then crash
recover from previous checkpoint (again), but this time replay the
records you were careful to avoid.

> which is given a new "branch
> number" so it can be distinguished from the future-time xlog segments
> that you chose not to apply.  I don't recall what the DB2 terminology
> was exactly --- not "branch number" I don't think --- but anyway the
> idea is that when you restart the database after an incomplete recovery,
> you are now in a sort of parallel universe that has its own history
> after the branch point (PITR stop point).  You need to be able to
> distinguish archived log segments of this parallel universe from those
> of previous and subsequent incarnations.  

Thats a good idea, if only because you so easily screw your test data
during multiple recovery situations. But if its good during testing, it
must be good in production too...since you may well perform
recovery...run for a while, then discover that you got it wrong first
time, then need to re-recover again. I already added that to my list of
gotchas and that would solve it.

I was going to say hats off to the Blue-hued ones, when I remembered
this little gem from last year
http://www.danskebank.com/link/ITreport20030403uk/$file/ITreport20030403uk.pdf

> I'm not sure whether Vadim
> intended our StartUpID to serve this purpose, but it could perhaps be
> used that way, if we reflected it in the WAL file names.
> 

Well, I'm not sure about StartUpId....but certainly the high 2 bytes of
LogId looks pretty certain never to be anything but zeros. You have 2.4
x 10^14...which is 9,000 years at 1000 log file/sec
We could use the scheme you descibe:
add xFFFF to the logid every time you complete an archive recovery...so
the log files look like 0001000000000CE3 after youve recovered a load of
files that look like 0000000000000CE3

If you used StartUpID directly, you might just run out....but its very
unlikely you would ever perform 65000 recovery situations - unless
you've run the <expletive> code as often as I have :(.

Doing that also means we don't have to work out how to do that with
StartUpID. Of course, altering the length and makeup of the xlog files
is possible too, but that will cause other stuff to stop working....

[We'll have to give this a no-SciFi name, unless we want to make
in-roads into the Dr.Who fanbase :) Don't get them started. Better
still, dont give it a name at all.]

I'll sleep on that lot.

Best regards, Simon Riggs



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Security...
Next
From: Simon Riggs
Date:
Subject: Re: Recovery Features