Re: Point in Time Recovery - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Point in Time Recovery |
Date | |
Msg-id | 1089069104.17493.132.camel@stromboli |
In response to | Re: Point in Time Recovery (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Point in Time Recovery |
List | pgsql-hackers |
On Mon, 2004-07-05 at 22:46, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Should we use a different datatype than time_t for the commit timestamp,
> > one that offers more fine grained differentiation between checkpoints?
>
> Pretty much everybody supports gettimeofday() (time_t and separate
> integer microseconds); you might as well use that. Note that the actual
> resolution is not necessarily microseconds, and it'd still not be
> certain that successive commits have distinct timestamps --- so maybe
> this refinement would be pointless. You'll still have to design a user
> interface that allows selection without the assumption of distinct
> timestamps.

Well, I agree, though even without the desired UI for now, I think a finer-grained mechanism would be good. This means extending the xlog commit record by a couple of bytes... OK, let's live a little.

> > - when we stop, keep reading records until EOF, just don't apply them.
> > When we write a checkpoint at end of recovery, the unapplied
> > transactions are buried alive, never to return.
> > - stop where we stop, then force zeros to EOF, so that no possible
> > record remains of previous transactions.
>
> Go with plan B; it's best not to destroy data (what if you chose the
> wrong restart point the first time)?

Eh? Which way round? The second plan was the one where I would destroy data by overwriting it; that's exactly why I preferred the first. Actually, the files are always copied from the archive, so re-recovery is always an available option in the design that's been implemented. No matter...

> Actually this now reminds me of a discussion I had with Patrick
> Macdonald some time ago. The DB2 practice in this connection is that
> you *never* overwrite existing logfile data when recovering. Instead
> you start a brand new xlog segment file,

Now that's a much better plan... I suppose I just have to advance the recovery pointer to the first record on the first page of a new xlog file: similar to the first plan, but fast-forwarding straight there rather than reading forward through the remaining records. My only concern was the secondary checkpoint marker, which is always reset to the place you just restored FROM when you complete a recovery. That could lead to a situation where you recover, then fail before the next checkpoint and lose the last checkpoint marker, then crash-recover from the previous checkpoint (again), but this time replay the records you were careful to avoid.

> which is given a new "branch
> number" so it can be distinguished from the future-time xlog segments
> that you chose not to apply. I don't recall what the DB2 terminology
> was exactly --- not "branch number" I don't think --- but anyway the
> idea is that when you restart the database after an incomplete recovery,
> you are now in a sort of parallel universe that has its own history
> after the branch point (PITR stop point). You need to be able to
> distinguish archived log segments of this parallel universe from those
> of previous and subsequent incarnations.

That's a good idea, if only because you can so easily screw up your test data during multiple recovery situations. But if it's good during testing, it must be good in production too... since you may well perform a recovery, run for a while, then discover that you got it wrong the first time and need to re-recover. I had already added that to my list of gotchas, and this would solve it.

I was going to say hats off to the Blue-hued ones, when I remembered this little gem from last year: http://www.danskebank.com/link/ITreport20030403uk/$file/ITreport20030403uk.pdf

> I'm not sure whether Vadim
> intended our StartUpID to serve this purpose, but it could perhaps be
> used that way, if we reflected it in the WAL file names.
Well, I'm not sure about StartUpID... but the high 2 bytes of LogId look pretty certain never to be anything but zeros. Even without them you have 2^48 (about 2.8 x 10^14) file names... which is about 9,000 years at 1000 log files/sec.

We could use the scheme you describe: add 0x10000 to the logid (i.e. increment its high 2 bytes) every time you complete an archive recovery... so the log files look like 0001000000000CE3 after you've recovered a load of files that look like 0000000000000CE3.

If you used StartUpID directly, you might just run out... but it's very unlikely you would ever perform 65,000 recoveries, unless you've run the <expletive> code as often as I have :(. Doing it this way also means we don't have to work out how to do it with StartUpID.

Of course, altering the length and makeup of the xlog file names is possible too, but that will cause other stuff to stop working...

[We'll have to give this a no-SciFi name, unless we want to make inroads into the Dr. Who fanbase :) Don't get them started. Better still, don't give it a name at all.]

I'll sleep on that lot.

Best regards,
Simon Riggs
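The log-id naming idea and the capacity estimate discussed above can be checked with a short sketch. This is a hypothetical illustration, not PostgreSQL code: it assumes 16-hex-digit WAL file names (8 digits of log id followed by 8 of segment number), reserves the high 2 bytes of the log id as a recovery counter that is incremented (0x10000 added) after each completed archive recovery, and, like the estimate in the message, treats every remaining name as usable. All function and constant names are invented for illustration.

```python
# Hypothetical sketch, not PostgreSQL code: reserve the high 2 bytes of the
# 32-bit log id as a per-recovery "branch" counter, as proposed in the thread.
# Assumes 16-hex-digit WAL file names: 8 digits of log id, 8 of segment.

BRANCH_SHIFT = 16            # branch counter occupies bits 16..31 of log id
LOW_LOGID_MASK = 0xFFFF      # ordinary log id keeps the low 16 bits
MAX_BRANCH = 0xFFFF          # largest branch number the high 2 bytes can hold


def xlog_file_name(log_id: int, seg: int) -> str:
    """Format a (log id, segment) pair as a 16-hex-digit file name."""
    return f"{log_id:08X}{seg:08X}"


def bump_branch(log_id: int) -> int:
    """Return the log id to use after completing an archive recovery.

    Equivalent to adding 0x10000: the high 2 bytes are incremented and
    the low 2 bytes (the ordinary log id) are left unchanged.
    """
    branch = (log_id >> BRANCH_SHIFT) + 1
    if branch > MAX_BRANCH:
        raise OverflowError("no branch numbers left")
    return (branch << BRANCH_SHIFT) | (log_id & LOW_LOGID_MASK)


def files_per_branch() -> int:
    """File names left per branch: 16 low log-id bits + 32 segment bits."""
    return 2 ** (16 + 32)


def years_until_exhausted(files_per_sec: float) -> float:
    """Years to use up one branch's names at a constant file rate."""
    seconds_per_year = 365.25 * 24 * 3600
    return files_per_branch() / files_per_sec / seconds_per_year


# Usage: a recovered load of ...0CE3 files continues on branch 1.
print(xlog_file_name(0, 0xCE3))               # 0000000000000CE3
print(xlog_file_name(bump_branch(0), 0xCE3))  # 0001000000000CE3
print(f"{files_per_branch():.2e}")            # 2.81e+14
print(round(years_until_exhausted(1000)))     # 8919
```

Keeping the counter in the high bits means names from different recoveries can never collide, while the low bits go on counting log files as before; the cost is a ceiling of 65,535 completed recoveries before the counter overflows.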