Re: Why we really need timelines *now* in PITR - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Why we really need timelines *now* in PITR
Msg-id 22853.1090252736@sss.pgh.pa.us
In response to Re: Why we really need timelines *now* in PITR  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Why we really need timelines *now* in PITR  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Why we really need timelines *now* in PITR  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
Simon Riggs <simon@2ndquadrant.com> writes:
> Some further thinking from that base...

> Perhaps timelines should be nest-numbered: (using 0 as a counter also)
> 0 etc is the original branch
> 0.1 is the first recovery off the original branch
> 0.2 is the second recovery off the original branch
> 0.1.1 is the first recovery off the first recovery (so to speak)
> 0.1.2 is the second etc
> That way you don't have the problem of "which is 3?" in the examples
> above. [Would we number a recovery of 1 as 3 or would the next recovery
> off 2 be numbered 3?]

Hmm.  This would have some usefulness as far as documenting "how did we
get here", but unless you knew where/when the timeline splits had
occurred, I don't think it would be super useful.  It'd make more sense
to record the parentage and split time of each timeline in some
human-readable "meta-history" reference file (but where exactly?).

I don't think it does anything to solve our immediate problem, anyhow.
You may know that you are recovering off of branch 0.1, but how do you
know if this is the first, second, or Nth time you have done that?

> Not necessarily the way we would show that as a timeline number. It
> could still be shown as a single hex number representing each nesting
> level as 4 bits...(restricting us to 7 recoveries per timeline...)

Sounds too tight to me :-(

I do see a hole in my original concept now that you mention it.  It'd be
quite possible for timeline 2 *not* to be an ancestor of timeline 3,
that is you might have tried a restore, not liked the result, and
decided to re-restore from someplace else on timeline 1.  That is,
instead of
0001.0014 - 0001.0015 - 0001.0016 - 0001.0017 - ...
                      |
                      + 0002.0016 - 0002.0017 - ...
                                  |
                                  + 0003.0017 - ...

maybe the history is
0001.0014 - 0001.0015 - 0001.0016 - 0001.0017 - ...
                      |           |
                      |           + 0003.0017 - ...
                      |
                      + 0002.0016 - 0002.0017 - ...

where I've had to draw 3 above 2 to avoid having unrelated lines
crossing each other in my diagram.  The problem here is that a crash
recovery in timeline 3 would not realize that it should not use WAL
segment 0002.0016.  So we need a more sophisticated rule than just
numerical comparison of timeline numbers.
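
One possible shape for such a rule, as a rough sketch only: record each timeline's parent and the first segment written on it, then accept a WAL segment only if its timeline is on the ancestry chain of the timeline being recovered and the segment predates the point where the chain left that timeline. The `CHAINED` and `BRANCHED` tables, the `segment_usable` name, and the convention that a child inherits only parent segments numbered below its own first segment are all assumptions made for illustration; nothing of the sort has been settled in this thread.

```python
# Hypothetical meta-history: timeline -> (parent timeline, first segment number
# written on that timeline). Segment numbers are plain integers here.
CHAINED = {1: (None, 0), 2: (1, 16), 3: (2, 17)}    # first diagram: 3 descends from 2
BRANCHED = {1: (None, 0), 2: (1, 16), 3: (1, 17)}   # second diagram: 2 and 3 both leave 1


def segment_usable(history, target_tli, seg_tli, segno):
    """Return True if segment seg_tli.segno lies in the past of target_tli."""
    tli = target_tli
    upper = None  # exclusive bound: first segment written after leaving this timeline
    while tli is not None:
        parent, first_seg = history[tli]
        if tli == seg_tli:
            # On the chain: the segment must be one this timeline wrote,
            # and must come before the chain branched away from it.
            return segno >= first_seg and (upper is None or segno < upper)
        upper = first_seg  # the parent is only valid below this segment
        tli = parent
    return False  # seg_tli is not an ancestor of target_tli at all
```

With the second history, `segment_usable(BRANCHED, 3, 2, 16)` is false even though 2 < 3, which is exactly the case that plain numerical comparison gets wrong. With the first history the same call is true, because there timeline 2 really is an ancestor of timeline 3.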

I think your idea of nested numbers might fix this, but I'm concerned
about the restrictions of fitting it into 32 bits as you suggest.
Can we think of a less restrictive representation?
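
To make the space concern concrete, here is a minimal sketch of packing one 4-bit field per nesting level into a 32-bit word. The function names and the choice to reserve the value 0 as "no further level" are assumptions for illustration; the per-level limit depends on which values are reserved, so this sketch does not try to reproduce the figure quoted above. Under these assumptions a word holds at most 8 nesting levels and 15 recoveries per level.

```python
BITS_PER_LEVEL = 4
WORD_BITS = 32
MAX_LEVELS = WORD_BITS // BITS_PER_LEVEL    # 8 nesting levels fit in the word
MAX_PER_LEVEL = (1 << BITS_PER_LEVEL) - 1   # 15, because 0 marks "no level here"


def pack_timeline(path):
    """Pack a nested path, e.g. (1, 2) for '0.1.2', into one 32-bit integer."""
    if len(path) > MAX_LEVELS:
        raise OverflowError("nesting deeper than %d levels" % MAX_LEVELS)
    word = 0
    for level, n in enumerate(path):
        if not 1 <= n <= MAX_PER_LEVEL:
            raise OverflowError("recovery number %d does not fit in 4 bits" % n)
        # The first level occupies the most significant 4 bits.
        shift = WORD_BITS - BITS_PER_LEVEL * (level + 1)
        word |= n << shift
    return word


def unpack_timeline(word):
    """Inverse of pack_timeline: stop at the first all-zero field."""
    path = []
    for level in range(MAX_LEVELS):
        shift = WORD_BITS - BITS_PER_LEVEL * (level + 1)
        n = (word >> shift) & MAX_PER_LEVEL
        if n == 0:
            break
        path.append(n)
    return tuple(path)
```

Both limits are hit quickly in practice: a sixteenth recovery off the same branch, or a ninth level of recovery-off-a-recovery, raises `OverflowError` here.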

> If we go with the renaming recovery.conf when it completes, why not make
> that the record of previous recoveries? Move it to archive_status and
> name it according to the timeline it just created, e.g.
> recovery.done.<timeline>.<timestamp>

There's still the problem of how can you be sure that all the files
created in the past are still in there.  It'd be way too likely for
someone to decide they ought to do a recovery restore by first doing
"rm -rf $PGDATA".  Or they lost the disk entirely and are restoring
their last full backup onto virgin media.

I think there's really no way around the issue: somehow we've got to
keep some meta-history outside the $PGDATA area, if we want to do this
in a clean fashion.  We could perhaps expect the archive area to store
it, but I'm a bit worried about the idea of overwriting the meta-history
file in archive from time to time; it's mighty critical data and you'd
not be happy if a crash corrupted your only copy.  We could archive
meta-history files with successively higher versioned names ... but then
we need an API extension to get back the latest one.
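
A sketch of what "get back the latest one" might amount to, assuming the archive can at least list file names and that meta-history files are archived as `history.<N>` with a decimal version number N. Both the naming scheme and the helper names are hypothetical; no such archive API exists in the discussion so far.

```python
import re

# Hypothetical naming scheme for archived meta-history files.
HISTORY_NAME = re.compile(r"^history\.(\d+)$")


def latest_history(names):
    """Return the highest-versioned meta-history name, or None if there is none."""
    best = None
    for name in names:
        m = HISTORY_NAME.match(name)
        if m is None:
            continue  # WAL segments and other archive contents are ignored
        version = int(m.group(1))
        if best is None or version > best[0]:
            best = (version, name)
    return None if best is None else best[1]


def next_history_name(names):
    """Name to use for the next archived copy, so nothing is ever overwritten."""
    latest = latest_history(names)
    version = 0 if latest is None else int(HISTORY_NAME.match(latest).group(1))
    return "history.%d" % (version + 1)
```

The versions are compared as integers rather than as strings, since otherwise `history.9` would sort after `history.10`. Because each copy gets a new name, a crash while archiving can damage only the newest file and leaves the earlier versions intact.
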
        regards, tom lane

