Thread: PITR Archive Recovery plus WIP PITR

PITR Archive Recovery plus WIP PITR

From: Simon Riggs
A number of you have pointed out that the last patch had several
problems; thank you all. This patchset supersedes all previous versions,
and is the only one that will work against current CVS tip.

cd pgsql/src
patch -p0 < pitr_v5_0.patch

then place

pgarch.c in src/backend/postmaster
pgarch.h in src/include

Read the README - it's long and you won't have a clue without it

...remember, you HAVE TO run this on a cluster created by initdb within
the last 2-3 days...

When you perform a recovery, you can use the example recovery.conf
provided here. (Not all of the options work yet...) Just place this file
in PGDATA then crank up the postmaster any way you choose.

Known issues:
- CREATE DATABASE will not be recovered (at present)...create a new
database BEFORE you take a full physical backup
- error handling in the archiver has some issues, and requires a few
improvements...
- PITR support is partially complete in this patch - it's a Work in
Progress (WIP), but doesn't interfere with other operations - I will
continue to work on this

...but this works - so please don't be put off from giving it a try - it
will take a while to get used to the concepts behind it all.

Earlier versions
- much of the code for recovery is in xlog.c - refactoring proved
difficult and a big timewaster when the main functions aren't all there
yet
- error messages streamlined
- variable naming more consistent than earlier versions

Best Regards, Simon Riggs

Attachment

Re: PITR Archive Recovery plus WIP PITR

From: Simon Riggs
On Fri, 2004-07-09 at 12:53, Klaus Naumann wrote:
> archive_program is provided with a string which contains the target directory.
> That doesn't really make sense.

archive_dest is used for both archive and restore, that's why it's set as
a separate parameter.

That's the rationale... let's see what others think

> First of all it introduces the problem you
> mentioned in the README file (if the directory doesn't exist you lose
> xlogs).

Your example quoted later is the answer....
use
    archive_dest = '/mnt/pgarch/'
rather than
    archive_dest = '/mnt/pgarch'
which is ambiguous...
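
As an illustration of that ambiguity (not part of the patch), here is a minimal Python sketch. It uses `shutil.copy` as a stand-in for a cp-style archive command; the segment names and the temporary directory layout are made up for the demonstration. When the destination directory does not exist, the path without a trailing slash is quietly treated as a file name, so each archived segment overwrites the previous one. With the trailing slash, the copy fails instead.

```python
import os
import shutil
import tempfile


def archive_segments(segments, archive_dest):
    """Copy each segment to archive_dest, the way a cp-style command would.

    Returns the names of the segments whose copy raised an error.
    """
    failed = []
    for path in segments:
        try:
            shutil.copy(path, archive_dest)
        except OSError:
            failed.append(os.path.basename(path))
    return failed


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Two fake xlog segments with distinct contents (illustrative names).
        segments = []
        for name in ("0000000000000001", "0000000000000002"):
            path = os.path.join(root, name)
            with open(path, "w") as f:
                f.write("contents of " + name)
            segments.append(path)

        mnt = os.path.join(root, "mnt")
        os.mkdir(mnt)  # note: mnt/pgarch itself is never created

        # No trailing slash: 'pgarch' becomes a plain file, overwritten each time.
        no_slash = os.path.join(mnt, "pgarch")
        failed_no_slash = archive_segments(segments, no_slash)
        with open(no_slash) as f:
            surviving = f.read()

        # Trailing slash: the path can only name a directory, so each copy fails.
        os.remove(no_slash)
        failed_slash = archive_segments(segments, no_slash + "/")

        return failed_no_slash, surviving, failed_slash
```

In the first case no error is reported, yet only the last segment survives; in the second, every copy fails loudly, which is the safer outcome for an archiver.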

> I thought about checking if this is a dir within the code. But
> this would make things too inflexible.

Yes, otherwise the check would be there

> Second, we could make the user responsible for what he's doing by not
> giving him any target.
>

Remember, the user is specifying the archive_dest also, so the user is
completely responsible for how archiving actually occurs.

> Like you could then do things like:
>
> archive_program = 'gzip -d %s | tar rf /dev/nst0 - '

archive_program = 'gzip -d %s | tar rf %s - '

would be how I would use it in the example you give

>
> Which adds the file to a tar archive on his tape.
> If he wants to archive it on disk, let him do it this way:
>
> archive_program = 'cp %s /mnt/pgarch/'

archive_program = 'cp %s %s'

would be the way to specify that...
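
A rough Python sketch of how such a two-placeholder template might be expanded. It assumes (this thread does not confirm it) that the first %s receives the xlog file path and the second receives archive_dest; the function name and the paths are hypothetical.

```python
def expand_archive_program(template, xlog_path, archive_dest):
    """Fill the two %s placeholders, in order, with source file and destination."""
    if template.count("%s") != 2:
        raise ValueError("expected exactly two %s placeholders")
    return template % (xlog_path, archive_dest)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(expand_archive_program("cp %s %s",
                                 "pg_xlog/0000000000000001",
                                 "/mnt/pgarch/"))
```

Under that assumption, the same archive_dest setting can serve both the disk example and the tape example without the destination being repeated inside the command string.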

Thank you very much for feedback and your other contributions,

Best regards, Simon Riggs


Re: PITR Archive Recovery plus WIP PITR

From: Simon Riggs
Following a suggestion and patch from Klaus Naumann, the recovery.conf
file can now accept comments....

No patch supplied at present (anoncvs is down), but here is the
annotated recovery.conf.sample

Best Regards, Simon Riggs

Attachment

Re: PITR Archive Recovery plus WIP PITR

From: Simon Riggs
New release of patch, at v5_1 ... for serious testing
what's in
- Point in Time Recovery now works....please check carefully
- additional options in recovery.conf
(including code contributed to PITR from Klaus Naumann)

what's not (yet)
- Timelines...though I think they are useful, they may not be critical
- handling of local/UTC times (the variable is there...)

The number of permutations is increasing, and available time is
decreasing....not a full retest, OK.


Attachment

Re: PITR Archive Recovery plus WIP PITR

From: Bruce Momjian
Simon Riggs wrote:
> New release of patch, at v5_1 ... for serious testing
> what's in
> - Point in Time Recovery now works....please check carefully
> - additional options in recovery.conf
> (including code contributed to PITR from Klaus Naumann)
>
> what's not (yet)
> - Timelines...though I think they are useful, they may not be critical

I am not fond of the timeline idea, especially for 7.5.  Let's get usage
cases submitted first.  I can imagine timelines as causing significant
confusion during restore, which is the last thing we want to do.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: PITR Archive Recovery plus WIP PITR

From: Simon Riggs
On Tue, 2004-07-13 at 23:58, Bruce Momjian wrote:
> Simon Riggs wrote:
> > New release of patch, at v5_1 ... for serious testing
> > what's in
> > - Point in Time Recovery now works....please check carefully
> > - additional options in recovery.conf
> > (including code contributed to PITR from Klaus Naumann)
> >
> > what's not (yet)
> > - Timelines...though I think they are useful, they may not be critical
>
> I am not fond of the timeline idea, especially for 7.5.  Let's get usage
> cases submitted first.  I can imagine timelines as causing significant
> confusion during restore, which is the last thing we want to do.

Well, I really want to finish this, so I do agree.

Exhaustion is setting in....I need other eyes to test and fix the bugs.

Best Regards, Simon Riggs


Re: PITR Archive Recovery plus WIP PITR

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Simon Riggs wrote:
> what's not (yet)
>> - Timelines...though I think they are useful, they may not be critical

> I am not fond of the timeline idea, especially for 7.5.  Let's get usage
> cases submitted first.  I can imagine timelines as causing significant
> confusion during restore, which is the last thing we want to do.

I think that judgment is exactly backward.  *Not* having timelines is
what will cause serious and possibly fatal mistakes during restore:
people will hand the wrong xlog files to restore and the software will
be unable to recognize the inconsistency.

We really need to get this right the first time.
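
As a purely illustrative sketch of that failure mode, suppose each segment name carried a timeline identifier in front of the log and seg numbers. The 8+8+8 hex-digit layout below is an assumption made for the example, not something settled in this thread, and the function names are hypothetical. With such a prefix, restore could refuse a segment from a different timeline instead of replaying it blindly.

```python
def parse_segment_name(name):
    """Split a hypothetical 24-hex-digit name into (timeline, log, seg)."""
    if len(name) != 24:
        raise ValueError("unexpected segment name length: %r" % name)
    return tuple(int(name[i:i + 8], 16) for i in (0, 8, 16))


def check_segments(names, expected_timeline):
    """Return the segment names that do not belong to expected_timeline."""
    return [n for n in names if parse_segment_name(n)[0] != expected_timeline]
```

Without any timeline information in the name, two segments with the same log,seg pair from different recovery attempts would be indistinguishable to this kind of check.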

            regards, tom lane

Re: PITR Archive Recovery plus WIP PITR

From: Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Simon Riggs wrote:
> > what's not (yet)
> >> - Timelines...though I think they are useful, they may not be critical
>
> > I am not fond of the timeline idea, especially for 7.5.  Let's get usage
> > cases submitted first.  I can imagine timelines as causing significant
> > confusion during restore, which is the last thing we want to do.
>
> I think that judgment is exactly backward.  *Not* having timelines is
> what will cause serious and possibly fatal mistakes during restore:
> people will hand the wrong xlog files to restore and the software will
> be unable to recognize the inconsistency.
>
> We really need to get this right the first time.

I assume they could just restore from backup and try again.


Re: PITR Archive Recovery plus WIP PITR

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> I think that judgment is exactly backward.  *Not* having timelines is
>> what will cause serious and possibly fatal mistakes during restore:
>> people will hand the wrong xlog files to restore and the software will
>> be unable to recognize the inconsistency.

> I assume they could just restore from backup and try again.

Sure, if they don't mind losing whatever transactions they processed
before realizing how broken their database was.  That's not going to be
an acceptable answer for the sort of installations that need PITR in the
first place.

I think it's really important to get this right the first time, both for
reliability's sake and because we are expecting people to write their
own archiving scripts.  If we change the xlog segment naming convention
later on, then we will break all those scripts.

            regards, tom lane

Re: PITR Archive Recovery plus WIP PITR

From: Simon Riggs
On Wed, 2004-07-14 at 05:45, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> I think that judgment is exactly backward.  *Not* having timelines is
> >> what will cause serious and possibly fatal mistakes during restore:
> >> people will hand the wrong xlog files to restore and the software will
> >> be unable to recognize the inconsistency.
>
> > I assume they could just restore from backup and try again.
>
> Sure, if they don't mind losing whatever transactions they processed
> before realizing how broken their database was.  That's not going to be
> an acceptable answer for the sort of installations that need PITR in the
> first place.
>
> I think it's really important to get this right the first time, both for
> reliability's sake and because we are expecting people to write their
> own archiving scripts.  If we change the xlog segment naming convention
> later on, then we will break all those scripts.
>

I agree, but I'm going to have a rest day while people test what is
already there in case there are further code changes....which nods
towards both of your concerns.

BTW, one test last night broke because of the lack of timelines...

Best Regards, Simon Riggs


Re: PITR Archive Recovery plus WIP PITR

From: Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> I think that judgment is exactly backward.  *Not* having timelines is
> >> what will cause serious and possibly fatal mistakes during restore:
> >> people will hand the wrong xlog files to restore and the software will
> >> be unable to recognize the inconsistency.
>
> > I assume they could just restore from backup and try again.
>
> Sure, if they don't mind losing whatever transactions they processed
> before realizing how broken their database was.  That's not going to be
> an acceptable answer for the sort of installations that need PITR in the
> first place.
>
> I think it's really important to get this right the first time, both for
> reliability's sake and because we are expecting people to write their
> own archiving scripts.  If we change the xlog segment naming convention
> later on, then we will break all those scripts.

We don't have anything hardcoded based on those file names, at least in
PostgreSQL.


Re: PITR Archive Recovery plus WIP PITR

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> I think it's really important to get this right the first time, both for
>> reliability's sake and because we are expecting people to write their
>> own archiving scripts.  If we change the xlog segment naming convention
>> later on, then we will break all those scripts.

> We don't have anything hardcoded based on those file names, at least in
> PostgreSQL.

That's because we've punted the whole problem of archive-segment
management off to the users.

If we did implement this functionality ourselves then I'd be less
worried, since we'd know that future changes would affect only our
own code.  But as things stand, we will have very unhappy PITR users
if we change the naming convention later.

            regards, tom lane

Re: PITR Archive Recovery plus WIP PITR

From: Simon Riggs
On Wed, 2004-07-14 at 16:00, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> I think it's really important to get this right the first time, both for
> >> reliability's sake and because we are expecting people to write their
> >> own archiving scripts.  If we change the xlog segment naming convention
> >> later on, then we will break all those scripts.
>
> > We don't have anything hardcoded based on those file names, at least in
> > PostgreSQL.
>

Well, I think we do. There are two places where the filename format and
length matter, and there are numerous calls to calculate filenames from
log,seg pairs. ...and much of the current patch would need thorough
reworking to make sure every affected place was changed.

Which is why I was striving for a solution that retained the filename
make-up, by adding a very large number to the log value (we just aren't
gonna run out...I did the math in another post).
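
A sketch of that idea: keep the existing name layout and fold the timeline into the log value as a large offset. The 16-hex-digit log,seg layout and the stride of 2**24 log ids per timeline are assumptions chosen for illustration; the actual number is in the other post and is not reproduced here. The function names are hypothetical.

```python
TIMELINE_STRIDE = 1 << 24  # hypothetical: log ids reserved for each timeline


def segment_name(timeline, log, seg):
    """Build a 16-hex-digit name, encoding the timeline as an offset on log."""
    if timeline < 0 or not 0 <= log < TIMELINE_STRIDE:
        raise ValueError("timeline or log id out of range")
    encoded_log = timeline * TIMELINE_STRIDE + log
    if encoded_log > 0xFFFFFFFF or not 0 <= seg <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 8 hex digits")
    return "%08X%08X" % (encoded_log, seg)


def timeline_of(name):
    """Recover the timeline from a name produced by segment_name()."""
    return int(name[:8], 16) // TIMELINE_STRIDE
```

The name length and format stay the same, so code that computes filenames from log,seg pairs would not need a new format, only the offset applied to log.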

> That's because we've punted the whole problem of archive-segment
> management off to the users.
>
> If we did implement this functionality ourselves then I'd be less
> worried, since we'd know that future changes would affect only our
> own code.  But as things stand, we will have very unhappy PITR users
> if we change the naming convention later.
>

Yes, if we are going to change the xlog filename format, we must do it
now. The change must be in effect whether or not you use archive_mode.

...Is there agreement with my previous posts on this....marked "Point in
Time Recovery" over the last few days?
Is that what we should implement?

Overall, the timeline concept is only worth implementing if:
- we archive xlogs
- we recover them
- we recover them to a point in time/txnid

We agreed that the last part wasn't the priority for beta freeze. I'm
willing to spend more time on the timeline idea as long as I've got some
idea that we will be committing what has been developed so far. It takes
effort to keep the patch viable against changes because new commits
update the catalog version, which invalidates all my test databases, as
well as any changes I have to track down. ...and I've been doing that
for a month now - getting much better though, thanks.

If we can review what we have now, I would be most pleased. Until we
commit at least some of it, I'm the only developer and I would like to
open this up to allow others to contribute more easily.

Best Regards, Simon Riggs



Re: PITR Archive Recovery plus WIP PITR

From: Christopher Kings-Lynne
>>I am not fond of the timeline idea, especially for 7.5.  Let's get usage
>>cases submitted first.  I can imagine timelines as causing significant
>>confusion during restore, which is the last thing we want to do.
>
> I think that judgment is exactly backward.  *Not* having timelines is
> what will cause serious and possibly fatal mistakes during restore:
> people will hand the wrong xlog files to restore and the software will
> be unable to recognize the inconsistency.
>
> We really need to get this right the first time.
>
>             regards, tom lane


Re: PITR Archive Recovery plus WIP PITR

From: Christopher Kings-Lynne
Please ignore- seems some old mail of mine got sent waaay late...

Christopher Kings-Lynne wrote:
>>> I am not fond of the timeline idea, especially for 7.5.  Let's get usage
>>> cases submitted first.  I can imagine timelines as causing significant
>>> confusion during restore, which is the last thing we want to do.
>>
>>
>> I think that judgment is exactly backward.  *Not* having timelines is
>> what will cause serious and possibly fatal mistakes during restore:
>> people will hand the wrong xlog files to restore and the software will
>> be unable to recognize the inconsistency.
>>
>> We really need to get this right the first time.
>>
>>             regards, tom lane