Thread: PITR Recovery

PITR Recovery

From

Simon Riggs

Date:

15 June 2004, 17:52:15

...on the assumption we now have archived xlogs, how do we do recovery?

Default is to put all xlogs back into pg_xlog and then let recovery do
its work...though clearly we all want finer specification than that.

Based upon all our discussions to date...I propose to:
1. put more verbose instrumentation around recovery, so we can see how
recovery progresses and calculate an estimated recovery time
2. put in a recovery control command
3. put in a validation step that will check to see whether there are any
missing transaction log files in the sequence of xlogs available
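
A rough sketch of the gap check in step 3, assuming each archived xlog
can be reduced to a single increasing segment number and that the list
is already sorted. That is a simplification (real file names encode more
than one number), and first_gap is a hypothetical name, not an existing
function:

```c
#include <stddef.h>

/*
 * Return the index of the first segment that does not directly follow
 * its predecessor, or -1 if the sorted sequence has no gaps.
 * Assumption: one increasing number identifies each xlog file.
 */
long first_gap(const unsigned long *segno, size_t n)
{
    for (size_t i = 1; i < n; i++)
    {
        if (segno[i] != segno[i - 1] + 1)
            return (long) i;
    }
    return -1;
}
```

A caller would report the segment expected at the returned index and
refuse to start recovery past it.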

The recovery control command would:
- read the file DataDir/recovery.conf (placed there by DBA)
- parse out an SQL-like command string
ROLLFORWARD object target finalaction;

e.g. 
ROLLFORWARD DATABASE TO END OF LOGS;
(is the current, and would be default, behaviour if file absent)

ROLLFORWARD DATABASE TO TIMESTAMP '2004-06-11-23:58:02.123' EXCLUSIVE;

ROLLFORWARD DATABASE TO END OF LOGS THEN PAUSE;

SYNTAX
object = DATABASE (default) | TABLESPACE
target = TO END OF LOGS (default) | TO TIMESTAMP 'yyyy-mm-dd-hh:mm:ss.sss' edge
edge = INCLUSIVE (default) | EXCLUSIVE
finalaction = THEN START (default) | THEN PAUSE

-object refers to the part of the database (or whole) that is to be
recovered
-target specifies whether to stop, and what test we will use
-edge refers to whether we use <= or < on the test for target
-finalaction refers to what to do when target is reached - the purpose
of this is to allow recovery of a database to occur when we don't have
enough space for all of the xlogs at once, so we need to do recovery in
batches.
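
To make the grammar concrete, here is a minimal hand-written parser
sketch in plain C (no bison). It is only an illustration under stated
assumptions: the struct, the function names and the decision to accept
only the DATABASE object are all hypothetical, keywords must be upper
case, and the timestamp is kept as text rather than validated:

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

typedef struct RollforwardCmd
{
    bool to_end_of_logs;    /* true: replay everything available */
    char timestamp[64];     /* target text when to_end_of_logs is false */
    bool inclusive;         /* edge: INCLUSIVE (default) or EXCLUSIVE */
    bool pause_at_end;      /* finalaction: THEN PAUSE, else THEN START */
} RollforwardCmd;

/* Skip whitespace, then consume keyword kw if it comes next in *p. */
static bool eat(const char **p, const char *kw)
{
    const char *s = *p;
    size_t len = strlen(kw);

    while (isspace((unsigned char) *s))
        s++;
    if (strncmp(s, kw, len) != 0 || isalnum((unsigned char) s[len]))
        return false;
    *p = s + len;
    return true;
}

/* Parse one command string; returns false on any syntax error. */
bool parse_rollforward(const char *cmd, RollforwardCmd *out)
{
    const char *p = cmd;

    /* defaults taken from the SYNTAX section */
    out->to_end_of_logs = true;
    out->timestamp[0] = '\0';
    out->inclusive = true;
    out->pause_at_end = false;

    if (!eat(&p, "ROLLFORWARD"))
        return false;
    eat(&p, "DATABASE");    /* optional; TABLESPACE is not handled here */

    if (eat(&p, "TO"))
    {
        if (eat(&p, "TIMESTAMP"))
        {
            const char *start;
            const char *end;

            while (isspace((unsigned char) *p))
                p++;
            if (*p != '\'')
                return false;
            start = p + 1;
            end = strchr(start, '\'');
            if (end == NULL ||
                (size_t) (end - start) >= sizeof(out->timestamp))
                return false;
            memcpy(out->timestamp, start, (size_t) (end - start));
            out->timestamp[end - start] = '\0';
            p = end + 1;
            out->to_end_of_logs = false;
            if (eat(&p, "EXCLUSIVE"))
                out->inclusive = false;
            else
                eat(&p, "INCLUSIVE");   /* optional, it is the default */
        }
        else if (!(eat(&p, "END") && eat(&p, "OF") && eat(&p, "LOGS")))
            return false;
    }

    if (eat(&p, "THEN"))
    {
        if (eat(&p, "PAUSE"))
            out->pause_at_end = true;
        else if (!eat(&p, "START"))
            return false;
    }

    eat(&p, ";");           /* trailing semicolon is optional */
    while (isspace((unsigned char) *p))
        p++;
    return *p == '\0';      /* reject trailing junk */
}
```

The three example commands above all parse with this sketch; anything
that does not start with ROLLFORWARD is rejected.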

When recovery is complete, recovery.conf would be renamed to
recovery.done, so it would not be reactivated if we restart.
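
The rename step could be as small as the following sketch. The function
name mark_recovery_done and the fixed buffer size are assumptions for
illustration only:

```c
#include <stdio.h>

/*
 * Rename DataDir/recovery.conf to DataDir/recovery.done.
 * Returns 0 on success, -1 on failure (same convention as rename()).
 */
int mark_recovery_done(const char *datadir)
{
    char from[1024];
    char to[1024];
    int n;

    n = snprintf(from, sizeof(from), "%s/recovery.conf", datadir);
    if (n < 0 || (size_t) n >= sizeof(from))
        return -1;              /* path did not fit */
    n = snprintf(to, sizeof(to), "%s/recovery.done", datadir);
    if (n < 0 || (size_t) n >= sizeof(to))
        return -1;
    return rename(from, to);
}
```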

In time for beta freeze, I think it is possible to do a limited subset
of the above:
- implement DATABASE only (whole instance, not specific database)
- implement END OF LOGS and TO TIMESTAMP
- implement THEN START only
- implement using simple C, rather than bison

Reading the command is probably the hardest part of this, so agreeing
what we're working towards is crucial. We're out of time to redesign
this once it's coded.

If the hooks are there, we can always code up more should it be required
for a particular recovery...

The syntax is very like DB2, but is designed to be reminiscent of other
systems that give you control over rollforward recovery (e.g. Oracle
etc), allowing those with experience to migrate easily to PostgreSQL.

Implementation wise, I would expect all of this code to go in xlog.c,
with the recovery target code living in ReadRecord(). This would look
inside each record to check the record type; if it is a COMMIT record,
it would look further at the timestamp and then either apply this COMMIT
or not, according to INCLUSIVE/EXCLUSIVE.
Only the txn boundary records have timestamps...
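
The target test itself is tiny. A sketch, assuming commit and target
times are already available as comparable integers (the units and the
name apply_commit are hypothetical):

```c
#include <stdbool.h>

/*
 * Decide whether a COMMIT record should be applied during rollforward.
 * INCLUSIVE applies commits with commit_time <= target,
 * EXCLUSIVE only those with commit_time < target.
 */
bool apply_commit(long long commit_time, long long target, bool inclusive)
{
    return inclusive ? (commit_time <= target) : (commit_time < target);
}
```

Recovery would stop at the first COMMIT record for which this returns
false.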

As Tom points out, we can't accept normal SQL at this point, nor can we
easily achieve this with command line switches or postgresql.conf
entries. My solution is to just use another .conf file (call it what you
like...)

Comments?

Best Regards, Simon Riggs

Re: PITR Recovery

From

Tom Lane

Date:

15 June 2004, 22:50:22

Simon Riggs <simon@2ndquadrant.com> writes:
> -finalaction refers to what to do when target is reached - the purpose
> of this is to allow recovery of a database to occur when we don't have
> enough space for all of the xlogs at once, so we need to do recovery in
> batches.

It seems to me that this is the only *essential* feature out of what
you've listed, and the others are okay to add later.  So I question
your priorities:

> In time for beta freeze, I think it is possible to do a limited subset
> of the above:
> - implement DATABASE only (whole instance, not specific database)
> - implement END OF LOGS and TO TIMESTAMP
> - implement THEN START only
> - implement using simple C, rather than bison

which seem to include everything except the one absolute must-have
for any serious installation.

(BTW, I doubt that single-database recovery is possible at all, ever.
You can't go hacking the clog and shared tables and not keep all the
databases in sync.  So I'd forget the "object" concept altogether.)

> Implementation wise, I would expect all of this code to go in xlog.c,
> with the recovery target code living in ReadRecord().

I'd like to keep it out of there, as xlog.c is far too big and complex
already.  Not sure where else though.  Maybe we need to break down
xlog.c somehow.
        regards, tom lane

Re: PITR Recovery

From

Simon Riggs

Date:

16 June 2004, 19:02:37

On Wed, 2004-06-16 at 02:49, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:

> > Implementation wise, I would expect all of this code to go in xlog.c,
> > with the recovery target code living in ReadRecord().
> 
> I'd like to keep it out of there, as xlog.c is far too big and complex
> already.  Not sure where else though.  Maybe we need to break down
> xlog.c somehow.
> 

Yes, I would very much like to split out the recovery code into a
different file, so that all recovery code is in one place.

Refactoring in this way would protect further PITR work from conflicting
with other changes in the last minute rush, as well as making most
future recovery changes a single file patch (well...nearly...)

xlogutils.c is already almost fully dedicated to recovery code, so it
seems like a good place to centralise, even though I don't like the
name!

Looking at the code, I would suggest:

-- Move these code sections into void X(void) functions that would
reside in xlogutils.c but get called from StartupXLog in xlog.c,
currently within the if (InRecovery) {} braces:

Add StartupRecovery() - main REDO recovery
from
    /* REDO */
    if (InRecovery)
    {
to
        (errmsg("redo is not required")));
    }
which ends just before the comment
    /*
     * Init xlog buffer cache using the block containing the last valid

Add CleanupRecovery() - cleanup after recovery
..similarly

That would then allow us to:
-- Move the following to xlogutils.c:
    XLogInitRelationCache
    RestoreBkpBlocks
    ReadRecord (may still need to be called from xlog.c)
    RecordIsValid
    ValidXLOGHeader
    XLogCloseRelationCache


--Remove from xlogutils.h
extern void XLogInitRelationCache(void);
extern void XLogCloseRelationCache(void);

replace with 

extern void StartupRecovery(void);
extern void CleanupRecovery(void);

Is that something you'd be able to do as a starting point for the other
changes? It's easier for a committer to do this, than for me to do it
and then another to review it...

Best Regards, Simon Riggs

Re: PITR Recovery

From

Tom Lane

Date:

16 June 2004, 19:50:55

Simon Riggs <simon@2ndquadrant.com> writes:
> Is that something you'd be able to do as a starting point for the other
> changes? It's easier for a committer to do this, than for me to do it
> and then another to review it...

I'm up to my eyeballs in tablespaces right now, but if you can wait a
couple days for this ...
        regards, tom lane

Re: PITR Recovery

From

Simon Riggs

Date:

17 June 2004, 18:07:46

On Wed, 2004-06-16 at 23:50, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Is that something you'd be able to do as a starting point for the other
> > changes? It's easier for a committer to do this, than for me to do it
> > and then another to review it...
> 
> I'm up to my eyeballs in tablespaces right now, but if you can wait a
> couple days for this ...

Whatever minimises your time...seriously

Say the word and I'll do it.

Simon

Re: PITR Recovery

From

Simon Riggs

Date:

17 June 2004, 18:47:55

On Wed, 2004-06-16 at 02:49, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > -finalaction refers to what to do when target is reached - the purpose
> > of this is to allow recovery of a database to occur when we don't have
> > enough space for all of the xlogs at once, so we need to do recovery in
> > batches.
> 
> It seems to me that this is the only *essential* feature out of what
> you've listed, and the others are okay to add later.  So I question
> your priorities:
> 
> > In time for beta freeze, I think it is possible to do a limited subset
> > of the above:
> > - implement DATABASE only (whole instance, not specific database)
> > - implement END OF LOGS and TO TIMESTAMP
> > - implement THEN START only
> > - implement using simple C, rather than bison
> 
> which seem to include everything except the one absolute must-have
> for any serious installation.
> 

OK. At first, I disagreed, for many reasons.

In discussion with Bruce, I believe a fairly neat streaming solution is
possible.

During recovery, as each request for a new xlog is made, we can make a
system(3) call to a user defined recovery_program to retrieve the next
xlog and put it in place. As each xlog is closed the file will be
removed. The result of this would be to stream the data files through
recovery, so no more than 1-2 files would ever be required to perform
what could be (and is touted as such by other vendors) an infinite
recovery.

The result is that a backup tape (or other tape silo) could stream data
straight through to recovery, and would completely circumvent any
concern about insufficient disk space for recovery.

This would involve changes to XLogFileOpen() in xlog.c and is far less
complex than I had imagined such functionality could be.

This could be specified to PostgreSQL by using:
- restore_program='cp %s %s' or similar
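
A sketch of how such a template might be expanded and run, assuming the
first %s stands for the archived file and the second for the place
recovery expects it. The function names, paths and buffer size are
hypothetical, and the substitution is done by hand instead of passing a
user-supplied string to printf as a format:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Expand a template such as "cp %s %s": the first %s becomes src, the
 * second dst. Returns 0 on success, -1 if the result does not fit.
 */
int build_restore_command(char *buf, size_t buflen, const char *tmpl,
                          const char *src, const char *dst)
{
    const char *args[2] = { src, dst };
    int next = 0;
    size_t used = 0;

    if (buflen == 0)
        return -1;
    for (const char *t = tmpl; *t != '\0'; t++)
    {
        if (t[0] == '%' && t[1] == 's' && next < 2)
        {
            size_t len = strlen(args[next]);

            if (used + len >= buflen)
                return -1;
            memcpy(buf + used, args[next], len);
            used += len;
            next++;
            t++;                /* skip the 's' */
        }
        else
        {
            if (used + 1 >= buflen)
                return -1;
            buf[used++] = *t;
        }
    }
    buf[used] = '\0';
    return 0;
}

/* Fetch one xlog file; returns the system(3) status, or -1 on overflow. */
int restore_xlog(const char *tmpl, const char *archived, const char *target)
{
    char cmd[2048];

    if (build_restore_command(cmd, sizeof(cmd), tmpl, archived, target) != 0)
        return -1;
    return system(cmd);
}
```

This only covers fetching a file; removing each xlog once it is closed
would be a separate step.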

I'll work more on the design, but not tonight.

Best Regards, Simon Riggs

Re: PITR Recovery

From

Simon Riggs

Date:

23 June 2004, 05:46:49

On Thu, 2004-06-17 at 22:47, Simon Riggs wrote:
> On Wed, 2004-06-16 at 02:49, Tom Lane wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > > -finalaction refers to what to do when target is reached - the purpose
> > > of this is to allow recovery of a database to occur when we don't have
> > > enough space for all of the xlogs at once, so we need to do recovery in
> > > batches.
> > 
> > It seems to me that this is the only *essential* feature out of what
> > you've listed, and the others are okay to add later.  So I question
> > your priorities:
> > 
> > > In time for beta freeze, I think it is possible to do a limited subset
> > > of the above:
> > > - implement DATABASE only (whole instance, not specific database)
> > > - implement END OF LOGS and TO TIMESTAMP
> > > - implement THEN START only
> > > - implement using simple C, rather than bison
> > 
> > which seem to include everything except the one absolute must-have
> > for any serious installation.
> > 
> 
> OK. At first, I disagreed, for many reasons.
> 
> In discussion with Bruce, I believe a fairly neat streaming solution is
> possible.
> 
> During recovery, as each request for a new xlog is made, we can make a
> system(3) call to a user defined recovery_program to retrieve the next
> xlog and put it in place. As each xlog is closed the file will be
> removed. The result of this would be to stream the data files through
> recovery, so no more than 1-2 files would ever be required to perform
> what could be (and is touted as such by other vendors) an infinite
> recovery.
> 
> The result is that a backup tape (or other tape silo) could stream data
> straight through to recovery, and would completely circumvent any
> concern about insufficient disk space for recovery.
> 
> This would involve changes to XLogFileOpen() in xlog.c and is far less
> complex than I had imagined such functionality could be.
> 
> This could be specified to PostgreSQL by using:
> - restore_program='cp %s %s' or similar
> 
> I'll work more on the design, but not tonight.
> 

Technically straightforward, though more complex than I thought, but
streaming the xlog files during recovery works in prototype - great idea
Bruce, and thanks for pushing for a solution in that area, Tom.
[It looks like we do need to have a separate command file dedicated to
recovery options, otherwise there's no way to tell the difference between
crash and full media recovery - but I'll lose the pompous syntax.]

I'll include this (actually very few new/changed lines) and the xlog
refactoring (lots of moved lines, but few changes) in a single patch.

These changes are dependent upon, but otherwise independent of, the PITR
Archival patch submitted on the 15th. If anybody has comments on that
patch, please pass them through ASAP, otherwise I may be building on sand.

My plan is to get this out ASAP (tonight, hopefully), then build on it
with a few extra tweaks, so we have a full set of options for PITR by
the 29th.

Thanks,

Best Regards, Simon Riggs