Thread: PITR Recovery
...on the assumption we now have archived xlogs, how do we do recovery?

Default is to put all xlogs back into pg_xlog and then let recovery do its work...though clearly we all want finer specification than that.

Based upon all our discussions to date...I propose to:
1. put more verbose instrumentation around recovery, so we can see how recovery progresses and calculate an estimated recovery time
2. put in a recovery control command
3. put in a validation step that will check to see whether there are any missing transaction log files in the sequence of xlogs available

The recovery control command would:
- read the file DataDir/recovery.conf (placed there by DBA)
- parse out an SQL-like command string

ROLLFORWARD object target finalaction;

e.g.
ROLLFORWARD DATABASE TO END OF LOGS;
(is the current, and would be default, behaviour if file absent)
ROLLFORWARD DATABASE TO TIMESTAMP '2004-06-11-23:58:02.123' EXCLUSIVE;
ROLLFORWARD DATABASE TO END OF LOGS THEN PAUSE;

SYNTAX
object = DATABASE (default) | TABLESPACE
target = END OF LOGS (default) | TO TIMESTAMP 'yyyy-mm-dd-hh:mm:ss.sss' edge
edge = INCLUSIVE (default) | EXCLUSIVE
finalaction = THEN START (default) | THEN PAUSE

-object refers to the part of the database (or whole) that is to be recovered
-target specifies whether to stop, and what test we will use
-edge refers to whether we use <= or < on the test for target
-finalaction refers to what to do when target is reached - the purpose of this is to allow recovery of a database to occur when we don't have enough space for all of the xlogs at once, so we need to do recovery in batches.

When recovery is complete, recovery.conf would be renamed to recovery.done, so it would not be reactivated if we restart.
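To make the proposed command string concrete, here is a minimal sketch of reading it in plain C with no parser generator. Everything in it is an assumption made for illustration: the names RecoveryCommand, next_token, expect and parse_rollforward are hypothetical, keywords are matched in upper case only, the grammar follows the examples above (so END OF LOGS is written TO END OF LOGS), and only the DATABASE object is handled. The timestamp is kept as raw text and is not validated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { TARGET_END_OF_LOGS, TARGET_TIMESTAMP } TargetKind;

/* Hypothetical parsed form of one ROLLFORWARD command. */
typedef struct
{
    TargetKind  target;
    char        timestamp[64];  /* raw text between the quotes, unvalidated */
    bool        inclusive;      /* true: stop test uses <=, false: < */
    bool        pause_at_end;   /* THEN PAUSE rather than THEN START */
} RecoveryCommand;

/* Copy the next token into tok; a '...' literal counts as one token.
 * Returns false at end of input, at ';', or on an unterminated literal. */
static bool
next_token(const char **p, char *tok, size_t toklen)
{
    const char *s = *p;
    size_t      n = 0;

    while (*s == ' ' || *s == '\t' || *s == '\n')
        s++;
    if (*s == '\0' || *s == ';')
        return false;
    if (*s == '\'')
    {
        s++;
        while (*s != '\0' && *s != '\'' && n + 1 < toklen)
            tok[n++] = *s++;
        if (*s != '\'')
            return false;
        s++;
    }
    else
    {
        while (*s != '\0' && *s != ' ' && *s != '\t' && *s != '\n' &&
               *s != ';' && n + 1 < toklen)
            tok[n++] = *s++;
    }
    tok[n] = '\0';
    *p = s;
    return true;
}

/* Consume one token and require it to equal word. */
static bool
expect(const char **p, const char *word)
{
    char        tok[64];

    return next_token(p, tok, sizeof(tok)) && strcmp(tok, word) == 0;
}

bool
parse_rollforward(const char *cmd, RecoveryCommand *out)
{
    char        tok[64];
    const char *p = cmd;
    bool        have;

    assert(cmd != NULL && out != NULL);

    /* Defaults as listed in the proposal. */
    out->target = TARGET_END_OF_LOGS;
    out->timestamp[0] = '\0';
    out->inclusive = true;
    out->pause_at_end = false;

    if (!expect(&p, "ROLLFORWARD"))
        return false;

    have = next_token(&p, tok, sizeof(tok));
    if (have && strcmp(tok, "DATABASE") == 0)   /* TABLESPACE is rejected below */
        have = next_token(&p, tok, sizeof(tok));

    if (have && strcmp(tok, "TO") == 0)
    {
        if (!next_token(&p, tok, sizeof(tok)))
            return false;
        if (strcmp(tok, "END") == 0)
        {
            if (!expect(&p, "OF") || !expect(&p, "LOGS"))
                return false;
        }
        else if (strcmp(tok, "TIMESTAMP") == 0)
        {
            if (!next_token(&p, out->timestamp, sizeof(out->timestamp)))
                return false;
            out->target = TARGET_TIMESTAMP;
        }
        else
            return false;

        have = next_token(&p, tok, sizeof(tok));
        if (have && out->target == TARGET_TIMESTAMP &&
            (strcmp(tok, "INCLUSIVE") == 0 || strcmp(tok, "EXCLUSIVE") == 0))
        {
            out->inclusive = (tok[0] == 'I');
            have = next_token(&p, tok, sizeof(tok));
        }
    }

    if (have && strcmp(tok, "THEN") == 0)
    {
        if (!next_token(&p, tok, sizeof(tok)))
            return false;
        if (strcmp(tok, "PAUSE") == 0)
            out->pause_at_end = true;
        else if (strcmp(tok, "START") != 0)
            return false;
        have = next_token(&p, tok, sizeof(tok));
    }

    return !have;               /* anything left over is a syntax error */
}
```

A caller would read the single line from recovery.conf, pass it to parse_rollforward(), and fall back to the default behaviour when the file is absent.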
In time for beta freeze, I think it is possible to do a limited subset of the above:
- implement DATABASE only (whole instance, not specific database)
- implement END OF LOGS and TO TIMESTAMP
- implement THEN START only
- implement using simple C, rather than bison

Reading the command is probably the hardest part of this, so agreeing what we're working towards is crucial. We're out of time to redesign this once it's coded. If the hooks are there, we can always code up more should it be required for a particular recovery...

The syntax is very like DB2, but is designed to be reminiscent of other systems that give you control over rollforward recovery (e.g. Oracle etc), allowing those with experience to migrate easily to PostgreSQL.

Implementation wise, I would expect all of this code to go in xlog.c, with the recovery target code living in ReadRecord(). This would delve inside each record to check record type, then if it is a COMMIT record to look further at the timestamp then either implement this COMMIT or not according to INCLUSIVE/EXCLUSIVE. Only the txn boundary records have time stamps...

As Tom points out, we can't accept normal SQL at this point, nor can we easily achieve this with command line switches or postgresql.conf entries. My solution is to just use another .conf file (call it what you like...)

Comments?

Best Regards, Simon Riggs
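The INCLUSIVE/EXCLUSIVE test on COMMIT records described above can be sketched as follows. This is an illustration only, not code from xlog.c: SketchRecord and should_replay are hypothetical names, the commit time is assumed to be available as a millisecond count, and records that are not COMMIT records are assumed to be replayed unconditionally because they carry no timestamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one WAL record. */
typedef struct
{
    bool        is_commit;      /* only txn boundary records carry a time */
    int64_t     commit_time_ms; /* assumed unit: milliseconds since an epoch */
} SketchRecord;

/* Return true if the record should be applied given the recovery target.
 * INCLUSIVE uses <= on the commit time, EXCLUSIVE uses <. */
bool
should_replay(const SketchRecord *rec, int64_t target_ms, bool inclusive)
{
    assert(rec != NULL);

    if (!rec->is_commit)
        return true;            /* no timestamp to test */

    return inclusive ? rec->commit_time_ms <= target_ms
                     : rec->commit_time_ms < target_ms;
}
```

In this sketch, recovery would stop at the first COMMIT record for which should_replay() returns false.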
Simon Riggs <simon@2ndquadrant.com> writes:
> -finalaction refers to what to do when target is reached - the purpose
> of this is to allow recovery of a database to occur when we don't have
> enough space for all of the xlogs at once, so we need to do recovery in
> batches.

It seems to me that this is the only *essential* feature out of what you've listed, and the others are okay to add later. So I question your priorities:

> In time for beta freeze, I think it is possible to do a limited subset
> of the above:
> - implement DATABASE only (whole instance, not specific database)
> - implement END OF LOGS and TO TIMESTAMP
> - implement THEN START only
> - implement using simple C, rather than bison

which seem to include everything except the one absolute must-have for any serious installation.

(BTW, I doubt that single-database recovery is possible at all, ever. You can't go hacking the clog and shared tables and not keep all the databases in sync. So I'd forget the "object" concept altogether.)

> Implementation wise, I would expect all of this code to go in xlog.c,
> with the recovery target code living in ReadRecord().

I'd like to keep it out of there, as xlog.c is far too big and complex already. Not sure where else though. Maybe we need to break down xlog.c somehow.

regards, tom lane
On Wed, 2004-06-16 at 02:49, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Implementation wise, I would expect all of this code to go in xlog.c,
> > with the recovery target code living in ReadRecord().
>
> I'd like to keep it out of there, as xlog.c is far too big and complex
> already. Not sure where else though. Maybe we need to break down
> xlog.c somehow.
>

Yes, I would very much like to split out the recovery code into a different file, so that all recovery code was all in one place. Refactoring in this way would protect further PITR work from conflicting with other changes in the last minute rush, as well as making most future recovery changes a single file patch (well...nearly...)

xlogutils.c is already almost fully dedicated to recovery code, so it seems like a good place to centralise, even though I don't like the name!

Looking at the code, I would suggest:

--Move these code sections into void X (void) functions that would reside in xlogutils.c but get called from StartupXLog in xlog.c, currently within if (InRecovery) {} braces:

Add StartupRecovery() - main REDO recovery
from
---
/* REDO */
if (InRecovery)
{
---
to
(errmsg("redo is not required")));
}

/*
 * Init xlog buffer cache using the block containing the last valid
---

Add CleanupRecovery() - cleanup after recovery
..similarly

that would then allow us to

--Move the following to xlogutils.c
XLogInitRelationCache
RestoreBkpBlocks
ReadRecord (may need still to be called from xlog.c)
RecordIsValid
ValidXLOGHeader
XLogCloseRelationCache

--Remove from xlogutils.h
extern void XLogInitRelationCache(void);
extern void XLogCloseRelationCache(void);
replace with
extern void StartupRecovery(void);
extern void CleanupRecovery(void);

Is that something you'd be able to do as a starting point for the other changes? It's easier for a committer to do this, than for me to do it and then another to review it...

Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes:
> Is that something you'd be able to do as a starting point for the other
> changes? It's easier for a committer to do this, than for me to do it
> and then another to review it...

I'm up to my eyeballs in tablespaces right now, but if you can wait a couple days for this ...

regards, tom lane
On Wed, 2004-06-16 at 23:50, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Is that something you'd be able to do as a starting point for the other
> > changes? It's easier for a committer to do this, than for me to do it
> > and then another to review it...
>
> I'm up to my eyeballs in tablespaces right now, but if you can wait a
> couple days for this ...

Whatever minimises your time...seriously.

Say the word and I'll do it.

Simon
On Wed, 2004-06-16 at 02:49, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > -finalaction refers to what to do when target is reached - the purpose
> > of this is to allow recovery of a database to occur when we don't have
> > enough space for all of the xlogs at once, so we need to do recovery in
> > batches.
>
> It seems to me that this is the only *essential* feature out of what
> you've listed, and the others are okay to add later. So I question
> your priorities:
>
> > In time for beta freeze, I think it is possible to do a limited subset
> > of the above:
> > - implement DATABASE only (whole instance, not specific database)
> > - implement END OF LOGS and TO TIMESTAMP
> > - implement THEN START only
> > - implement using simple C, rather than bison
>
> which seem to include everything except the one absolute must-have
> for any serious installation.
>

OK. At first, I disagreed, for many reasons.

In discussion with Bruce, I believe a fairly neat streaming solution is possible.

During recovery, as each request for a new xlog is made, we can make a system(3) call to a user defined restore_program to retrieve the next xlog and put it in place. As each xlog is closed the file will be removed. The result of this would be to stream the data files through recovery, so no more than 1-2 files would ever be required to perform what could be (and is touted as this by other vendors) an infinite recovery.

The result is that a backup tape (or other tape silo) could stream data straight through to recovery, and would completely circumvent any concern about insufficient disk space for recovery.

This would involve changes to XLogFileOpen() in xlog.c and is far less complex than I had imagined such functionality could be.

This could be specified to PostgreSQL by using:
- restore_program='cp %s %s' or similar

I'll work more on the design, but not tonight.

Best Regards, Simon Riggs
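A rough sketch of how a restore_program template such as 'cp %s %s' might be expanded and run through system(3) is shown below. It is not the prototype code: build_restore_command and restore_xlog_file are hypothetical names, the first %s is assumed to stand for the archived file and the second for the destination in pg_xlog, and the template is expanded by hand rather than handed to printf so that a user-supplied string is never used as a format string. No shell quoting is attempted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Expand the first two "%s" markers in tmpl with src and dst.
 * Returns false if the result does not fit in buf or if the
 * template does not contain two markers. */
bool
build_restore_command(const char *tmpl, const char *src, const char *dst,
                      char *buf, size_t buflen)
{
    const char *args[2];
    int         nargs = 0;
    size_t      used = 0;
    const char *p;

    assert(tmpl != NULL && src != NULL && dst != NULL && buf != NULL);
    if (buflen == 0)
        return false;

    args[0] = src;
    args[1] = dst;

    for (p = tmpl; *p != '\0'; p++)
    {
        if (p[0] == '%' && p[1] == 's' && nargs < 2)
        {
            size_t      len = strlen(args[nargs]);

            if (used + len >= buflen)
                return false;
            memcpy(buf + used, args[nargs], len);
            used += len;
            nargs++;
            p++;                /* skip the 's' of "%s" */
        }
        else
        {
            if (used + 1 >= buflen)
                return false;
            buf[used++] = *p;
        }
    }
    buf[used] = '\0';
    return nargs == 2;
}

/* Fetch one xlog file by running the expanded command via system(3).
 * Returns true only if the command could be built and exited with 0. */
bool
restore_xlog_file(const char *tmpl, const char *src, const char *dst)
{
    char        cmd[1024];

    if (!build_restore_command(tmpl, src, dst, cmd, sizeof(cmd)))
        return false;
    return system(cmd) == 0;
}
```

Recovery would call something like restore_xlog_file() each time it needs the next xlog, and remove the file once it has been replayed, so only one or two files sit on disk at a time.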
On Thu, 2004-06-17 at 22:47, Simon Riggs wrote:
> On Wed, 2004-06-16 at 02:49, Tom Lane wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > > -finalaction refers to what to do when target is reached - the purpose
> > > of this is to allow recovery of a database to occur when we don't have
> > > enough space for all of the xlogs at once, so we need to do recovery in
> > > batches.
> >
> > It seems to me that this is the only *essential* feature out of what
> > you've listed, and the others are okay to add later. So I question
> > your priorities:
> >
> > > In time for beta freeze, I think it is possible to do a limited subset
> > > of the above:
> > > - implement DATABASE only (whole instance, not specific database)
> > > - implement END OF LOGS and TO TIMESTAMP
> > > - implement THEN START only
> > > - implement using simple C, rather than bison
> >
> > which seem to include everything except the one absolute must-have
> > for any serious installation.
> >
>
> OK. At first, I disagreed, for many reasons.
>
> In discussion with Bruce, I believe a fairly neat streaming solution is
> possible.
>
> During recovery, as each request for a new xlog is made, we can make a
> system(3) call to a user defined restore_program to retrieve the next
> xlog and put it in place. As each xlog is closed the file will be
> removed. The result of this would be to stream the data files through
> recovery, so no more than 1-2 files would ever be required to perform
> what could be (and is touted as this by other vendors) an infinite
> recovery.
>
> The result is that a backup tape (or other tape silo) could stream data
> straight through to recovery, and would completely circumvent any
> concern about insufficient disk space for recovery.
>
> This would involve changes to XLogFileOpen() in xlog.c and is far less
> complex than I had imagined such functionality could be.
>
> This could be specified to PostgreSQL by using:
> - restore_program='cp %s %s' or similar
>
> I'll work more on the design, but not tonight.
>

Technically straightforward, though more complex than I thought, but streaming the xlog files during recovery works in prototype - great idea Bruce and thanks for pushing for a solution in that area, Tom.

[It looks like we do need to have a separate command file dedicated to recovery options, otherwise there's no way to tell the difference between crash and full media recovery - but I'll lose the pompous syntax.]

I'll include this (actually very few new/changed lines) and the xlog refactoring (lots of moved lines, but few changes) in a single patch. These changes are dependent upon, but otherwise independent of, the PITR Archival patch submitted on 15th. If anybody has comments on that patch, please pass them through ASAP, otherwise I may be building on sand.

My plan is to get this out ASAP (tonight, hopefully), then build on it with a few extra tweaks, so we have a full set of options for PITR by 29th.

Thanks,

Best Regards, Simon Riggs