Thread: PITR Archive Recovery plus WIP PITR
A number of you have pointed out that the last patch had a number of problems; thank you all. This patchset supersedes all previous versions, and is the only one that will work against current CVS tip.

cd pgsql/src
patch -p0 < pitr_v5_0.patch

then place

pgarch.c in src/backend/postmaster
pgarch.h in src/include

Read the README - it's long and you won't have a clue without it.

...remember, you HAVE TO run this on a cluster created by initdb within the last 2-3 days...

When you perform a recovery, you can use the example recovery.conf provided here. (Not all of the options work yet...) Just place this file in PGDATA, then crank up the postmaster any way you choose.

Known issues:
- CREATE DATABASE will not be recovered (at present)...create a new database BEFORE you take a full physical backup
- error handling in the archiver has some issues, and requires a few improvements...
- PITR support is partially complete in this patch - it's a Work in Progress (WIP), but doesn't interfere with other operations - I will continue to work on this

...but this works - so please don't be put off from giving it a try - it will take a while to get used to the concepts behind it all.

Changes from earlier versions:
- much of the code for recovery is in xlog.c - refactoring proved difficult and a big timewaster when the main functions aren't all there yet
- error messages streamlined
- variable naming more consistent than earlier versions

Best Regards, Simon Riggs
Attachment
On Fri, 2004-07-09 at 12:53, Klaus Naumann wrote:
> archive_program is provided with a string which contains the target directory.
> That doesn't really make sense.

archive_dest is used for both archive and restore; that's why it's set as a separate parameter. That's the rationale... let's see what others think.

> First of all it introduces the problem you
> mentioned in the README file (if the directory doesn't exist you lose
> xlogs).

Your example quoted later is the answer.... use
archive_dest = '/mnt/pgarch/'
rather than
archive_dest = '/mnt/pgarch'
which is ambiguous...

> I thought about checking if this is a dir within the code. But
> this would make things too inflexible.

Yes, otherwise the check would be there.

> Second, we could make the user responsible for what he's doing by not
> giving him any target.

Remember, the user is specifying the archive_dest also, so the user is completely responsible for how archiving actually occurs.

> Like you could then do things like:
>
> archive_program = 'gzip -d %s | tar rf /dev/nst0 - '

archive_program = 'gzip -d %s | tar rf %s - '
would be how I would use it in the example you give.

>
> Which adds the file to a tar archive on his tape.
> If he wants to archive it on disk, let him do it this way:
>
> archive_program = 'cp %s /mnt/pgarch/'

archive_program = 'cp %s %s'
would be the way to specify that...

Thank you very much for your feedback and your other contributions,

Best regards, Simon Riggs
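A minimal sketch of the substitution Simon describes, written in Python purely for illustration (the patch itself is C, and nothing here is taken from pgarch.c). It assumes the first %s in archive_program is the path of the xlog segment and the second, if present, is archive_dest, which is how the 'cp %s %s' example reads. The function name build_archive_command and the trailing-slash check are hypothetical: one possible way to reject the ambiguous '/mnt/pgarch' form instead of checking whether the directory exists.

```python
def build_archive_command(archive_program: str, xlog_path: str, archive_dest: str) -> str:
    """Fill the %s placeholders of archive_program, in order.

    Assumed convention (illustration only): the first %s is the path of
    the xlog segment to archive, the optional second %s is archive_dest.
    """
    # Hypothetical safety check: insist on a trailing slash so that
    # 'cp file /mnt/pgarch' cannot silently create a plain file called
    # 'pgarch' when the directory does not exist.
    if not archive_dest.endswith("/"):
        raise ValueError(f"archive_dest {archive_dest!r} should end with '/'")

    # Split on the literal placeholder rather than using the % operator,
    # so any other '%' characters in the user's command are left alone.
    parts = archive_program.split("%s")
    if len(parts) == 2:
        return parts[0] + xlog_path + parts[1]
    if len(parts) == 3:
        return parts[0] + xlog_path + parts[1] + archive_dest + parts[2]
    raise ValueError("archive_program must contain one or two %s placeholders")


if __name__ == "__main__":
    # Hypothetical segment path, for illustration only.
    print(build_archive_command("cp %s %s", "/pgdata/pg_xlog/0000000000000003", "/mnt/pgarch/"))
```

Splitting the template once, before substituting, also means a path that happens to contain "%s" cannot be mistaken for a placeholder.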
Following a suggestion and patch from Klaus Naumann, the recovery.conf file can now accept comments.... No patch supplied at present (anoncvs is down), but here is the annotated recovery.conf.sample Best Regards, Simon Riggs
Attachment
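As an illustration of what accepting comments involves, here is a small Python sketch of a parser for name = 'value' lines in which '#' starts a comment unless it appears inside single quotes. This is not the parser from Klaus Naumann's patch; the quoting rule is an assumption, and escaped quotes inside values are not handled.

```python
def strip_comment(line: str) -> str:
    """Drop everything from the first '#' that is not inside single quotes."""
    in_quote = False
    for i, ch in enumerate(line):
        if ch == "'":
            in_quote = not in_quote
        elif ch == "#" and not in_quote:
            return line[:i]
    return line


def parse_recovery_conf(text: str) -> dict:
    """Parse name = 'value' lines, ignoring blank lines and comments."""
    settings = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = strip_comment(raw).strip()
        if not line:
            continue  # blank line or comment-only line
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected name = 'value'")
        value = value.strip()
        # Remove one pair of surrounding single quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] == "'":
            value = value[1:-1]
        settings[name.strip()] = value
    return settings
```

For example, with the hypothetical option names some_option and other_option, the input "# header\nsome_option = '/mnt/pgarch/'  # note\nother_option = 'a#b'" yields {'some_option': '/mnt/pgarch/', 'other_option': 'a#b'}: the trailing comment is dropped, while the '#' inside quotes is kept.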
New release of patch, at v5_1 ... for serious testing

what's in
- Point in Time Recovery now works....please check carefully
- additional options in recovery.conf (including code contributed to PITR from Klaus Naumann)

what's not (yet)
- Timelines...though I think they are useful, they may not be critical
- handling of local/UTC times (the variable is there...)

The number of permutations is increasing, and available time is decreasing....this has not had a full retest, OK?

On Thu, 2004-07-08 at 19:11, Simon Riggs wrote:
> cd pgsql/src
> patch -p0 < pitr_v5_0.patch
>
> then place
>
> pgarch.c in src/backend/postmaster
> pgarch.h in src/include
>
> Read the README - it's long and you won't have a clue without it
>
> ...remember, you HAVE TO run this on a cluster created by initdb within
> the last 2-3 days...
>
> When you perform a recovery, you can use the example recovery.conf
> provided here. (Not all of the options work yet...) Just place this file
> in PGDATA then crank up the postmaster any way you choose.
>
> Known issues:
> - CREATE DATABASE will not be recovered (at present)...create a new
> database BEFORE you take a full physical backup
> - error handling in the archiver has some issues, and requires a few
> improvements...
> ...but this works - so please don't be put off from giving it a try - it
> will take a while to get used to the concepts behind it all.
>
> Earlier versions
> - much of the code for recovery is in xlog.c - refactoring proved
> difficult and a big timewaster when the main functions aren't all there
> yet
> - error messages streamlined
> - variable naming more consistent than earlier versions
>
> Best Regards, Simon Riggs
Attachment
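The core idea behind recovering to a point in time, and why the local/UTC question matters, can be sketched in Python: replay commit records in xlog order and stop at the first one committed after the target. CommitRecord and records_to_replay are hypothetical stand-ins, not structures from the patch, and including a commit that falls exactly on the target time is an assumed choice. Requiring timezone-aware datetimes means a target given in local time is compared correctly against commit times recorded in UTC.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CommitRecord:
    """Hypothetical stand-in for a transaction-commit record in the xlog."""
    xid: int
    commit_time: datetime  # must be timezone-aware


def records_to_replay(records, target_time: datetime):
    """Return the commit records to replay when recovering to target_time.

    Records are assumed to be in xlog order; replay stops at the first
    commit that happened after the target (a commit exactly at the
    target is included in this sketch).
    """
    if target_time.tzinfo is None:
        # A naive time is ambiguous: is it local time or UTC?
        raise ValueError("target_time must include a time zone")
    replayed = []
    for rec in records:
        if rec.commit_time > target_time:
            break
        replayed.append(rec)
    return replayed


if __name__ == "__main__":
    utc = timezone.utc
    history = [  # made-up example data
        CommitRecord(100, datetime(2004, 7, 13, 10, 0, tzinfo=utc)),
        CommitRecord(101, datetime(2004, 7, 13, 10, 5, tzinfo=utc)),
        CommitRecord(102, datetime(2004, 7, 13, 10, 10, tzinfo=utc)),
    ]
    # 11:07 at UTC+1 is 10:07 UTC, so xids 100 and 101 are replayed.
    target = datetime(2004, 7, 13, 11, 7, tzinfo=timezone(timedelta(hours=1)))
    print([r.xid for r in records_to_replay(history, target)])
```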
Simon Riggs wrote:
> New release of patch, at v5_1 ... for serious testing
> what's in
> - Point in Time Recovery now works....please check carefully
> - additional options in recovery.conf
> (including code contributed to PITR from Klaus Naumann)
>
> what's not (yet)
> - Timelines...though I think they are useful, they may not be critical

I am not fond of the timeline idea, especially for 7.5. Let's get usage cases submitted first. I can imagine timelines as causing significant confusion during restore, which is the last thing we want to do.

-- 
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Tue, 2004-07-13 at 23:58, Bruce Momjian wrote: > Simon Riggs wrote: > > New release of patch, at v5_1 ... for serious testing > > what's in > > - Point in Time Recovery now works....please check carefully > > - additional options in recovery.conf > > (including code contributed to PITR from Klaus Naumann) > > > > what's not (yet) > > - Timelines...though I think they are useful, they may not be critical > > I am not fond of the timeline idea, especially for 7.5. Let's get usage > cases submitted first. I can imagine timelines as causing significant > confusion during restore, which is the last thing we want to do. Well, I really want to finish this, so I do agree. Exhaustion is setting in....I need other eyes to test and fix the bugs. Best Regards, Simon Riggs
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Simon Riggs wrote: > what's not (yet) >> - Timelines...though I think they are useful, they may not be critical > I am not fond of the timeline idea, especially for 7.5. Let's get usage > cases submitted first. I can imagine timelines as causing significant > confusion during restore, which is the last thing we want to do. I think that judgment is exactly backward. *Not* having timelines is what will cause serious and possibly fatal mistakes during restore: people will hand the wrong xlog files to restore and the software will be unable to recognize the inconsistency. We really need to get this right the first time. regards, tom lane
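To make Tom's point concrete: if every segment carried a timeline identifier, restore could refuse segments from the wrong history instead of silently replaying them. The Python sketch below is hypothetical (SegmentId and validate_restore_sequence are invented names, and the patch under discussion has no timeline field yet). Without the timeline field, two diverged histories produce segments with identical (log, seg) values, so the first check would have nothing to compare.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentId:
    """Hypothetical identity of an archived xlog segment."""
    timeline: int
    log: int
    seg: int


def validate_restore_sequence(segments, expected_timeline: int) -> None:
    """Raise ValueError if the segments handed to restore are inconsistent."""
    previous = None
    for s in segments:
        # A segment from another history is rejected rather than replayed.
        if s.timeline != expected_timeline:
            raise ValueError(
                f"segment {s.log:08X}{s.seg:08X} is from timeline {s.timeline}, "
                f"expected {expected_timeline}")
        # Segments must also be supplied in strictly increasing order.
        if previous is not None and (s.log, s.seg) <= (previous.log, previous.seg):
            raise ValueError(f"segment {s.log:08X}{s.seg:08X} is out of order")
        previous = s
```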
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Simon Riggs wrote: > > what's not (yet) > >> - Timelines...though I think they are useful, they may not be critical > > > I am not fond of the timeline idea, especially for 7.5. Let's get usage > > cases submitted first. I can imagine timelines as causing significant > > confusion during restore, which is the last thing we want to do. > > I think that judgment is exactly backward. *Not* having timelines is > what will cause serious and possibly fatal mistakes during restore: > people will hand the wrong xlog files to restore and the software will > be unable to recognize the inconsistency. > > We really need to get this right the first time. I assume they could just restore from backup and try again.
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> I think that judgment is exactly backward. *Not* having timelines is >> what will cause serious and possibly fatal mistakes during restore: >> people will hand the wrong xlog files to restore and the software will >> be unable to recognize the inconsistency. > I assume they could just restore from backup and try again. Sure, if they don't mind losing whatever transactions they processed before realizing how broken their database was. That's not going to be an acceptable answer for the sort of installations that need PITR in the first place. I think it's really important to get this right the first time, both for reliability's sake and because we are expecting people to write their own archiving scripts. If we change the xlog segment naming convention later on, then we will break all those scripts. regards, tom lane
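Tom's concern about user-written scripts is easy to see in a hypothetical archiving helper like the one below, which picks out segment files by name. It assumes segment names are exactly 16 uppercase hex digits (8 for log, 8 for seg); if the convention later grew a timeline field, the pattern would silently match nothing, so segments would stop being archived without any error being raised.

```python
import re

# Assumption baked into the script: a segment file name is exactly
# 16 uppercase hex digits (8 for log, 8 for seg).
SEGMENT_NAME = re.compile(r"[0-9A-F]{16}")


def segments_to_archive(filenames):
    """Return xlog segment file names in archive order, skipping other files."""
    # Fixed-width hex names sort correctly as plain strings.
    return sorted(name for name in filenames if SEGMENT_NAME.fullmatch(name))


if __name__ == "__main__":
    listing = ["0000000000000002", "0000000000000001", "notes.txt"]
    print(segments_to_archive(listing))
    # A longer, timeline-prefixed name would simply be ignored:
    print(segments_to_archive(["000000010000000000000001"]))
```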
On Wed, 2004-07-14 at 05:45, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> I think that judgment is exactly backward. *Not* having timelines is > >> what will cause serious and possibly fatal mistakes during restore: > >> people will hand the wrong xlog files to restore and the software will > >> be unable to recognize the inconsistency. > > > I assume they could just restore from backup and try again. > > Sure, if they don't mind losing whatever transactions they processed > before realizing how broken their database was. That's not going to be > an acceptable answer for the sort of installations that need PITR in the > first place. > > I think it's really important to get this right the first time, both for > reliability's sake and because we are expecting people to write their > own archiving scripts. If we change the xlog segment naming convention > later on, then we will break all those scripts. > I agree, but I'm going to have a rest day while people test what is already there in case there are further code changes....which nods towards both of your concerns. BTW, one test last night broke because of the lack of timelines... Best Regards, Simon Riggs
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> I think that judgment is exactly backward. *Not* having timelines is > >> what will cause serious and possibly fatal mistakes during restore: > >> people will hand the wrong xlog files to restore and the software will > >> be unable to recognize the inconsistency. > > > I assume they could just restore from backup and try again. > > Sure, if they don't mind losing whatever transactions they processed > before realizing how broken their database was. That's not going to be > an acceptable answer for the sort of installations that need PITR in the > first place. > > I think it's really important to get this right the first time, both for > reliability's sake and because we are expecting people to write their > own archiving scripts. If we change the xlog segment naming convention > later on, then we will break all those scripts. We don't have anything hardcoded based on those file names, at last in PostgreSQL. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> I think it's really important to get this right the first time, both for >> reliability's sake and because we are expecting people to write their >> own archiving scripts. If we change the xlog segment naming convention >> later on, then we will break all those scripts. > We don't have anything hardcoded based on those file names, at least in > PostgreSQL. That's because we've punted the whole problem of archive-segment management off to the users. If we did implement this functionality ourselves then I'd be less worried, since we'd know that future changes would affect only our own code. But as things stand, we will have very unhappy PITR users if we change the naming convention later. regards, tom lane
On Wed, 2004-07-14 at 16:00, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> I think it's really important to get this right the first time, both for
> >> reliability's sake and because we are expecting people to write their
> >> own archiving scripts. If we change the xlog segment naming convention
> >> later on, then we will break all those scripts.
>
> > We don't have anything hardcoded based on those file names, at least in
> > PostgreSQL.
>

Well, I think we do. There are two places where the filename format and length matter, and there are numerous calls to calculate filenames from log,seg pairs. ...and much of the current patch would need thorough reworking to make sure all the differences were handled. That is why I was striving for a solution that retained the filename make-up, by adding a very large number to the log value (we just aren't gonna run out...I did the math in another post).

> That's because we've punted the whole problem of archive-segment
> management off to the users.
>
> If we did implement this functionality ourselves then I'd be less
> worried, since we'd know that future changes would affect only our
> own code. But as things stand, we will have very unhappy PITR users
> if we change the naming convention later.
>

Yes, if we are going to change the xlog filename format, we must do it now. The change must be in effect whether or not you use archive_mode.

...Is there agreement with my previous posts on this....marked "Point in Time Recovery" over the last few days? Is that what we should implement?

Overall, the timeline concept is only worth implementing if:
- we archive xlogs
- we recover them
- we recover them to a point in time/txnid

We agreed that the last part wasn't the priority for beta freeze. I'm willing to spend more time on the timeline idea as long as I've got some idea that we will be committing what has been developed so far.
Keeping the patch viable against ongoing commits takes effort: new commits bump the catalog version, which invalidates all my test databases, and I also have to track down any other changes. ...and I've been doing that for a month now - getting much better though, thanks.

If we can review what we have now, I would be most pleased. Until we commit at least some of it, I'm the only developer, and I would like to open this up to allow others to contribute more easily.

Best Regards, Simon Riggs
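The trade-off Simon raises, keeping the existing 16-character names versus changing the format, can be compared with a small Python sketch. xlog_file_name builds a name from a (log, seg) pair as two 8-digit uppercase hex fields. offset_file_name is one reading of Simon's idea of adding a very large number to the log value so that the name keeps its make-up; TIMELINE_LOG_OFFSET is a made-up figure, since the actual math is in another post. prefixed_file_name shows the alternative of a separate timeline field, i.e. the kind of naming change that would break scripts matching on name length. All function and constant names are hypothetical.

```python
XLOG_FIELD_MAX = 0xFFFFFFFF       # each field is printed as 8 hex digits
TIMELINE_LOG_OFFSET = 0x01000000  # hypothetical; not the figure from the patch


def xlog_file_name(log: int, seg: int) -> str:
    """Segment name built from a (log, seg) pair: two 8-digit hex fields."""
    return f"{log:08X}{seg:08X}"


def offset_file_name(timeline: int, log: int, seg: int) -> str:
    """Keep the 16-character name by adding timeline * offset to log."""
    if log >= TIMELINE_LOG_OFFSET:
        raise ValueError("log value would collide with the next timeline")
    shifted = timeline * TIMELINE_LOG_OFFSET + log
    if shifted > XLOG_FIELD_MAX:
        raise ValueError("timeline too large to fit in the log field")
    return xlog_file_name(shifted, seg)


def prefixed_file_name(timeline: int, log: int, seg: int) -> str:
    """Alternative: a third 8-digit field, giving a 24-character name."""
    return f"{timeline:08X}{log:08X}{seg:08X}"


if __name__ == "__main__":
    print(xlog_file_name(0, 3))         # 16 characters
    print(offset_file_name(2, 0, 3))    # still 16 characters
    print(prefixed_file_name(2, 0, 3))  # 24 characters
```

In this sketch, timeline 0 under the offset scheme reproduces the existing names exactly, while the prefixed form changes the length that any name-matching script would see.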
>>I am not fond of the timeline idea, especially for 7.5. Let's get usage >>cases submitted first. I can imagine timelines as causing significant >>confusion during restore, which is the last thing we want to do. > > I think that judgment is exactly backward. *Not* having timelines is > what will cause serious and possibly fatal mistakes during restore: > people will hand the wrong xlog files to restore and the software will > be unable to recognize the inconsistency. > > We really need to get this right the first time. > > regards, tom lane
Please ignore - seems some old mail of mine got sent waaay late... Christopher Kings-Lynne wrote: >>> I am not fond of the timeline idea, especially for 7.5. Let's get usage >>> cases submitted first. I can imagine timelines as causing significant >>> confusion during restore, which is the last thing we want to do. >> >> >> I think that judgment is exactly backward. *Not* having timelines is >> what will cause serious and possibly fatal mistakes during restore: >> people will hand the wrong xlog files to restore and the software will >> be unable to recognize the inconsistency. >> >> We really need to get this right the first time. >> >> regards, tom lane