Thread: PITR Archival
I enclose a working set of context diff patches and new files to make PITR archiving work, as of cvstip (NOW).

You'll see the new options in the postgresql.conf...though you may wish to use archive_debug = true as well, when testing.

There is one bug: shutdown doesn't work quite right. I haven't fixed this because I've spent too long trying to decipher how pgstat did a clean shutdown, discovering now that it didn't and that has now been patched...something similar is required for pgarch, but I'm out of time now...leaving time for discussion of this lot...

I'm looking to have this lot committed asap, cos my fingers are starting to catch the bitrot now. :)

I have a considerable amount still to learn about CVS, diff and patch, so anybody wanting to spend 10-15 mins on the phone with me would greatly enhance my chances of helping patch my patch, when the bugs roll in.

If we get this done smoothly, I reckon I can have some PITR recovery control done by beta freeze.

Best regards, Simon Riggs
On Tue, 2004-06-15 at 16:34, Simon Riggs wrote:
> I enclose a working set of context diff patches and new files to make
> PITR archiving work, as of cvstip (NOW).
>
> You'll see the new options in the postgresql.conf...though you may wish
> to use archive_debug = true as well, when testing.
>
> There is one bug: shutdown doesn't work quite right. I haven't fixed
> this because I've spent too long trying to decipher how pgstat did a
> clean shutdown, discovering now that it didn't and that has now been
> patched...something similar is required for pgarch, but I'm out of time
> now...leaving time for discussion of this lot...
>

The patch creates a further child process of postmaster, the archiver.

Archiver code is similar to, but not the same as, pgstat.c, and by analogy lives in src/backend/postmaster/pgarch.c with a matching pgarch.h in src/include (just as with pgstat.h).

At various points, 5 processes are involved, all of which are identified clearly in the debugging messages...
- backend: sends signal to postmaster to say xlog is ready to archive
- postmaster: catches and resends to archiver
- archiver: then calls system(3) to invoke a user-defined archival task
- bgwriter: later cleans up archive_status at checkpoint time

Please note: you will need to initdb to make this work.

Also note: archive_mode is not designed to be turned on/off frequently. It is possible to confuse it if you turn it on, then restart with it off, then turn it on again. That is likely to create a "hole" in the archive history of xlogs and you will not be able to recover correctly.

You may also note that the design has changed substantially from many earlier design postings...all of this is based on community input to rationalise behaviour and to streamline code.

If you wish to test recovery, you should:
- do a full physical backup of DataDir, while postmaster is UP
- archive all xlogs when disaster strikes
- restore backup of DataDir
- restore all archived xlogs to pg_xlog
- startup postmaster (and watch...)

Further work is still required to make it STOP recovering at a predetermined point in time....

Best regards, Simon Riggs
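The last hop of the signal chain described in the message above, where the archiver hands a finished xlog to a user-defined command through system(3), can be sketched roughly as follows. This is a Python illustration for a POSIX shell, not the patch's C code; the `{path}` and `{name}` placeholders, the function name and the example `cp` command are all assumptions made for the sketch.

```python
import os
import shlex
import subprocess


def archive_xlog(command_template: str, xlog_path: str) -> bool:
    """Run a user-defined archive command for one finished xlog file.

    Hypothetical sketch: the {path} and {name} placeholders are invented
    here and are not the syntax used by the patch.
    """
    command = command_template.format(
        path=shlex.quote(xlog_path),
        name=shlex.quote(os.path.basename(xlog_path)),
    )
    # shell=True mirrors system(3): the whole string is handed to the shell.
    result = subprocess.run(command, shell=True)
    # Only a zero exit status counts as "archived"; on any other status the
    # caller should leave the xlog where it is so the attempt can be retried.
    return result.returncode == 0
```

A caller might use it as `archive_xlog("cp {path} /mnt/archive/{name}", segment_path)`, where `/mnt/archive` is a made-up destination. Treating only exit status 0 as success is the important design point: an xlog that is reported as archived but never reached the archive would leave exactly the kind of "hole" the message warns about.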
On Tue, Jun 15, 2004 at 04:34:30PM +0100, Simon Riggs wrote:
> I have a considerable amount still to learn about CVS, diff and patch,
> so anybody wanting to spend 10-15 mins on the phone with me would
> greatly enhance my chances of helping patch my patch, when the bugs roll
> in.

Can't help you with the phone thingie, but if you want to see what's in a patch and be able to edit it nicely, I suggest you use meld. It's a python dual-pane GTK display, really nice. Too slow for real coding (on my machine that is) but real good for seeing what you changed. Watch out for use of Ctrl, Shift and Alt -- very handy. Can use CVS as well, or you can use two source trees.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"There is evil in the world. There are dark, awful things. Occasionally, we get a glimpse of them. But there are dark corners; horrors almost impossible to imagine... even in our worst nightmares."
(Van Helsing, Dracula A.D. 1972)
Simon Riggs wrote:
> Also note: archive_mode is not designed to be turned on/off frequently.
> It is possible to confuse it if you turn it on, then restart with it
> off, then turn it on again. That is likely to create a "hole" in the
> archive history of xlogs and you will not be able to recover correctly.

I assume full xlog files will still be transferred, but that you will be missing some files from the period when the archiver was turned off. That issue will be part of the restore procedure, I assume.

I think we need to specify that people should use an empty archive directory every time they turn on archiving.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
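The "hole" discussed above is a missing run of files in an otherwise consecutive archive. As a rough illustration only, assuming each archived xlog can be mapped to a consecutive integer sequence number (a simplification; real xlog file names encode more structure than this), a restore script could check for holes before attempting recovery. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def find_archive_holes(sequence_numbers: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (first_missing, last_missing) ranges between archived segments."""
    ordered = sorted(set(sequence_numbers))
    holes = []
    # Compare each archived segment with its successor; any jump larger
    # than one means the segments in between were never archived.
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > 1:
            holes.append((previous + 1, current - 1))
    return holes


# Archiving was on for segments 1-3, off while 4-6 were written, then on again.
print(find_archive_holes([1, 2, 3, 7, 8]))  # [(4, 6)]
```

Starting each archiving period with an empty directory, as suggested above, makes this check unnecessary because every archive then contains a single unbroken run.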
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

I will try to apply it once it is reviewed.

--
Bruce Momjian
On Tue, 2004-06-15 at 16:34, Simon Riggs wrote:
> I enclose a working set of context diff patches and new files to make
> PITR archiving work, as of cvstip (NOW).

As of now, this patch is invalidated by recent changes. Don't try to run it, it doesn't even start. There's good news coming later in this email update.... :)

I'm now in the middle of reworking this, 1-3 days, I guess.

> There is one bug: shutdown doesn't work quite right. I haven't fixed
> this because I've spent too long trying to decipher how pgstat did a
> clean shutdown, discovering now that it didn't and that has now been
> patched...something similar is required for pgarch, but I'm out of time
> now...leaving time for discussion of this lot...
>

The code that doesn't work appears to be the Archiver startup code, so that effectively requires me to rework the startup/shutdown code that was slightly flaky, as mentioned above. I originally cloned the pgstat code, which as I observed was possible to improve upon....that having been done (a good thing) should just leave the task as a re-clone (especially since that part of the code I can almost recite by now).

I would welcome one of the Win32 crew having an eye through the code, to suggest any areas to avoid/pay close attention to. I don't want to invalidate any other work, seeing as I'm last to show now at this party.

> If we get this done smoothly, I reckon I can have some PITR recovery
> control done by beta freeze.

The good news: ...before recent changes, I had a working version of the streaming recovery code and recovered 3 test databases. I would note that the recovery speed was quite impressive, even on my 700MHz dev pc, so I have confidence that this will work well on the real deal.

How this works: when startup enters recovery, if you have supplied a file, DataDir/recovery.conf, then it will read this to get a recovery command line. startup executes this command each time recovery requests a new xlog from archive, using only one file at a time.

Result: you will be able to recover, no matter how many xlogs you have and how little disk space you have (in comparison).

This will allow you to write very short scripts to perform
- "infinite recovery", just like Oracle 9i and DB2 8.1
- integration with tapes and backup software
- automated standby databases, where an active database feeds logs to a passive database that is permanently "InRecovery", yet can be brought on-line within a few seconds if the link drops.

I haven't yet written the code to stop at-a-point-in-time, but I remain ever optimistic. I won't slow down the patch waiting for that...

...and I take it the doc freeze is not the same time as the code freeze..

Best regards, Simon Riggs
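The one-file-at-a-time behaviour described in the message above might look roughly like the loop below. This is a Python sketch for a POSIX shell rather than the startup process's actual C code; the `{name}` and `{dest}` placeholders, the `replay` callback, and the rule that a non-zero exit status means "no more xlogs in the archive" are assumptions made for illustration. The format of recovery.conf is not shown because the message does not specify it.

```python
import os
import shlex
import subprocess
from typing import Callable, Iterable


def streaming_recovery(restore_command: str,
                       xlog_names: Iterable[str],
                       work_path: str,
                       replay: Callable[[str], None]) -> int:
    """Fetch and replay archived xlogs one at a time; return how many were replayed."""
    replayed = 0
    for name in xlog_names:
        command = restore_command.format(name=shlex.quote(name),
                                         dest=shlex.quote(work_path))
        # One command invocation per requested xlog, as with system(3).
        if subprocess.run(command, shell=True).returncode != 0:
            break  # assumed here to mean the archive has no further xlogs
        replay(work_path)
        # Free the space before asking for the next file, so disk usage
        # stays at one xlog regardless of how many are in the archive.
        os.remove(work_path)
        replayed += 1
    return replayed
```

Because each file is removed before the next is requested, the same loop would serve a standby that keeps polling the archive: the restore command could just as well read from tape or copy from another host, which is what makes the short scripts listed above possible.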
Going backwards at a rate of knots, I can't even get postmaster to compile now.... and I'm talking about the one from cvstip, NOT with my changes...please assist a tired brain

My errors:

[sriggs@stromboli postmaster]$ make
gcc -O2 -fno-strict-aliasing -Wall -Wmissing-prototypes -Wmissing-declarations -I../../../src/include -D_GNU_SOURCE -c -o postmaster.o postmaster.c
postmaster.c: In function `ServerLoop':
postmaster.c:1046: error: storage size of `tz' isn't known
postmaster.c:1048: warning: implicit declaration of function `gettimeofday'
postmaster.c:1046: warning: unused variable `tz'
postmaster.c: In function `BackendRun':
postmaster.c:2388: error: storage size of `tz' isn't known
postmaster.c:2388: warning: unused variable `tz'
make: *** [postmaster.o] Error 1
[sriggs@stromboli postmaster]$

These are the only errors from make...

port.h postmaster.c have both just been updated by cvs update

...for the last few days, I've been getting this when I attempt initdb with -d option enabled

LOG: could not open directory "/share/timezone": No such file or directory
DEBUG: Reject TZ "GMT0BST": at 1111968000 2005-03-28 00:00:00 std versus 2005-03-28 01:00:00 dst
DEBUG: Reject TZ "GMT0": at 1088121600 2004-06-25 00:00:00 std versus 2004-06-25 01:00:00 dst
LOG: could not recognize system timezone, defaulting to "Etc/GMT0"
HINT: You can specify the correct timezone in postgresql.conf.

...my system shows....

[sriggs@stromboli pgsql]$ date
Fri Jun 25 20:55:31 BST 2004

...is this a case of?

#ifdef BRIT
gcc -throw-wobbly
#endif

Comments?

Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes:
> Going backwards at a rate of knots, I can't even get postmaster to
> compile now.... and I'm talking about the one from cvstip, NOT with my

> postmaster.c:1046: error: storage size of `tz' isn't known
> postmaster.c:1048: warning: implicit declaration of function
> `gettimeofday'

You probably need to rerun configure. It sounds like <sys/time.h> isn't getting imported because HAVE_SYS_TIME_H isn't defined ...

> ...for the last few days, I've been getting this when I attempt initdb
> with -d option enabled

> LOG: could not open directory "/share/timezone": No such file or
> directory

I am suspicious that this is a configuration problem too. "make distclean" and a full rebuild seems indicated.

regards, tom lane
On Fri, 2004-06-25 at 21:36, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Going backwards at a rate of knots, I can't even get postmaster to
> > compile now.... and I'm talking about the one from cvstip, NOT with my
>
> > postmaster.c:1046: error: storage size of `tz' isn't known
> > postmaster.c:1048: warning: implicit declaration of function
> > `gettimeofday'
>
> You probably need to rerun configure. It sounds like <sys/time.h>
> isn't getting imported because HAVE_SYS_TIME_H isn't defined ...
>
> > ...for the last few days, I've been getting this when I attempt initdb
> > with -d option enabled
>
> > LOG: could not open directory "/share/timezone": No such file or
> > directory
>
> I am suspicious that this is a configuration problem too. "make
> distclean" and a full rebuild seems indicated.
>

Hmmm...I already tried running configure again

Advice given...advice heeded:

configure
make distclean
make

fails...same way...damn.

I'll try a full clean cvstip checkout and try again...

I've not gone anywhere near timezones...so I have bad vibes.

Regards, Simon Riggs