Thread: PITR Archival

PITR Archival

From
Simon Riggs
Date:
I enclose a working set of context diff patches and new files to make
PITR archiving work, as of cvstip (NOW).

You'll see the new options in the postgresql.conf...though you may wish
to use archive_debug = true as well, when testing.

There is one bug: shutdown doesn't work quite right. I haven't fixed
this because I've spent too long trying to decipher how pgstat did a
clean shutdown, discovering now that it didn't and that has now been
patched...something similar is required for pgarch, but I'm out of time
now...leaving time for discussion of this lot...

I'm looking to have this lot committed asap, cos my fingers are starting
to catch the bitrot now. :)

I have a considerable amount still to learn about CVS, diff and patch,
so anybody wanting to spend 10-15 mins on the phone with me would
greatly enhance my chances of helping patch my patch, when the bugs roll
in.

If we get this done smoothly, I reckon I can have some PITR recovery
control done by beta freeze.

Best regards, Simon Riggs

Re: PITR Archival

From
Simon Riggs
Date:
On Tue, 2004-06-15 at 16:34, Simon Riggs wrote:
> I enclose a working set of context diff patches and new files to make
> PITR archiving work, as of cvstip (NOW).
>
> You'll see the new options in the postgresql.conf...though you may wish
> to use archive_debug = true as well, when testing.
>
> There is one bug: shutdown doesn't work quite right. I haven't fixed
> this because I've spent too long trying to decipher how pgstat did a
> clean shutdown, discovering now that it didn't and that has now been
> patched...something similar is required for pgarch, but I'm out of time
> now...leaving time for discussion of this lot...
>

The patch creates a further child process of postmaster, the archiver.

The archiver code is similar to, but not the same as, pgstat.c, and by
analogy lives in src/backend/postmaster/pgarch.c, with a matching
pgarch.h in src/include (just as with pgstat.h).

At various points, 5 processes are involved, all of which are identified
clearly in the debugging messages...
- backend: sends signal to postmaster to say xlog is ready to archive
- postmaster: catches and resends to archiver
- archiver: then calls system(3) to invoke a user-defined archival task
- bgwriter: later cleans up archive_status at checkpoint time
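
As a rough illustration of the archiver step in the list above (this is
not the code in pgarch.c), the sketch below hands each ready xlog file to
a user-defined command through system(3), which Python exposes as
os.system. The command template, its {path} placeholder, the function
names and the stop-at-first-failure policy are all assumptions made for
illustration; the patch's actual option names and command syntax are not
shown in this message.

```python
import os
import shlex


def archive_one(xlog_path, command_template):
    """Run the user-defined archival command for one xlog file.

    Returns True when the command exits with status 0.
    """
    # Quote the path so spaces or shell metacharacters are passed safely.
    command = command_template.format(path=shlex.quote(xlog_path))
    status = os.system(command)  # os.system() wraps system(3)
    return status == 0


def archive_ready(xlog_paths, command_template):
    """Archive files in the order given, stopping at the first failure.

    Stopping early is an assumption of this sketch: it keeps the archive
    free of gaps, leaving the remaining files for a later retry.
    """
    archived = []
    for path in xlog_paths:
        if not archive_one(path, command_template):
            break
        archived.append(path)
    return archived
```

With a template such as "cp {path} /mnt/archive/" (a hypothetical
destination), archive_ready() returns the list of files that were copied
successfully.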

Please note: you will need to initdb to make this work

Also note: archive_mode is not designed to be turned on/off frequently.
It is possible to confuse it if you turn it on, then restart with it
off, then turn it on again. That is likely to create a "hole" in the
archive history of xlogs and you will not be able to recover correctly.
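
To make the "hole" concrete, here is a minimal sketch of a gap check over
an archive. It assumes each archived xlog can be mapped to a consecutive
integer sequence number, which is a simplification of the real xlog file
naming; find_holes is a hypothetical helper, not part of the patch.

```python
def find_holes(archived):
    """Return the sequence numbers missing between the lowest and
    highest archived xlog, in ascending order."""
    present = set(archived)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1)
            if n not in present]
```

An archive holding xlogs 1, 2, 5 and 6 has a hole at 3 and 4, so replay
from a backup taken before xlog 3 cannot proceed past xlog 2.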

You may also note that the design has changed substantially from many
earlier design postings...all of this is based on community input to
rationalise behaviour and to streamline code.

If you wish to test recovery, you should:
- do a full physical backup of DataDir, while postmaster is UP
- archive all xlogs

When disaster strikes:
- restore backup of DataDir
- restore all archived xlogs to pg_xlog
- startup postmaster (and watch...)
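
The two restore steps above can be sketched as a small script. This is
only an illustration under stated assumptions: the backup and the archive
are plain directories, the target DataDir does not exist yet, and
restore_for_recovery is a hypothetical name. Starting the postmaster is
left to the operator.

```python
import os
import shutil


def restore_for_recovery(backup_dir, archive_dir, data_dir):
    """Restore a physical backup, then copy all archived xlogs into
    data_dir/pg_xlog. Returns the restored xlog names in sorted order."""
    # Step 1: restore the backup of DataDir (data_dir must not exist yet).
    shutil.copytree(backup_dir, data_dir)

    # Step 2: restore all archived xlogs to pg_xlog.
    xlog_dir = os.path.join(data_dir, "pg_xlog")
    os.makedirs(xlog_dir, exist_ok=True)
    restored = []
    for name in sorted(os.listdir(archive_dir)):
        shutil.copy2(os.path.join(archive_dir, name),
                     os.path.join(xlog_dir, name))
        restored.append(name)
    return restored
```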

Further work is still required to make it STOP recovering at a
predetermined point in time....

Best regards, Simon Riggs


Re: PITR Archival

From
Alvaro Herrera
Date:
On Tue, Jun 15, 2004 at 04:34:30PM +0100, Simon Riggs wrote:

> I have a considerable amount still to learn about CVS, diff and patch,
> so anybody wanting to spend 10-15 mins on the phone with me would
> greatly enhance my chances of helping patch my patch, when the bugs roll
> in.

Can't help you with the phone thingie, but if you want to see what's in
a patch and be able to edit it nicely, I suggest you use meld.  It's a
python dual-pane GTK display, really nice.  Too slow for real coding (on
my machine that is) but real good for seeing what you changed.  Watch
out for use of Ctrl, Shift and Alt -- very handy.

It can use CVS as well, or you can use two source trees.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"There is evil in the world. There are dark, awful things. Occasionally, we get
a glimpse of them. But there are dark corners; horrors almost impossible to
imagine... even in our worst nightmares." (Van Helsing, Dracula A.D. 1972)


Re: PITR Archival

From
Bruce Momjian
Date:
Simon Riggs wrote:
> Also note: archive_mode is not designed to be turned on/off frequently.
> It is possible to confuse it if you turn it on, then restart with it
> off, then turn it on again. That is likely to create a "hole" in the
> archive history of xlogs and you will not be able to recover correctly.

I assume full xlog files will still be transferred, but that you will be
missing the files from the period while the archiver was turned off.
That issue will be part of the restore procedure, I assume.  I think we
need to specify that people should use an empty archive directory every
time they turn on archiving.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: PITR Archival

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

I will try to apply it once it is reviewed.

---------------------------------------------------------------------------


Simon Riggs wrote:
> I enclose a working set of context diff patches and new files to make
> PITR archiving work, as of cvstip (NOW).
>
> You'll see the new options in the postgresql.conf...though you may wish
> to use archive_debug = true as well, when testing.
>
> There is one bug: shutdown doesn't work quite right. I haven't fixed
> this because I've spent too long trying to decipher how pgstat did a
> clean shutdown, discovering now that it didn't and that has now been
> patched...something similar is required for pgarch, but I'm out of time
> now...leaving time for discussion of this lot...
>
> I'm looking to have this lot committed asap, cos my fingers are starting
> to catch the bitrot now. :)
>
> I have a considerable amount still to learn about CVS, diff and patch,
> so anybody wanting to spend 10-15 mins on the phone with me would
> greatly enhance my chances of helping patch my patch, when the bugs roll
> in.
>
> If we get this done smoothly, I reckon I can have some PITR recovery
> control done by beta freeze.
>
> Best regards, Simon Riggs

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: PITR Archival

From
Simon Riggs
Date:
On Tue, 2004-06-15 at 16:34, Simon Riggs wrote:
> I enclose a working set of context diff patches and new files to make
> PITR archiving work, as of cvstip (NOW).

As of now, this patch is invalidated by recent changes. Don't try to run
it, it doesn't even start. There's good news coming later in this email
update.... :)

I'm now in the middle of reworking this, 1-3 days, I guess.

> There is one bug: shutdown doesn't work quite right. I haven't fixed
> this because I've spent too long trying to decipher how pgstat did a
> clean shutdown, discovering now that it didn't and that has now been
> patched...something similar is required for pgarch, but I'm out of time
> now...leaving time for discussion of this lot...
>

The code that doesn't work appears to be the Archiver startup code, so
that effectively requires me to rework the startup/shutdown code that
was slightly flaky, as mentioned above.

I originally cloned the pgstat code, which as I observed was possible to
improve upon. Now that this has been done (a good thing), the task should
just be a re-clone (especially since that part of the code I can almost
recite by now).

I would welcome one of the Win32 crew having an eye through the code, to
suggest any areas to avoid/pay close attention to. I don't want to
invalidate any other work, seeing as I'm last to show now at this party.

> If we get this done smoothly, I reckon I can have some PITR recovery
> control done by beta freeze.

The good news: ...before recent changes, I had a working version of the
streaming recovery code and recovered 3 test databases. I would note
that the recovery speed was quite impressive, even on my 700MHz dev pc,
so I have confidence that this will work well on the real deal.

How this works: when startup enters recovery, if you have supplied a
file: DataDir/recovery.conf, then it will read this to get a recovery
command line. startup executes this command each time recovery requests
a new xlog from archive, using only one file at a time.
Result: you will be able to recover, no matter how many xlogs you have
and how little disk space you have (in comparison).
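
A minimal sketch of the one-file-at-a-time behaviour described above, not
the patch's code. The {name} and {target} placeholders, the replay
callback and the function name are assumptions for illustration; the
actual recovery.conf syntax is not given in this message. A failing
command is treated here as "no more xlogs in the archive", which is also
an assumption.

```python
import os
import shlex


def recover_streaming(segment_names, command_template, work_dir, replay):
    """Fetch and replay archived xlogs one at a time.

    At most one fetched file exists in work_dir at any moment, so disk
    use stays flat however many xlogs are replayed.
    Returns the number of xlogs replayed.
    """
    replayed = 0
    for name in segment_names:
        target = os.path.join(work_dir, name)
        command = command_template.format(
            name=shlex.quote(name), target=shlex.quote(target))
        if os.system(command) != 0:
            break  # assumed to mean the archive has no further xlogs
        replay(target)
        os.remove(target)  # free the space before requesting the next file
        replayed += 1
    return replayed
```

A template such as "cp /mnt/archive/{name} {target}" (a hypothetical
archive location) would copy each requested xlog from the archive just
before it is replayed.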

This will allow you to write very short scripts to perform
- "infinite recovery", just like Oracle 9i and DB2 8.1
- integration with tapes and backup software
- automated standby databases, where an active database feeds logs to a
passive database that is permanently "InRecovery", yet can be brought
on-line within a few seconds if the link drops.

I haven't yet written the code to stop at-a-point-in-time, but I remain
ever optimistic. I won't slow down the patch waiting for that...

...and I take it the doc freeze is not the same time as the code
freeze..

Best regards, Simon Riggs


tz error prevents postmaster.c compiling...

From
Simon Riggs
Date:
Going backwards at a rate of knots, I can't even get postmaster to
compile now.... and I'm talking about the one from cvstip, NOT with my
changes...please assist a tired brain

My errors:
[sriggs@stromboli postmaster]$ make
gcc -O2 -fno-strict-aliasing -Wall -Wmissing-prototypes
-Wmissing-declarations -I../../../src/include -D_GNU_SOURCE   -c -o
postmaster.o postmaster.c
postmaster.c: In function `ServerLoop':
postmaster.c:1046: error: storage size of `tz' isn't known
postmaster.c:1048: warning: implicit declaration of function
`gettimeofday'
postmaster.c:1046: warning: unused variable `tz'
postmaster.c: In function `BackendRun':
postmaster.c:2388: error: storage size of `tz' isn't known
postmaster.c:2388: warning: unused variable `tz'
make: *** [postmaster.o] Error 1
[sriggs@stromboli postmaster]$

These are the only errors from make...
    port.h
    postmaster.c
have both just been updated by cvs update

...for the last few days, I've been getting this when I attempt initdb
with -d option enabled

LOG:  could not open directory "/share/timezone": No such file or
directory
DEBUG:  Reject TZ "GMT0BST": at 1111968000 2005-03-28 00:00:00 std
versus 2005-03-28 01:00:00 dst
DEBUG:  Reject TZ "GMT0": at 1088121600 2004-06-25 00:00:00 std versus
2004-06-25 01:00:00 dst
LOG:  could not recognize system timezone, defaulting to "Etc/GMT0"
HINT:  You can specify the correct timezone in postgresql.conf.

...my system shows....

[sriggs@stromboli pgsql]$ date
Fri Jun 25 20:55:31 BST 2004

...is this a case of?

#ifdef BRIT
    gcc -throw-wobbly
#endif

Comments?

Regards, Simon Riggs


Re: tz error prevents postmaster.c compiling...

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> Going backwards at a rate of knots, I can't even get postmaster to
> compile now.... and I'm talking about the one from cvstip, NOT with my

> postmaster.c:1046: error: storage size of `tz' isn't known
> postmaster.c:1048: warning: implicit declaration of function
> `gettimeofday'

You probably need to rerun configure.  It sounds like <sys/time.h>
isn't getting imported because HAVE_SYS_TIME_H isn't defined ...

> ...for the last few days, I've been getting this when I attempt initdb
> with -d option enabled

> LOG:  could not open directory "/share/timezone": No such file or
> directory

I am suspicious that this is a configuration problem too.  "make
distclean" and a full rebuild seems indicated.

            regards, tom lane

Re: tz error prevents postmaster.c compiling...

From
Simon Riggs
Date:
On Fri, 2004-06-25 at 21:36, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Going backwards at a rate of knots, I can't even get postmaster to
> > compile now.... and I'm talking about the one from cvstip, NOT with my
>
> > postmaster.c:1046: error: storage size of `tz' isn't known
> > postmaster.c:1048: warning: implicit declaration of function
> > `gettimeofday'
>
> You probably need to rerun configure.  It sounds like <sys/time.h>
> isn't getting imported because HAVE_SYS_TIME_H isn't defined ...
>
> > ...for the last few days, I've been getting this when I attempt initdb
> > with -d option enabled
>
> > LOG:  could not open directory "/share/timezone": No such file or
> > directory
>
> I am suspicious that this is a configuration problem too.  "make
> distclean" and a full rebuild seems indicated.
>

Hmmm...I already tried running configure again

Advice given...advice heeded:
  configure
  make distclean
  make

fails...same way...damn.

I'll try a full clean cvstip checkout and try again...

I've not gone anywhere near timezones...so I have bad vibes.

Regards, Simon Riggs