Thread: Re: [HACKERS] Point in Time Recovery

Re: [HACKERS] Point in Time Recovery

From
Simon Riggs
Date:
On Wed, 2004-07-14 at 20:33, Simon Riggs wrote:
> On Wed, 2004-07-14 at 16:55, markw@osdl.org wrote:
> > On 14 Jul, Simon Riggs wrote:
> > > PITR Patch v5_1 just posted has Point in Time Recovery working....
> > >
> > > Still some rough edges....but we really need some testers now to give
> > > this a try and let me know what you think.
> > >
> > > Klaus Naumann and Mark Wong are the only [non-committers] to have tried
> > > to run the code (and let me know about it), so please have a look at
> > > [PATCHES] and try it out.
> > >
>
> > I just tried applying the v5_1 patch against the cvs tip today and got a
> > couple of rejections.  I'll copy the patch output here.  Let me know if
> > you want to see the reject files or anything else:
> >
>
> I'm on it. Sorry 'bout that all - midnight fingers.

Latest version, pitr_v5_2.patch...

- Updated to cvs tip
- Additional tip changes located and patched
- Full re-test of both recover to point in time and recover to xid
- 2 additional bug fixes
- corrected recovery.conf sample
- Patch test
- Patch manually inspected

(pgarch.c, pgarch.h and README identical to previous post)

Go for it...

Best regards, Simon

Re: [HACKERS] Point in Time Recovery

From
Tom Lane
Date:
[ ... some desultory reading of PITR patch ... ]

What is the point of having both archive_program and archive_dest as
GUC variables?  Wouldn't it be simpler to fold them into one parameter,
viz

    archive_command = 'cp %s /archivedir'

For that matter, do we need a separate archive_mode boolean?  The one
thing I can positively guarantee about archive_dest (or archive_command)
is that we cannot come up with a useful default for it (no, /tmp isn't
good).  Therefore it does not seem very reasonable to let the user turn
on archiving without having explicitly specified an archive destination.

I propose that we fold all three GUC flags into a single archive_command
string whose built-in default is an empty string, and you enable
archiving by setting it to something nonempty.

            regards, tom lane

Re: [HACKERS] Point in Time Recovery

From
Bruce Momjian
Date:
Tom Lane wrote:
> [ ... some desultory reading of PITR patch ... ]
>
> What is the point of having both archive_program and archive_dest as
> GUC variables?  Wouldn't it be simpler to fold them into one parameter,
> viz
>
>     archive_command = 'cp %s /archivedir'
>
> For that matter, do we need a separate archive_mode boolean?  The one
> thing I can positively guarantee about archive_dest (or archive_command)
> is that we cannot come up with a useful default for it (no, /tmp isn't
> good).  Therefore it does not seem very reasonable to let the user turn
> on archiving without having explicitly specified an archive destination.

I assume archive_dest is used for both archive and recovery of archives.

> I propose that we fold all three GUC flags into a single archive_command
> string whose built-in default is an empty string, and you enable
> archiving by setting it to something nonempty.

I think the idea is that you would turn archiving on and off regularly
while you might never change the archive_command value.  Also, how would
you disable it?  Set it to "", and if you do, you then have no way to
remember your command string when you want to re-enable it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [HACKERS] Point in Time Recovery

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> What is the point of having both archive_program and archive_dest as
>> GUC variables?

> I assume archive_dest is used for both archive and recovery of archives.

You assume wrong; it's not used there.  There isn't any real good
reason to suppose that the recovery process is going to fetch the files
from exactly where archiving put them, anyhow.

> I think the idea is that you would turn archiving on and off regularly

Why in the world would you do that?  People who want PITR at all will
want it 24x7.

> while you might never change the archive_command value.  Also, how would
> you disable it?  Set it to "", and if you do, you then have no way to
> remember your command string when you want to re-enable it.

Leave the original value in a comment, if you're going to want it again
later.

I don't think any of the above arguments outweigh the risk of people
shooting themselves in the foot by enabling archive_mode without
specifying a proper command/destination.

            regards, tom lane

Re: [HACKERS] Point in Time Recovery

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> What is the point of having both archive_program and archive_dest as
> >> GUC variables?
>
> > I assume archive_dest is used for both archive and recovery of archives.
>
> You assume wrong; it's not used there.  There isn't any real good
> reason to suppose that the recovery process is going to fetch the files
> from exactly where archiving put them, anyhow.
>
> > I think the idea is that you would turn archiving on and off regularly
>
> Why in the world would you do that?  People who want PITR at all will
> want it 24x7.
>
> > while you might never change the archive_command value.  Also, how would
> > you disable it?  Set it to "", and if you do, you then have no way to
> > remember your command string when you want to re-enable it.
>
> Leave the original value in a comment, if you're going to want it again
> later.
>
> I don't think any of the above arguments outweigh the risk of people
> shooting themselves in the foot by enabling archive_mode without
> specifying a proper command/destination.

So you want to merge them all into a single command string.  That does
seem less error-prone.  I see a few variables that turn off
when set to '' like unix_socket_*.  How would this command string work?
How do you specify the WAL file name to transfer?

Re: [HACKERS] Point in Time Recovery

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> So you want to merge them all into a single command string.  That does
> seem less error-prone.  I see a few variables that turn off
> when set to '' like unix_socket_*.  How would this command string work?
> How do you specify the WAL file name to transfer?

No different from before, necessarily.  However I did not like the
restriction to a single %s in the submitted implementation.  What I
have in my local copy is
    %p -> full path of XLOG file to be archived
    %f -> base name of XLOG file to be archived
and the suggested example becomes
    archive_command = 'cp %p /mnt/server/pgarchive/%f'

Note that this example immediately eliminates one of the failure modes
Simon enumerates in his README, which is to try 'cp %s /foo' where /foo
isn't a directory.  More generally, though, *only* a cp-to-directory
solution is likely to be very happy with not being able to get at the
base file name.  Yes you can make a shellscript and use basename,
but I don't think you should have to do that if it could otherwise
be a one-liner.

(In case it's not obvious from the above, I am hacking with intent to
commit soon.  Maybe tomorrow, if my wife doesn't make me paint the
bathroom instead...)

            regards, tom lane
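
The %p / %f expansion described above can be sketched as follows. This is an illustration in Python, not the C code of the patch: the function name expand_archive_command, the treatment of '%%' as a literal percent sign, the pass-through of unknown escapes, and the example segment name are all assumptions that the thread does not confirm.

```python
def expand_archive_command(template, xlog_path):
    """Expand %p (full path) and %f (base name) in an archive command.

    Assumptions, not taken from the patch: '%%' yields a literal '%',
    and any other %-sequence is copied through unchanged.
    """
    base_name = xlog_path.rsplit("/", 1)[-1]
    substitutions = {"p": xlog_path, "f": base_name, "%": "%"}
    pieces = []
    i = 0
    while i < len(template):
        ch = template[i]
        nxt = template[i + 1] if i + 1 < len(template) else ""
        if ch == "%" and nxt in substitutions:
            pieces.append(substitutions[nxt])
            i += 2
        else:
            pieces.append(ch)
            i += 1
    return "".join(pieces)


# Hypothetical segment path, used only for illustration.
print(expand_archive_command("cp %p /mnt/server/pgarchive/%f",
                             "/data/pg_xlog/0000000000000001"))
```

Because the base name is available separately, the destination can be spelled out as a full file name, so the command no longer depends on the target being a directory. The sketch does no shell quoting, so paths containing spaces would need extra care.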

Re: [HACKERS] Point in Time Recovery

From
Simon Riggs
Date:
On Sun, 2004-07-18 at 06:04, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > So you want to merge them all into a single command string.  That does
> > seem less error-prone.  I see a few variables that turn off
> > when set to '' like unix_socket_*.  How would this command string work?
> > How do you specify the WAL file name to transfer?
>

GUC-wise, I implemented what we agreed in discussions...

There are many things in need of refactoring, so my focus was on
delivering what we agreed, even knowing it would probably change...

A few notes on the patch (as I submitted it - so as not to confuse with
other versions being worked upon)
- archive_dest is definitely used in both archive and recovery. There
wasn't much need for this GUC apart from that and I think we are better
off without it. Removing it improves recovery flexibility (we cannot
assume the recovery is taking place in anything like the original
configuration).

- archive_mode I would prefer to keep - it is explicit then which mode
you are in, rather than implicit from the command string. In all other
ways I agree with everything Tom has said. It allows us to talk about
"being in archive_mode" without people saying "but I can't work out how
to turn archive mode on".

When archiver starts the FIRST thing it does is run a test to confirm
that the command string works, so setting archive_command to '' would
simply generate an error.

Also, I would suggest this:
- changing archive mode requires a postmaster restart
- changing archive command should just be a SIGHUP...we don't want to
force a restart just to switch to a new kind of archiving

If you can only change archive_program at postmaster start that is
restrictive, but making that SIGHUP would allow people to set it to ''
and turn off archiving while postmaster is up == lurking fault.

> No different from before, necessarily.  However I did not like the
> restriction to a single %s in the submitted implementation.  What I
> have in my local copy is
>     %p -> full path of XLOG file to be archived
>     %f -> base name of XLOG file to be archived
> and the suggested example becomes
>     archive_command = 'cp %p /mnt/server/pgarchive/%f'
>

I'm happy with those changes and would have done them myself given
time... the 2 or 3 %s parameters weren't the most user friendly way of
doing it.

> Note that this example immediately eliminates one of the failure modes
> Simon enumerates in his README, which is to try 'cp %s /foo' where /foo
> isn't a directory.  More generally, though, *only* a cp-to-directory
> solution is likely to be very happy with not being able to get at the
> base file name.  Yes you can make a shellscript and use basename,
> but I don't think you should have to do that if it could otherwise
> be a one-liner.
>

Good.

> (In case it's not obvious from the above, I am hacking with intent to
> commit soon.  Maybe tomorrow, if my wife doesn't make me paint the
> bathroom instead...)
>
...just returned from there... :)


Best Regards, Simon Riggs


Re: [HACKERS] Point in Time Recovery

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> Latest version, pitr_v5_2.patch...

Reviewed and committed with some adjustments.

I see the following significant loose ends:

* Documentation is, um, lacking.  (One point in particular is that I
inserted the recovery.conf.sample file into CVS, but did not fill in
the patch's lack of attempt to install it anywhere.)

* As Bruce has pointed out already, the process of making a backup
needs some improvements for more safety: the starting and ending WAL
offsets have got to be recorded somehow.

* As I have pointed out already, we need to invent "timelines" to
allow incompatible WAL segments to exist side-by-side.  I will volunteer
to look into this.

* I think creating a .ready file during XLogFileOpen is completely bogus,
for reasons mentioned in committed comments (look for XXX).  Possibly
this can go away with timelines.

* I am wondering if it wouldn't be a good idea to remove the local copy
of any segment we successfully obtain from archive.  The existing
comments note that we might get a wrong or corrupted file from archive,
but aren't we in at least as much risk of using an obsolete segment
restored from backup if we leave the local segment in place?  (The
archive recovery run itself will know not to do this, but if we crash
shortly thereafter, the ensuing recovery run would NOT know not to
trust such files.)

Perhaps the last point is really a backup-process issue.  AFAICS there
is no good reason for a backup tarfile to include $PGDATA/pg_xlog at
all, and some good reasons for it not to.  Can we redesign either the
backup process or the disk layout so that that will not happen?  Then
we could stop worrying about stale local pg_xlog files.

            regards, tom lane

Re: [HACKERS] Point in Time Recovery

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> When archiver starts the FIRST thing it does is run a test to confirm
> that the command string works, so setting archive_command to '' would
> simply generate an error.

No, it would do no such thing; the test cannot really tell anything more
than whether system("foo") returns zero ... and at least on my machine,
system("") returns zero.  It certainly does not prove that any data went
to anyplace safe.

I diked that test out of the committed patch because I felt it cluttered
the archive area without actually proving anything of interest.  We can
revisit the point if you like.

> Also, I would suggest this:
> - changing archive mode requires a postmaster restart

Why?

> - changing archive command should just be a SIGHUP...

Check, as committed [and tested to work...]

            regards, tom lane
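
The point about system("") can be checked with a small sketch. It assumes a POSIX shell and uses Python's subprocess module in place of the C system() call; the helper name command_succeeds is hypothetical.

```python
import subprocess


def command_succeeds(command):
    """Run the command through the shell; True means exit status 0."""
    return subprocess.run(command, shell=True).returncode == 0


# On a POSIX shell an empty command exits with status 0, so a
# "did the command return zero?" test passes although nothing was archived.
print(command_succeeds(""))
print(command_succeeds("exit 1"))
```

A zero exit status only says that the shell ran without complaint. It says nothing about whether any data reached a safe place, which is the objection raised above.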

Re: [HACKERS] Point in Time Recovery

From
Bruce Momjian
Date:
What is the process of logging to tape?  Ideally we could just do 'dd'
to the tape drive in append mode;  however we need a way of signalling
that we want to change tapes.

The only method I can think of is to have PITR dump the files into a
holding directory, and have a daemon that scans the directory and writes
files to tape when they are completely copied (how do we detect that?
Use 'mv' after the copy?  Seems like a good use for our new %
parameters).  Then we need a control program to signal the daemon to
stop archiving to tape, have it set a flag file so we know it has
suspended tape writes, report that back to the client, change tapes,
then tell it to restart.

I am asking to make sure we don't need a PITR pause mode that prevents
WAL files from being archived but also prevents them from being
recycled.  If we did that, we could probably append to tape directly,
but then we need to go into 'pause archive' mode in the PITR process,
and such switching seems like a pain and the wrong place to do it.
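
The "copy, then mv" idea for the holding directory could look roughly like this. It is a sketch under assumptions: the function names and the ".tmp" suffix are hypothetical, and the rename is only atomic when the temporary and final names are on the same filesystem.

```python
import os
import shutil


def copy_then_rename(src, holding_dir):
    """Copy src into holding_dir under a temporary name, then rename it.

    A scanner that skips '*.tmp' never sees a half-written file, because
    the final name appears only once the copy is complete.
    """
    final_path = os.path.join(holding_dir, os.path.basename(src))
    tmp_path = final_path + ".tmp"
    shutil.copyfile(src, tmp_path)
    os.replace(tmp_path, final_path)  # atomic rename within one filesystem
    return final_path


def complete_files(holding_dir):
    """List the files a tape-writing daemon could safely pick up."""
    return sorted(n for n in os.listdir(holding_dir)
                  if not n.endswith(".tmp"))
```

This answers only the "how do we detect that the copy is complete" question. Pausing for a tape change and making the copy durable with fsync are left out.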

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > So you want to merge them all into a single command string.  That does
> > seem less error-prone.  I see a few variables that turn off
> > when set to '' like unix_socket_*.  How would this command string work?
> > How do you specify the WAL file name to transfer?
>
> No different from before, necessarily.  However I did not like the
> restriction to a single %s in the submitted implementation.  What I
> have in my local copy is
>     %p -> full path of XLOG file to be archived
>     %f -> base name of XLOG file to be archived
> and the suggested example becomes
>     archive_command = 'cp %p /mnt/server/pgarchive/%f'
>
> Note that this example immediately eliminates one of the failure modes
> Simon enumerates in his README, which is to try 'cp %s /foo' where /foo
> isn't a directory.  More generally, though, *only* a cp-to-directory
> solution is likely to be very happy with not being able to get at the
> base file name.  Yes you can make a shellscript and use basename,
> but I don't think you should have to do that if it could otherwise
> be a one-liner.
>
> (In case it's not obvious from the above, I am hacking with intent to
> commit soon.  Maybe tomorrow, if my wife doesn't make me paint the
> bathroom instead...)
>
>             regards, tom lane

Re: [HACKERS] Point in Time Recovery

From
Bruce Momjian
Date:
Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Latest version, pitr_v5_2.patch...
>
> Reviewed and committed with some adjustments.
>
> I see the following significant loose ends:
>
> * Documentation is, um, lacking.  (One point in particular is that I
> inserted the recovery.conf.sample file into CVS, but did not fill in
> the patch's lack of attempt to install it anywhere.)

I figure it should go in share like the other sample files, and tell
people to copy it to /data and modify it for recovery.

> * As Bruce has pointed out already, the process of making a backup
> needs some improvements for more safety: the starting and ending WAL
> offsets have got to be recorded somehow.

Yep, we need those files in the archive location and the /data directory
tarball.

> * As I have pointed out already, we need to invent "timelines" to
> allow incompatible WAL segments to exist side-by-side.  I will volunteer
> to look into this.

Great.

> * I think creating a .ready file during XLogFileOpen is completely bogus,
> for reasons mentioned in committed comments (look for XXX).  Possibly
> this can go away with timelines.
>
> * I am wondering if it wouldn't be a good idea to remove the local copy
> of any segment we successfully obtain from archive.  The existing
> comments note that we might get a wrong or corrupted file from archive,
> but aren't we in at least as much risk of using an obsolete segment
> restored from backup if we leave the local segment in place?  (The
> archive recovery run itself will know not to do this, but if we crash
> shortly thereafter, the ensuing recovery run would NOT know not to
> trust such files.)

> Perhaps the last point is really a backup-process issue.  AFAICS there
> is no good reason for a backup tarfile to include $PGDATA/pg_xlog at
> all, and some good reasons for it not to.  Can we redesign either the
> backup process or the disk layout so that that will not happen?  Then
> we could stop worrying about stale local pg_xlog files.

Seems we should just clear out the /pg_xlog directory before we start
recovery.  We are going to rename recovery.conf to recovery.in-progress
or something to prevent us from clearing out the directory after a
crash, right?  (I see you rename recovery.conf to recovery.done.  Is
that wise?  I thought we would disable recovery after a crash, or does
it just keep going?  If so, nice.)

Re: [HACKERS] Point in Time Recovery

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> * Documentation is, um, lacking.  (One point in particular is that I
>> inserted the recovery.conf.sample file into CVS, but did not fill in
>> the patch's lack of attempt to install it anywhere.)

> I figure it should go in share like the other sample files, and tell
> people to copy it to /data and modify it for recovery.

It should certainly go to /share as a .sample file.  I was thinking that
initdb should perhaps copy it into $PGDATA (still as .sample, not as
.conf!) so it'd be right there when you need it.

>> Perhaps the last point is really a backup-process issue.  AFAICS there
>> is no good reason for a backup tarfile to include $PGDATA/pg_xlog at
>> all, and some good reasons for it not to.

> Seems we should just clear out the /pg_xlog directory before we start
> recovery.

No, that's a horrid idea, because it loses the ability to combine
archival xlog files with recent files in /pg_xlog that are not yet
archived.  We need to distinguish old files that were accidentally
captured by backup from very-recent files.  I think the cleanest way to
do that is for backup not to capture them in the first place.

> We are going to rename recovery.conf to recovery.in-progress
> or something to prevent us from clearing out the directory after a
> crash, right?

I had second thoughts about that and didn't do it in the committed
patch, though it's certainly still open for debate.

> (I see you rename recovery.conf to recovery.done.  Is
> that wise?

Yes.  Once you've done with a PITR recovery you definitely do *not* want
a subsequent crash recovery to think it should obey your recovery_target
limit.  But if you fail before you've finished the recovery run it
should theoretically be okay to retry, so I didn't add code to rename to
"recovery.inprogress".  We can certainly add it later if we decide it's
a good idea.

            regards, tom lane
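
The recovery.conf handling described above can be sketched as below. The file names come from the thread. The function names are hypothetical, and the sketch is Python rather than the C of the committed patch.

```python
import os


def in_archive_recovery(data_dir):
    """Archive recovery is requested while recovery.conf is present."""
    return os.path.exists(os.path.join(data_dir, "recovery.conf"))


def finish_recovery(data_dir):
    """Rename recovery.conf to recovery.done once recovery has completed.

    Returns True if a rename happened. A later crash recovery then finds
    no recovery.conf and does not apply the old recovery target again.
    """
    conf = os.path.join(data_dir, "recovery.conf")
    done = os.path.join(data_dir, "recovery.done")
    if not os.path.exists(conf):
        return False
    os.replace(conf, done)
    return True
```

If the run fails before finish_recovery is reached, recovery.conf is still in place, so a retry goes through archive recovery again. That matches the "okay to retry" reasoning above.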

Re: [HACKERS] Point in Time Recovery

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> What is the process of logging to tape?  Ideally we could just do 'dd'
> to the tape drive in append mode;  however we need a way of signalling
> that we want to change tapes.

The reason we use a user-specifiable shell command for archiving is
so that we do not have to answer the above ;-).  It's the user's problem
to write a shell script that does things the way he wants.  He can make
it connect to /dev/tty and ask the operator to swap tapes, or whatever.

Personally I am very accustomed to Hewlett-Packard's disk-to-tape backup
program "fbackup", which allows you to provide a shell script to handle
exactly this sort of thing, and it's worked well for me for many years.

> I am asking to make sure we don't need a PITR pause mode that prevents
> WAL files from being archived but also prevents them from being
> recycled.

WAL files will not be recycled until the archiver daemon has set a .done
flag file for them, so I see no problem here.  (Note: I took out some
code in Simon's original patch that would start bleating on the basis
of totally unsupportable assumptions about how long archival of a log
segment "ought to" take.)

            regards, tom lane
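
The interplay of the .ready and .done flag files with recycling could be sketched as follows. The directory layout, the function names and the naive %p / %f replacement are assumptions for illustration. The thread only states that a segment is not recycled until the archiver has set a .done flag for it.

```python
import os
import subprocess


def archive_ready_segments(status_dir, xlog_dir, archive_command):
    """Archive every segment flagged <segment>.ready; return those archived.

    A flag is renamed to .done only after the command exits with status 0.
    On failure the .ready flag stays in place so the segment is retried.
    """
    archived = []
    for name in sorted(os.listdir(status_dir)):
        if not name.endswith(".ready"):
            continue
        segment = name[:-len(".ready")]
        path = os.path.join(xlog_dir, segment)
        command = archive_command.replace("%p", path).replace("%f", segment)
        if subprocess.run(command, shell=True).returncode != 0:
            break  # stop here and retry later, keeping segment order
        os.replace(os.path.join(status_dir, name),
                   os.path.join(status_dir, segment + ".done"))
        archived.append(segment)
    return archived


def may_recycle(status_dir, segment):
    """A segment may be recycled only once its .done flag exists."""
    return os.path.exists(os.path.join(status_dir, segment + ".done"))
```

With this arrangement a slow or paused archive command simply leaves .ready flags behind. No separate pause mode is needed to keep unarchived segments from being recycled.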

Re: [HACKERS] Point in Time Recovery

From
Simon Riggs
Date:
On Mon, 2004-07-19 at 04:13, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > When archiver starts the FIRST thing it does is run a test to confirm
> > that the command string works, so setting archive_command to '' would
> > simply generate an error.
>
> No, it would do no such thing; the test cannot really tell anything more
> than whether system("foo") returns zero ... and at least on my machine,
> system("") returns zero.  It certainly does not prove that any data went
> to anyplace safe.
>
> I diked that test out of the committed patch because I felt it cluttered
> the archive area without actually proving anything of interest.  We can
> revisit the point if you like.
>

If the test doesn't guarantee success, then it needs to go....

Thanks for removing it.

Best Regards, Simon Riggs


Re: [HACKERS] Point in Time Recovery

From
Simon Riggs
Date:
On Mon, 2004-07-19 at 04:03, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Latest version, pitr_v5_2.patch...
>
> Reviewed and committed with some adjustments.
>

Wow! Thanks very much - you work fast.

I'll be re-testing later today.

> I see the following significant loose ends:
>
> * Documentation is, um, lacking.  (One point in particular is that I
> inserted the recovery.conf.sample file into CVS, but did not fill in
> the patch's lack of attempt to install it anywhere.)
>

Yes...wasn't sure what to do with that. Is everybody happy to install it
as a sample into the main Data Directory? (i.e. as recovery.conf.sample
rather than recovery.conf which would be a bad thing).

> * As Bruce has pointed out already, the process of making a backup
> needs some improvements for more safety: the starting and ending WAL
> offsets have got to be recorded somehow.
>

Haven't got to that yet, but will do.

> * As I have pointed out already, we need to invent "timelines" to
> allow incompatible WAL segments to exist side-by-side.  I will volunteer
> to look into this.

Yes, discussing on the other thread.

>
> * I think creating a .ready file during XLogFileOpen is completely bogus,
> for reasons mentioned in committed comments (look for XXX).  Possibly
> this can go away with timelines.

Yes, to some extent it would go away with timelines.

If you have a local copy at the end of a timeline that isn't archived,
then it seems a good idea to archive it, or at least copy it somewhere
safe. If you don't then you will not be able to revert to a full
recovery of that timeline in the future should you choose to do so.

The code and its location may be somewhat more suspect.... :)

>
> * I am wondering if it wouldn't be a good idea to remove the local copy
> of any segment we successfully obtain from archive.  The existing
> comments note that we might get a wrong or corrupted file from archive,
> but aren't we in at least as much risk of using an obsolete segment
> restored from backup if we leave the local segment in place?  (The
> archive recovery run itself will know not to do this, but if we crash
> shortly thereafter, the ensuing recovery run would NOT know not to
> trust such files.)
>

I agree they're a loose end that needs some thought.

I avoided that decision by going around the files. We originally agreed
that we would keep that data....reason was you can't tell whether the
files have been restored by a backup that forgot to exclude pg_xlog, or
that we are choosing to do a PITR recovery on an otherwise healthy
system (or as the comments explain maybe we lost everything except
pg_xlog).

If we crash during recovery it doesn't crash recover and restart.

If we crash after recovery, then the checkpoint record will have moved
forward, and so we don't then accidentally re-use those local copies.

Timelines will solve this...
>
> Perhaps the last point is really a backup-process issue.  AFAICS there
> is no good reason for a backup tarfile to include $PGDATA/pg_xlog at
> all, and some good reasons for it not to.  Can we redesign either the
> backup process or the disk layout so that that will not happen?  Then
> we could stop worrying about stale local pg_xlog files.
>

That's the way I saw it.

Seems fairly easy to say "don't backup pg_xlog", but you can't guarantee
they won't, even if you tell them not to...

What is stale today may be considered to be actually your best option
when testing to see whether a recovery has achieved your objectives.


I'll read the whole patch, your comments and test before I respond
further. Thanks for working so hard on this, so quickly.

Best Regards, Simon Riggs



Re: [HACKERS] Point in Time Recovery

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> * Documentation is, um, lacking.  (One point in particular is that I
> >> inserted the recovery.conf.sample file into CVS, but did not fill in
> >> the patch's lack of attempt to install it anywhere.)
>
> > I figure it should go in share like the other sample files, and tell
> > people to copy it to /data and modify it for recovery.
>
> It should certainly go to /share as a .sample file.  I was thinking that
> initdb should perhaps copy it into $PGDATA (still as .sample, not as
> .conf!) so it'd be right there when you need it.

I think /share is best.  I see other *.sample files that aren't used until
you rename them and move them to the right directory, and
recovery.conf.sample seems the same.  I think having the sample at the
top of data when for most people it will be unused is strange.

> >> Perhaps the last point is really a backup-process issue.  AFAICS there
> >> is no good reason for a backup tarfile to include $PGDATA/pg_xlog at
> >> all, and some good reasons for it not to.
>
> > Seems we should just clear out the /pg_xlog directory before we start
> > recovery.
>
> No, that's a horrid idea, because it loses the ability to combine
> archival xlog files with recent files in /pg_xlog that are not yet
> archived.  We need to distinguish old files that were accidentally
> captured by backup from very-recent files.  I think the cleanest way to
> do that is for backup not to capture them in the first place.

I am confused.  Aren't we always doing a restore from a backup?  Are you
saying there are cases where we aren't and need the stuff in pg_xlog?
Are you saying we might have some new WAL files that we want to add to
pg_xlog before we do the restore, like the most recent WAL that wasn't
archived because it wasn't finished?  Why would we be doing a recovery if
we had such files?  I see your point that we wouldn't know which file
to use, the archive version or the pg_xlog version, but actually
wouldn't the archive version always be preferred because we would know
it to be complete?

I don't see any reliable way to prevent people from having pg_xlog in
their backups, since they might use snapshots, tar, etc.

> > We are going to rename recovery.conf to recovery.in-progress
> > or something to prevent us from clearing out the directory after a
> > crash, right?
>
> I had second thoughts about that and didn't do it in the committed
> patch, though it's certainly still open for debate.

How are we handling a crash during recovery?

> > (I see you rename recovery.conf to recovery.done.  Is
> > that wise?
>
> Yes.  Once you've done with a PITR recovery you definitely do *not* want
> a subsequent crash recovery to think it should obey your recovery_target
> limit.  But if you fail before you've finished the recovery run it
> should theoretically be okay to retry, so I didn't add code to rename to
> "recovery.inprogress".  We can certainly add it later if we decide it's
> a good idea.

Ah, OK, so it just keeps going.  However, we don't know if what is in
pg_xlog was in the process of being copied from the archive at the time
of the crash, no?  In fact I am wondering if we should be transferring
the archive files into temporary names, then doing an 'mv' to make them
current so we don't get partial files in pg_xlog.  However, we can't do
that because we are using a user-supplied command line.  Should we pass
a fake name to the command string, then do the 'mv' ourselves?  With WAL
now, we do an fsync so we know the contents are crash-proof, but I am
not sure how to do that during recovery.  I guess this gets back to how
to handle the contents of pg_xlog during recovery.

Re: [HACKERS] Point in Time Recovery

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> It should certainly go to /share as a .sample file.  I was thinking that
>> initdb should perhaps copy it into $PGDATA (still as .sample, not as
>> .conf!) so it'd be right there when you need it.

> I think /share is best.

Okay, we agree on that part at least; I'll take care of it.  If anyone
wants to argue for further copying during initdb, that can be added
later.

> I am confused.  Aren't we always doing a restore from a backup?

No.  This code serves two purposes: recovery from archived WAL and
point-in-time recovery.  You might want to do a PITR run at a time
where not all your WAL segments have been pushed to archive.  Indeed
the latest one can never be so pushed, since it's unfinished.  Suppose
you are trying to do PITR recovery to a time just a few minutes ago
that is still in the latest WAL segment --- there is simply not any
legal way to have that come from the archive.

So we can't simply zero out pg_xlog at the start of a PITR run, even
if there weren't a don't-destroy-data argument against it.

>> I had second thoughts about that and didn't do it in the committed
>> patch, though it's certainly still open for debate.

> How are we handling a crash during recovery?

Retry, perhaps.  It doesn't seem any different from crash-during-recovery
in the non-archived scenario ...

> Ah, OK, so it just keeps going.  However, we don't know if what is in
> pg_xlog was in the process of being copied from the archive at the time
> of the crash, no?

Nonissue.  It goes into RECOVERYXLOG and we never assume that that's
initially good.  See RestoreArchivedXLog().
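
As a rough illustration of "never assume the scratch file is initially
good" (this is not the server's RestoreArchivedXLog() code; the helper
name, the callable interface, and the 16 MB segment size are assumptions
made for the sketch):

```python
import os
from typing import Callable

SEGMENT_SIZE = 16 * 1024 * 1024  # assumed WAL segment size for this sketch


def restore_to_scratch(restore: Callable[[str], bool], scratch_path: str,
                       expected_size: int = SEGMENT_SIZE) -> bool:
    """Fetch one segment into scratch_path; report whether the result is usable."""
    # Whatever already sits at the scratch name may be a torn leftover
    # from an earlier crashed attempt, so it is discarded, never trusted.
    if os.path.exists(scratch_path):
        os.unlink(scratch_path)
    ok = bool(restore(scratch_path))
    # A missing or wrong-sized file means the fetch did not really succeed.
    usable = (ok and os.path.isfile(scratch_path)
              and os.path.getsize(scratch_path) == expected_size)
    if not usable and os.path.exists(scratch_path):
        os.unlink(scratch_path)
    return usable
```

Because every attempt starts by clearing the scratch name, retrying
after a crash needs no special case.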

            regards, tom lane

Re: [HACKERS] Point in Time Recovery

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> It should certainly go to /share as a .sample file.  I was thinking that
> >> initdb should perhaps copy it into $PGDATA (still as .sample, not as
> >> .conf!) so it'd be right there when you need it.
>
> > I think /share is best.
>
> Okay, we agree on that part at least; I'll take care of it.  If anyone
> wants to argue for further copying during initdb, that can be added
> later.
>
> > I am confused.  Aren't we always doing a restore from a backup?
>
> No.  This code serves two purposes: recovery from archived WAL and
> point-in-time recovery.  You might want to do a PITR run at a time
> where not all your WAL segments have been pushed to archive.  Indeed
> the latest one can never be so pushed, since it's unfinished.  Suppose
> you are trying to do PITR recovery to a time just a few minutes ago
> that is still in the latest WAL segment --- there is simply not any
> legal way to have that come from the archive.
>
> So we can't simply zero out pg_xlog at the start of a PITR run, even
> if there weren't a don't-destroy-data argument against it.

If we had some code that checks pg_xlog on recovery startup, it could
rename each pg_xlog file and then recover the file from the archive.  If
it doesn't exist or is truncated, discard it.  If it is the right size,
we need to check to see which one has a WAL eof-of-segment marker (we
have on of those, right?).  This would seem to catch all the cases:

    o  file brought back by tar, but complete file in archive
    o  archive in process of writing during crash
    o  partially full file in pg_xlog

What it doesn't cover are cases where tar gets a partial copy of a
pg_xlog file but the file never made it to archive yet, and a new
pg_xlog file was created and we get some of that file too.  In fact, the
backup could get holes in the pg_xlog file where the backup has zeros
but the real file had data added to it after the zeros:

in tar    XXXXX  00000 XXXXX

real    XXXXX  XXXXX XXXXX

This could happen when file has this:

    XXXXX  00000 00000

backup reads this:

    XXXXX  00000

database writes this:

    XXXXX  XXXXX XXXXX

backup reads the remainder of the file:

    XXXXX  00000 XXXXX

In this case the end-of-segment marker doesn't even help us, and there
might not be an archive copy of this because it didn't happen yet.
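
The interleaving above can be replayed with a toy model. This is only a
sketch: the five-byte "blocks" and the ASCII "0" filler mirror the
diagram, not real WAL pages, and the function name is made up.

```python
from typing import Tuple

BLOCK = 5  # bytes per block in this toy model


def torn_backup_copy() -> Tuple[bytes, bytes]:
    """Return (backup_copy, real_file) for a file that changes mid-backup."""
    # File starts as: XXXXX 00000 00000
    live = bytearray(b"X" * BLOCK + b"0" * (2 * BLOCK))
    # Backup reads the first two blocks while the tail is still unwritten.
    backup = bytes(live[: 2 * BLOCK])
    # The database then fills in the second and third blocks.
    live[BLOCK:] = b"X" * (2 * BLOCK)
    # Backup resumes and reads the last block, which now holds data.
    backup += bytes(live[2 * BLOCK:])
    return backup, bytes(live)
```

The backup copy ends up the same length as the real file but with a
stale middle block, so a size check alone would not notice the hole.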

I think I see a solution. We are going to create a file during backup so
we know the wal offsets and xids.  If we see that file, we know either
we have a restore of a backup or they are currently running a backup.  If we
tell them not to restore while a backup is running (seems pretty
obvious) we can then delete pg_xlog when the backup wal offset file
exists.  In other cases, we know the WAL files are valid to use.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [HACKERS] Point in Time Recovery

From
Simon Riggs
Date:
On Mon, 2004-07-19 at 05:54, Tom Lane wrote:
> code in Simon's original patch that would start bleating

Code that bleats? LOL :) (is that a new log level?)

Some of it was perhaps a little woolly....

You've made my day, Simon Riggs (still laughing)


Re: [HACKERS] Point in Time Recovery

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> we need to check to see which one has a WAL eof-of-segment marker (we
> have one of those, right?).

No, we don't.

> I think I see a solution. We are going to create a file during backup so
> we know the wal offsets and xids.  If we see that file, we know either
> we have a restore of a backup or they are currently running a backup.

... or the last backup attempt failed, but they forgot to remove the
file it left.  Or we are doing crash recovery after the system lost
power while a backup was running.  Or half a dozen other obvious scenarios.

> If we tell them not to restore while a backup is running (seems pretty
> obvious) we can then delete pg_xlog when the backup wal offset file
> exists.  In other cases, we know the WAL files are valid to use.

We're not deleting pg_xlog, period.  IMHO it's too dangerous even to
have such a function in the code.

My original suggestion was to *replace* individual xlog files with data
extracted from archive, and only after determining that the archive
indeed has a copy of that particular file (and we can fetch it).
This at least has a fighting chance of not losing information.  Wiping
pg_xlog in toto on the basis of a guess about the system status is just
a form of russian roulette.  Sooner or later you will wipe some xlog
files that you can't get back from archive.
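
A sketch of the replace-only-what-the-archive-has rule, under stated
assumptions: plain directories stand in for pg_xlog and the archive, a
file copy stands in for the restore command, and the function name and
".fetch" suffix are invented for the example.

```python
import os
import shutil
from typing import List


def replace_from_archive(xlog_dir: str, archive_dir: str) -> List[str]:
    """Replace local segments that the archive can supply; keep all others."""
    replaced = []
    for name in sorted(os.listdir(xlog_dir)):
        local = os.path.join(xlog_dir, name)
        archived = os.path.join(archive_dir, name)
        if not os.path.isfile(local) or not os.path.isfile(archived):
            continue  # no archived copy: the local file is all we have
        scratch = local + ".fetch"
        shutil.copyfile(archived, scratch)  # fetch the archived copy first
        os.replace(scratch, local)          # only then overwrite the local one
        replaced.append(name)
    return replaced
```

Nothing is ever deleted outright here; a segment that exists only in
the local directory is left exactly as it was found.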

            regards, tom lane

Re: [HACKERS] Point in Time Recovery

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > we need to check to see which one has a WAL eof-of-segment marker (we
> > have one of those, right?).
>
> No, we don't.
>
> > I think I see a solution. We are going to create a file during backup so
> > we know the wal offsets and xids.  If we see that file, we know either
> > we have a restore of a backup or they are currently running a backup.
>
> ... or the last backup attempt failed, but they forgot to remove the
> file it left.  Or we are doing crash recovery after the system lost
> power while a backup was running.  Or half a dozen other obvious scenarios.
>
> > If we tell them not to restore while a backup is running (seems pretty
> > obvious) we can then delete pg_xlog when the backup wal offset file
> > exists.  In other cases, we know the WAL files are valid to use.
>
> We're not deleting pg_xlog, period.  IMHO it's too dangerous even to
> have such a function in the code.
>
> My original suggestion was to *replace* individual xlog files with data
> extracted from archive, and only after determining that the archive
> indeed has a copy of that particular file (and we can fetch it).
> This at least has a fighting chance of not losing information.  Wiping
> pg_xlog in toto on the basis of a guess about the system status is just
> a form of russian roulette.  Sooner or later you will wipe some xlog
> files that you can't get back from archive.

OK, if you don't want to place restrictions on recovery, fine, but how
do you handle the situation where the WAL file in the tar backup has
holes and you don't have an archive file to use because it didn't make
it to the archive before the drive died?  Can we detect holes in the WAL
file recovered from backup?  We might, but I am afraid we might not.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [HACKERS] Point in Time Recovery

From
Simon Riggs
Date:
On Mon, 2004-07-19 at 17:56, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> I had second thoughts about that and didn't do it in the committed
> >> patch, though it's certainly still open for debate.
>
> > How are we handling a crash during recovery?
>
> Retry, perhaps.  It doesn't seem any different from crash-during-recovery
> in the non-archived scenario ...
>

Well, a recovery is just re-applying already written logs at super
speed. We don't need to write WAL because we already wrote it once (and
that would really confuse the timeline issue).

I think if this was an issue, the solution would be to speed up recovery
since that would benefit us more than putting recovery-squared code in.

Just start over...

Best Regards, Simon Riggs


Re: [HACKERS] Point in Time Recovery

From
Christopher Kings-Lynne
Date:
> Okay, we agree on that part at least; I'll take care of it.  If anyone
> wants to argue for further copying during initdb, that can be added
> later.

I reckon it should be copied into $PGDATA :)  Otherwise, when I'm in a
panic at recovery time, I'd have to figure out where the heck my package
has installed the share conf file to; conf files usually aren't in
share, etc., etc.

Chris


Re: [HACKERS] Point in Time Recovery

From
markw@osdl.org
Date:
On 18 Jul, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> Latest version, pitr_v5_2.patch...
>
> Reviewed and committed with some adjustments.

I pulled from CVS and got the following message when I tried starting
the database with the archive_mode parameter:

FATAL:  unrecognized configuration parameter "archive_mode"

Have I missed something since it has been committed?

Mark

Re: [HACKERS] Point in Time Recovery

From
Klaus Naumann
Date:
On Tue, 20 Jul 2004 markw@osdl.org wrote:

> FATAL:  unrecognized configuration parameter "archive_mode"
>
> Have I missed something since it has been committed?

Yes, Tom has removed this option in favor of just setting
archive_command to a value, which then also enables the PITR code.

But as far as I can see, this hasn't been discussed to a conclusion yet.

My 2ct: I'd prefer to have archive_mode in the config as it really makes
clear that this database is archiving. I fear users will not understand
that giving a program for archival will also enable the PITR function.

Greetings, Klaus


--
Full Name   : Klaus Naumann     | (http://www.mgnet.de/) (Germany)
Phone / FAX : ++49/177/7862964  | E-Mail: (kn@mgnet.de)

Re: [HACKERS] Point in Time Recovery

From
Simon Riggs
Date:
On Tue, 2004-07-20 at 17:29, Klaus Naumann wrote:
> On Tue, 20 Jul 2004 markw@osdl.org wrote:
>
> > FATAL:  unrecognized configuration parameter "archive_mode"
> >
> > Have I missed something since it has been committed?
>
> Yes, Tom has removed this option in favor of just setting
> archive_command to a value which then enables the PITR code also.
>
> But as I've seen this isn't discussed to the very end currently.
>
> My 2ct: I'd prefer to have archive_mode in the config as it really makes
> clear that this database is archiving. I fear users will not understand
> that giving a program for archival will also enable the PITR function.
>

I do also think that option should go back in, just to be explicit.

A more important omission is the deletion of a message to indicate that
the server is acting in archive_mode....so there's no visual clue in the
log to warn an admin that it's been turned off now or incorrectly
specified (by somebody else, of course). (At least using the default log
mode).

Best Regards, Simon Riggs


Re: [HACKERS] Point in Time Recovery

From
Mark Kirkwood
Date:
I'd vote for it as a clarity factor too.

Klaus Naumann wrote:

>On Tue, 20 Jul 2004 markw@osdl.org wrote:
>
>
>
>>FATAL:  unrecognized configuration parameter "archive_mode"
>>
>>Have I missed something since it has been committed?
>>
>>
>
>Yes, Tom has removed this option in favor of just setting
>archive_command to a value which then enables the PITR code also.
>
>But as I've seen this isn't discussed to the very end currently.
>
>My 2ct: I'd prefer to have archive_mode in the config as it really makes
>clear that this database is archiving. I fear users will not understand
>that giving a program for archival will also enable the PITR function.
>
>Greetings, Klaus
>
>
>
>

Re: [HACKERS] Point in Time Recovery

From
Christopher Kings-Lynne
Date:
I'm in favour of how it is now, so long as the comment is clear.  It's
the Unix Way :)

Chris

> I'd vote for it as a clarity factor too.
>
> Klaus Naumann wrote:
>
>> On Tue, 20 Jul 2004 markw@osdl.org wrote:
>>
>>
>>
>>> FATAL:  unrecognized configuration parameter "archive_mode"
>>>
>>> Have I missed something since it has been committed?
>>>
>>
>>
>> Yes, Tom has removed this option in favor of just setting
>> archive_command to a value which then enables the PITR code also.
>>
>> But as I've seen this isn't discussed to the very end currently.
>>
>> My 2ct: I'd prefer to have archive_mode in the config as it really makes
>> clear that this database is archiving. I fear users will not understand
>> that giving a program for archival will also enable the PITR function.
>>
>> Greetings, Klaus
>>
>>
>>
>>
>

Re: [HACKERS] Point in Time Recovery

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> A more important omission is the deletion of a message to indicate that
> the server is acting in archive_mode....so there's no visual clue in the
> log to warn an admin that it's been turned off now or incorrectly
> specified (by somebody else, of course). (At least using the default log
> mode).

Hmm, we are apparently not reading the same code.  My copy shows

LOG:  starting archive recovery
LOG:  restore_command = "cp /home/postgres/testversion/archive/%f %p"
... blah blah ...
LOG:  archive recovery complete

Which part of this is insufficiently clear?
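
For readers unfamiliar with the placeholders in the restore_command
shown above, here is a small sketch of how such a template might be
expanded. It assumes %f stands for the segment file name, %p for the
path to copy it to, and %% for a literal percent sign; the function and
the example file name and path in the usage note are illustrative only,
not taken from the server source.

```python
def expand_command(template: str, file_name: str, path: str) -> str:
    """Substitute %f, %p and %% in a command template."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "f":
                out.append(file_name)  # bare file name of the segment
                i += 2
                continue
            if nxt == "p":
                out.append(path)       # path the command should write to
                i += 2
                continue
            if nxt == "%":
                out.append("%")        # escaped literal percent sign
                i += 2
                continue
        out.append(ch)                 # anything else is copied unchanged
        i += 1
    return "".join(out)
```

With the template from the log above, a file name of
000000010000000000000001 and a hypothetical target path of
pg_xlog/RECOVERYXLOG, the expansion would be
"cp /home/postgres/testversion/archive/000000010000000000000001 pg_xlog/RECOVERYXLOG".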

            regards, tom lane

Re: [HACKERS] Point in Time Recovery

From
Klaus Naumann
Date:
On Wed, 21 Jul 2004, Tom Lane wrote:

Hi Tom,

Simon doesn't mean the recovery part. Instead he means the "normal"
startup of the server. It has to be absolutely clear (in the logfile!) if
the server was started in archive mode or not. Otherwise you always have
to guess.
On server startup there should be a message like

LOG: Database started in archive mode

or

LOG: Archive mode is DISABLED

to get the user's attention.

Greetings, Klaus





> Simon Riggs <simon@2ndquadrant.com> writes:
> > A more important omission is the deletion of a message to indicate that
> > the server is acting in archive_mode....so there's no visual clue in the
> > log to warn an admin that it's been turned off now or incorrectly
> > specified (by somebody else, of course). (At least using the default log
> > mode).
>
> Hmm, we are apparently not reading the same code.  My copy shows
>
> LOG:  starting archive recovery
> LOG:  restore_command = "cp /home/postgres/testversion/archive/%f %p"
> ... blah blah ...
> LOG:  archive recovery complete
>
> Which part of this is insufficiently clear?
>
>             regards, tom lane
>
>

--
Full Name   : Klaus Naumann     | (http://www.mgnet.de/) (Germany)
Phone / FAX : ++49/177/7862964  | E-Mail: (kn@mgnet.de)

Re: [HACKERS] Point in Time Recovery

From
Tom Lane
Date:
Klaus Naumann <kn@mgnet.de> writes:
> Simon doesn't mean the recovery part. Instead he means the "normal"
> startup of the server. It has to be absolutely clear (in the logfile!) if
> the server was started in archive mode or not. Otherwise you always have
> to guess.

Why would you guess?  "SHOW archive_command" will tell you, without
question, at any time.  I don't see the point of placing such a message
in the postmaster log --- in normal circumstances the postmaster will
still be running long after its starting messages have been discarded
due to log rotation.

Also, the current implementation allows you to stop and start archiving
on-the-fly, so a start-time message would be an unreliable guide to what
the postmaster is actually doing at the moment.

            regards, tom lane

Re: [HACKERS] Point in Time Recovery

From
Simon Riggs
Date:
On Wed, 2004-07-21 at 15:53, Tom Lane wrote:
> Klaus Naumann <kn@mgnet.de> writes:
> > Simon doesn't mean the recovery part. Instead he means the "normal"
> > startup of the server. It has to be absolutely clear (in the logfile!) if
> > the server was started in archive mode or not. Otherwise you always have
> > to guess.
>
> Why would you guess?  "SHOW archive_command" will tell you, without
> question, at any time.  I don't see the point of placing such a message
> in the postmaster log --- in normal circumstances the postmaster will
> still be running long after its starting messages have been discarded
> due to log rotation.
>
> Also, the current implementation allows you to stop and start archiving
> on-the-fly, so a start-time message would be an unreliable guide to what
> the postmaster is actually doing at the moment.
>

Overall, this is a small point and I think we should leave Tom alone, to
focus on the bigger issues that we care about.

Tom has done an amazingly good job in the last few days of refactoring
some reasonably ugly code on my part, all without a murmur. I relent on
this to allow everything to be finished in time.

The PITR journey has just begun, so there will be further opportunity to
discuss and agree what constitutes real issues and then correct them.
This may not be on that list later.

Best Regards, Simon Riggs



Re: [HACKERS] Point in Time Recovery

From
Bruce Momjian
Date:
I do think we need a boolean for start/stop of archiving, rather than
setting it to '' to turn it off.  Tom, I think the group agreed to this
on clarity grounds.  I would like the server to throw an error if you
try to turn on archiving and the command is set to ''.

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Wed, 2004-07-21 at 15:53, Tom Lane wrote:
> > Klaus Naumann <kn@mgnet.de> writes:
> > > Simon doesn't mean the recovery part. Instead he means the "normal"
> > > startup of the server. It has to be absolutely clear (in the logfile!) if
> > > the server was started in archive mode or not. Otherwise you always have
> > > to guess.
> >
> > Why would you guess?  "SHOW archive_command" will tell you, without
> > question, at any time.  I don't see the point of placing such a message
> > in the postmaster log --- in normal circumstances the postmaster will
> > still be running long after its starting messages have been discarded
> > due to log rotation.
> >
> > Also, the current implementation allows you to stop and start archiving
> > on-the-fly, so a start-time message would be an unreliable guide to what
> > the postmaster is actually doing at the moment.
> >
>
> Overall, this is a small point and I think we should leave Tom alone, to
> focus on the bigger issues that we care about.
>
> Tom has done an amazingly good job in the last few days of refactoring
> some reasonably ugly code on my part, all without a murmur. I relent on
> this to allow everything to be finished in time.
>
> The PITR journey has just begun, so there will be further opportunity to
> discuss and agree what constitutes real issues and then correct them.
> This may not be on that list later.
>
> Best Regards, Simon Riggs
>
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [HACKERS] Point in Time Recovery

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I do think we need a boolean for start/stop of archiving, rather than
> setting it to '' to turn it off.  Tom, I think the group agreed to this
> on clarity grounds.

I didn't see any consensus there, nor do I see a point to it.

            regards, tom lane

Re: [HACKERS] Point in Time Recovery

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I do think we need a boolean for start/stop of archiving, rather than
> > setting it to '' to turn it off.  Tom, I think the group agreed to this
> > on clarity grounds.
>
> I didn't see any consensus there, nor do I see a point to it.

I saw a lot of people saying it was a good idea, and only you saying it
was a bad idea.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [HACKERS] Point in Time Recovery

From
Bruce Momjian
Date:
Bruce Momjian wrote:
>
> I do think we need a boolean for start/stop of archiving, rather than
> setting it to '' to turn it off.  Tom, I think the group agreed to this
> on clarity grounds.  I would like the server to throw an error if you
> try to turn on archiving and the command is set to ''.

Let me illustrate.  To turn off archiving you have to change:

    #archive_command = ''
    archive_command = 'cp %p /mnt/server/archivedir/%f'

to

    archive_command = ''
    #archive_command = 'cp %p /mnt/server/archivedir/%f'

and if you comment both or neither, you have problems.

With a boolean it would be:

    archive_mode = on
    archive_command = 'cp %p /mnt/server/archivedir/%f'

    archive_mode = off
    archive_command = 'cp %p /mnt/server/archivedir/%f'

Now, if you say people will rarely turn archiving on/off, then one
parameter seems to make more sense.
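
The two schemes can be compared in a short sketch. This is a model of
the proposal being debated, not of any server code; the function name
and the use of None to mean "no separate boolean exists" are
assumptions for illustration.

```python
from typing import Optional


def archiving_enabled(archive_command: str,
                      archive_mode: Optional[bool] = None) -> bool:
    """Decide whether archiving is on under either configuration scheme."""
    has_command = archive_command.strip() != ""
    if archive_mode is None:
        # Single-parameter scheme: a non-empty command means archiving is on.
        return has_command
    if archive_mode and not has_command:
        # Two-parameter scheme: refuse "on" without a command to run.
        raise ValueError("archive_mode is on but archive_command is empty")
    return archive_mode
```

Under the boolean scheme the command line can stay in the file untouched
while archiving is switched off, at the cost of one more setting to keep
consistent.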

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [HACKERS] Point in Time Recovery

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Now, if you say people will rarely turn archiving on/off, then one
> parameter seems to make more sense.

I really can't envision a situation where people would do that.  If you
need PITR at all then you need it 24x7.

            regards, tom lane

Re: [HACKERS] Point in Time Recovery

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Now, if you say people will rarely turn archiving on/off, then one
> > parameter seems to make more sense.
>
> I really can't envision a situation where people would do that.  If you
> need PITR at all then you need it 24x7.

OK, then we are OK.  If we find that isn't true, we can reevaluate.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [HACKERS] Point in Time Recovery

From
"Simon@2ndquadrant.com"
Date:
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > Now, if you say people will rarely turn archiving on/off, then one
> > > parameter seems to make more sense.
> >
> > I really can't envision a situation where people would do that.  If you
> > need PITR at all then you need it 24x7.
>
I agree. The second parameter is only there to clarify the intent.

8.0 does introduce two good reasons to turn it on/off, however:
- index build speedups
- COPY speedups

I would opt to make enabling/disabling archive_command require a postmaster
restart. That way there would be no capability to take advantage of the
incentive to turn it on/off.

For TODO:

It would be my intention (in 8.1) to make those available via switches e.g.
NOT LOGGED options on CREATE INDEX and COPY, to allow users to take
advantage of the no logging optimization without turning off PITR system
wide. (Just as this is possible in Oracle and Teradata).

I would also aim to make the first Insert Select into an empty table not
logged (optionally). This is an important optimization for Oracle, Teradata
and DB2 (which uses NOT LOGGED INITIALLY).

Best Regards, Simon Riggs


Re: [HACKERS] Point in Time Recovery

From
Tom Lane
Date:
"Simon@2ndquadrant.com" <simon@2ndquadrant.com> writes:
> I would opt to make enabling/disabling archive_command require a postmaster
> restart. That way there would be no capability to take advantage of the
> incentive to turn it on/off.

We're generally not in the habit of making GUC parameters more rigid
than the implementation absolutely requires.

> It would be my intention (in 8.1) to make those available via switches e.g.
> NOT LOGGED options on CREATE INDEX and COPY, to allow users to take
> advantage of the no logging optimization without turning off PITR system
> wide. (Just as this is possible in Oracle and Teradata).

Isn't this in direct conflict with your opinion above?  And I cannot say
that I think this one is a good idea.  We do not have support for
selective catalog xlogging; if you do something like this then you
*will* have a broken database after recovery, because it will contain
those indexes but with invalid contents.

> I would also aim to make the first Insert Select into an empty table not
> logged (optionally). This is an important optimization for Oracle, Teradata
> and DB2 (which uses NOT LOGGED INITIALLY).

This is even worse: not only do you have a broken database, but you have
no way to recover.  (At least with an unlogged index you could fix it by
REINDEX.)  If you don't care about longevity of the table, then make it
a temp table.

The fact that Oracle does it does not automatically make it a good idea.

            regards, tom lane