Thread: XLogArchivingActive

XLogArchivingActive

From
Andreas Pflug
Date:
Currently, WAL files will be archived as soon as archive_command is set.  IMHO, this is not desirable if no permanent
backupis wanted, but only 
 
scheduled online backup because; it will flood the wal_archive 
destination with files that will never be used.

I propose to introduce a GUC "permanent_archiving" or so, to select 
whether wal archiving happens permanently or only when a backup is in 
progress (i.e. between pg_start_backup and pg_stop_backup).


Regards,
Andreas


Re: XLogArchivingActive

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> I propose to introduce a GUC "permanent_archiving" or so, to select 
> whether wal archiving happens permanently or only when a backup is in 
> progress (i.e. between pg_start_backup and pg_stop_backup).

This is silly.  Why not just turn archiving on and off?
        regards, tom lane


Re: XLogArchivingActive

From
Andreas Pflug
Date:
Tom Lane wrote:
> Andreas Pflug <pgadmin@pse-consulting.de> writes:
> 
>>I propose to introduce a GUC "permanent_archiving" or so, to select 
>>whether wal archiving happens permanently or only when a backup is in 
>>progress (i.e. between pg_start_backup and pg_stop_backup).
> 
> 
> This is silly.  Why not just turn archiving on and off?

Not quite. I want online backup, but no archiving. Currently, I have to 
edit postgresql.conf and SIGHUP to "turn on archiving" configuring a 
(hopefully) writable directory, do the backup, edit postgresql.conf and 
SIGHUP again. Not too convenient...

Regards,
Andreas


Re: XLogArchivingActive

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> Tom Lane wrote:
>> This is silly.  Why not just turn archiving on and off?

> Not quite. I want online backup, but no archiving. Currently, I have to 
> edit postgresql.conf and SIGHUP to "turn on archiving" configuring a 
> (hopefully) writable directory, do the backup, edit postgresql.conf and 
> SIGHUP again. Not too convenient...

You don't get to count the edit/SIGHUP steps, because those would be the
same for any other GUC.

AFAICS you could get the effect by setting up an archive_command scriptsleep 100exit 1
so that the archiver will do nothing.

BTW, I don't actually understand why you want this at all.  If you're
not going to keep a continuing series of WAL files, you don't have any
PITR capability.  What you're proposing seems like a bulky, unportable,
hard-to-use equivalent of pg_dump.  Why not use pg_dump?
        regards, tom lane


Re: XLogArchivingActive

From
Andreas Pflug
Date:
Tom Lane wrote:
> Andreas Pflug <pgadmin@pse-consulting.de> writes:
> 
>>Tom Lane wrote:
>>
>>>This is silly.  Why not just turn archiving on and off?
> 
> 
>>Not quite. I want online backup, but no archiving. Currently, I have to 
>>edit postgresql.conf and SIGHUP to "turn on archiving" configuring a 
>>(hopefully) writable directory, do the backup, edit postgresql.conf and 
>>SIGHUP again. Not too convenient...
> 
> 
> You don't get to count the edit/SIGHUP steps, because those would be the
> same for any other GUC.

That's right, but my proposal would implicitely switch on archiving 
while backup is in progress, thus explicitely enabling/disabling 
archiving wouldn't be necessary.
> 
> AFAICS you could get the effect by setting up an archive_command script
>     sleep 100
>     exit 1
> so that the archiver will do nothing.

Doesn't WAL expect the WAL files already archived to be recyclable, so 
they could get overwritten in the pg_xlog dir while backup is running? 
Additionally, the doc recommends omitting pg_xlog from the file level 
backup, so a restart would need the archived wal files, no?


> 
> BTW, I don't actually understand why you want this at all.  If you're
> not going to keep a continuing series of WAL files, you don't have any
> PITR capability.  What you're proposing seems like a bulky, unportable,
> hard-to-use equivalent of pg_dump.  Why not use pg_dump?

Because pg_dump will take too long and create bloated dump files. All I 
need is a physical backup for disaster recovery purposes without 
bringing down the server.

In my case, I'd expect a DB that uses 114GB on disk to consume 1.4TB 
when pg_dumped, too much for the available backup capacity (esp. 
compared to net content, about 290GB). See other post "inefficient bytea 
escaping" for details.

Regards,
Andreas


Re: XLogArchivingActive

From
Jim Nasby
Date:
On May 25, 2006, at 11:24 AM, Andreas Pflug wrote:
>> BTW, I don't actually understand why you want this at all.  If you're
>> not going to keep a continuing series of WAL files, you don't have  
>> any
>> PITR capability.  What you're proposing seems like a bulky,  
>> unportable,
>> hard-to-use equivalent of pg_dump.  Why not use pg_dump?
>
> Because pg_dump will take too long and create bloated dump files.  
> All I need is a physical backup for disaster recovery purposes  
> without bringing down the server.
>
> In my case, I'd expect a DB that uses 114GB on disk to consume  
> 1.4TB when pg_dumped, too much for the available backup capacity  
> (esp. compared to net content, about 290GB). See other post  
> "inefficient bytea escaping" for details.

Another consideration is that you can use rsync to update a  
filesystem-level backup, but there's no pg_dump equivalent. On a  
large database that can make a sizable difference in the amount of  
time required for a backup.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461




Re: XLogArchivingActive

From
Andreas Pflug
Date:
Jim Nasby wrote:
> On May 25, 2006, at 11:24 AM, Andreas Pflug wrote:
>>> BTW, I don't actually understand why you want this at all.  If you're
>>> not going to keep a continuing series of WAL files, you don't have any
>>> PITR capability.  What you're proposing seems like a bulky, unportable,
>>> hard-to-use equivalent of pg_dump.  Why not use pg_dump?
>>
>> Because pg_dump will take too long and create bloated dump files. All 
>> I need is a physical backup for disaster recovery purposes without 
>> bringing down the server.
>>
>> In my case, I'd expect a DB that uses 114GB on disk to consume 1.4TB 
>> when pg_dumped, too much for the available backup capacity (esp. 
>> compared to net content, about 290GB). See other post "inefficient 
>> bytea escaping" for details.
>
> Another consideration is that you can use rsync to update a 
> filesystem-level backup, but there's no pg_dump equivalent. On a large 
> database that can make a sizable difference in the amount of time 
> required for a backup.
That's fine to cut the backup execution time, but to guarantee 
consistency while the cluster is running pg_start_backup/pg_stop_backup 
and WAL archiving will still be necessary.

Regards,
Andreas



Re: XLogArchivingActive

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> That's right, but my proposal would implicitely switch on archiving 
> while backup is in progress, thus explicitely enabling/disabling 
> archiving wouldn't be necessary.

I'm not sure you can expect that to work.  The system is not built to
guarantee instantaneous response to mode changes like that.

>> BTW, I don't actually understand why you want this at all.  If you're
>> not going to keep a continuing series of WAL files, you don't have any
>> PITR capability.  What you're proposing seems like a bulky, unportable,
>> hard-to-use equivalent of pg_dump.  Why not use pg_dump?

> Because pg_dump will take too long and create bloated dump files. All I 
> need is a physical backup for disaster recovery purposes without 
> bringing down the server.

> In my case, I'd expect a DB that uses 114GB on disk to consume 1.4TB 
> when pg_dumped, too much for the available backup capacity (esp. 
> compared to net content, about 290GB). See other post "inefficient bytea 
> escaping" for details.

The conventional wisdom is that pg_dump files are substantially smaller
than the on-disk footprint ... and that's even without compressing them.
I think you are taking a corner case, ie bytea data, and presenting it
as something that ought to be the design center.

Something that might be worth considering is an option to allow pg_dump
to use binary COPY.  I don't think this'd work nicely for text dumps,
but seems like custom- or tar-format dumps could be made to use it.
This would probably be a win for many datatypes not only bytea, and it'd
still be far more portable than a filesystem dump.
        regards, tom lane


Re: XLogArchivingActive

From
Simon Riggs
Date:
On Thu, 2006-05-25 at 17:25 +0200, Andreas Pflug wrote:
> Tom Lane wrote:
> > Andreas Pflug <pgadmin@pse-consulting.de> writes:
> > 
> >>I propose to introduce a GUC "permanent_archiving" or so, to select 
> >>whether wal archiving happens permanently or only when a backup is in 
> >>progress (i.e. between pg_start_backup and pg_stop_backup).
> > 
> > 
> > This is silly.  Why not just turn archiving on and off?
> 
> Not quite. I want online backup, but no archiving. 

I can see what you want and why you want it. It's good to have the
option of a physical online backup as opposed to the logical online
backup that pg_dump offers.

> Currently, I have to 
> edit postgresql.conf and SIGHUP to "turn on archiving" configuring a 
> (hopefully) writable directory, do the backup, edit postgresql.conf and 
> SIGHUP again. Not too convenient...

You're doing this for pgAdmin right?

My understanding was that we had the tools now to edit the
postgresql.conf programmatically? 

Seems like its not too convenient to change the way the server operates
to do this, as long as we solve the SIGHUP/postgresql.conf problem. I'm
also not that happy about curtailing people's options on backup either:
if people decided they wanted to have a mixture of isolated on-line
backup (as you suggest), plus active archiving at other times they would
still have the problems you suggest.

Not sure what the edit commands are offhand, but we would need the
following program:

- edit postgresql.conf
- pg_reload_conf()
- wait 30
- pg_start_backup('blah')
- backup
- pg_stop_backup()
- unedit postgresql.conf
- pg_reload_conf()

Which could then be wrapped even more simply as

- pg_start_backup_online('blah')
- backup
- pg_stop_backup_online()

Which overall seems lots easier than changing the server and adding
another parameter.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com



Re: XLogArchivingActive

From
Andreas Pflug
Date:
Tom Lane wrote:
> Andreas Pflug <pgadmin@pse-consulting.de> writes:
> 
>>That's right, but my proposal would implicitely switch on archiving 
>>while backup is in progress, thus explicitely enabling/disabling 
>>archiving wouldn't be necessary.
> 
> 
> I'm not sure you can expect that to work.  The system is not built to
> guarantee instantaneous response to mode changes like that.

Um, as long as xlog writing stops immediate recycling when 
pg_start_backup is executed everything should be fine, since archived 
logs are not expected to be present until pg_stop_backup is done.

> 
> The conventional wisdom is that pg_dump files are substantially smaller
> than the on-disk footprint ... and that's even without compressing them.
> I think you are taking a corner case, ie bytea data, and presenting it
> as something that ought to be the design center.

I certainly have an extreme cornercase, since data is highly 
compressible. I won't suggest to replace pg_dump by physical backup 
methods, but disaster recovery may take considerably longer from a dump 
than from filesystem level backup.

> 
> Something that might be worth considering is an option to allow pg_dump
> to use binary COPY.  I don't think this'd work nicely for text dumps,
> but seems like custom- or tar-format dumps could be made to use it.
> This would probably be a win for many datatypes not only bytea, and it'd
> still be far more portable than a filesystem dump.

I'd really love a copy format that works for binary and text data as 
well, optimally compressed. Initial replication to a new slony cluster 
node uses COPY, and network bandwidth may become the restricting factor. 
Line protocol compression would be desirable for that too, but that's 
another story.


Regards,
Andreas


Re: XLogArchivingActive

From
Andreas Pflug
Date:
Simon Riggs wrote:
> On Thu, 2006-05-25 at 17:25 +0200, Andreas Pflug wrote:
> 

> 
>>Currently, I have to 
>>edit postgresql.conf and SIGHUP to "turn on archiving" configuring a 
>>(hopefully) writable directory, do the backup, edit postgresql.conf and 
>>SIGHUP again. Not too convenient...
> 
> 
> You're doing this for pgAdmin right?

Not yet, just trying to manage a server.

> 
> My understanding was that we had the tools now to edit the
> postgresql.conf programmatically? 
> 
> Seems like its not too convenient to change the way the server operates
> to do this, as long as we solve the SIGHUP/postgresql.conf problem. I'm
> also not that happy about curtailing people's options on backup either:
> if people decided they wanted to have a mixture of isolated on-line
> backup (as you suggest), plus active archiving at other times they would
> still have the problems you suggest.

Why?
My suggestion is to redefine XLogArchivingActive. Currently, it tests 
for non-null archive_command. I propose
bool XlogArchivingActive()
{   if (XLogArchiveCommand[0] == 0)      return false;   return (XLogPermanentArchive // from GUC      ||
OnlineBackupRunning());// from pg_start_backup
 
}

The people you mention simply have XLogPermanentActive=true in 
postgresql.conf, delivering the current behaviour.

> 
> Not sure what the edit commands are offhand, but we would need the
> following program:
> 
> - edit postgresql.conf
> - pg_reload_conf()
> - wait 30
> - pg_start_backup('blah')
> - backup
> - pg_stop_backup()
> - unedit postgresql.conf
> - pg_reload_conf()
> 
> Which could then be wrapped even more simply as
> 
> - pg_start_backup_online('blah')
> - backup
> - pg_stop_backup_online()

Editing postgresql.conf for this is ugly. In addition, 
pg_start_backup_online would need an additional parameter, the (highly 
machine specific) archive_command string. I'd like to see that parameter 
untouched in postgresql.conf.

Regards,
Andreas


Re: XLogArchivingActive

From
Simon Riggs
Date:
On Fri, 2006-05-26 at 12:38 +0200, Andreas Pflug wrote:

> Editing postgresql.conf for this is ugly. 

That seems to be the real issue here, not archiving. All you need to do
is to set and unset a parameter, that's all.

We're agreed that your end goal is worthwhile, but not on the mechanism
for achieving that. I don't think we should introduce complexity in one
part of the code just to get around a general issue.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com



Re: XLogArchivingActive

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> Tom Lane wrote:
>> I'm not sure you can expect that to work.  The system is not built to
>> guarantee instantaneous response to mode changes like that.

> Um, as long as xlog writing stops immediate recycling when 
> pg_start_backup is executed everything should be fine, since archived 
> logs are not expected to be present until pg_stop_backup is done.

Wrong.  You forgot about all the *other* behaviors that change depending
on XLogArchivingActive, like whether CREATE INDEX gets archived or
just fsync'd.  I don't think it makes sense for CREATE INDEX to change
that behavior in midstream, even assuming that it noticed the flag
change instantly.
        regards, tom lane


Re: XLogArchivingActive

From
Andreas Pflug
Date:
Tom Lane wrote:
> Andreas Pflug <pgadmin@pse-consulting.de> writes:
> 
>>Tom Lane wrote:
>>
>>>I'm not sure you can expect that to work.  The system is not built to
>>>guarantee instantaneous response to mode changes like that.
> 
> 
>>Um, as long as xlog writing stops immediate recycling when 
>>pg_start_backup is executed everything should be fine, since archived 
>>logs are not expected to be present until pg_stop_backup is done.
> 
> 
> Wrong.  You forgot about all the *other* behaviors that change depending
> on XLogArchivingActive, like whether CREATE INDEX gets archived or
> just fsync'd.  I don't think it makes sense for CREATE INDEX to change
> that behavior in midstream, even assuming that it noticed the flag
> change instantly.

Ok, but how can I recognize whether all running commands have safely 
switched to "archiving mode" after enabling it, to continue backing up?

Thought a little about your proposal to use a non-copying 
archive_command, since I only want to have a backup of the state the 
cluster had when backup started, but this won't work because all write 
actions that are not appending (truncate, drop) would remove files 
needed for pre-backup state while possibly not backed up yet, thus the 
WAL archive is needed.
Following your proposal, I could redirect archiving to /dev/null while 
not backing up, but how can I make sure that WAL files of transactions, 
open when starting the backup procedure, are written to the wal 
directory, not lost previously? When pg_start_backup() is executed, I'd 
need the archiver to write all "hot" xlog files again.

Regards,
Andreas


Re: XLogArchivingActive

From
Bruce Momjian
Date:
Originally I wanted the command to be a string, and archiving to be a
boolean, but Tom wanted a single parameter, and others agreed.

---------------------------------------------------------------------------

Andreas Pflug wrote:
> Simon Riggs wrote:
> > On Thu, 2006-05-25 at 17:25 +0200, Andreas Pflug wrote:
> > 
> 
> > 
> >>Currently, I have to 
> >>edit postgresql.conf and SIGHUP to "turn on archiving" configuring a 
> >>(hopefully) writable directory, do the backup, edit postgresql.conf and 
> >>SIGHUP again. Not too convenient...
> > 
> > 
> > You're doing this for pgAdmin right?
> 
> Not yet, just trying to manage a server.
> 
> > 
> > My understanding was that we had the tools now to edit the
> > postgresql.conf programmatically? 
> > 
> > Seems like its not too convenient to change the way the server operates
> > to do this, as long as we solve the SIGHUP/postgresql.conf problem. I'm
> > also not that happy about curtailing people's options on backup either:
> > if people decided they wanted to have a mixture of isolated on-line
> > backup (as you suggest), plus active archiving at other times they would
> > still have the problems you suggest.
> 
> Why?
> My suggestion is to redefine XLogArchivingActive. Currently, it tests 
> for non-null archive_command. I propose
> bool XlogArchivingActive()
> {
>     if (XLogArchiveCommand[0] == 0)
>        return false;
>     return (XLogPermanentArchive // from GUC
>        || OnlineBackupRunning()); // from pg_start_backup
> }
> 
> The people you mention simply have XLogPermanentActive=true in 
> postgresql.conf, delivering the current behaviour.
> 
> > 
> > Not sure what the edit commands are offhand, but we would need the
> > following program:
> > 
> > - edit postgresql.conf
> > - pg_reload_conf()
> > - wait 30
> > - pg_start_backup('blah')
> > - backup
> > - pg_stop_backup()
> > - unedit postgresql.conf
> > - pg_reload_conf()
> > 
> > Which could then be wrapped even more simply as
> > 
> > - pg_start_backup_online('blah')
> > - backup
> > - pg_stop_backup_online()
> 
> Editing postgresql.conf for this is ugly. In addition, 
> pg_start_backup_online would need an additional parameter, the (highly 
> machine specific) archive_command string. I'd like to see that parameter 
> untouched in postgresql.conf.
> 
> Regards,
> Andreas
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
> 

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: XLogArchivingActive

From
"Jim C. Nasby"
Date:
On Fri, May 26, 2006 at 12:15:34AM +0200, Andreas Pflug wrote:
> Jim Nasby wrote:
> >Another consideration is that you can use rsync to update a 
> >filesystem-level backup, but there's no pg_dump equivalent. On a large 
> >database that can make a sizable difference in the amount of time 
> >required for a backup.
> That's fine to cut the backup execution time, but to guarantee 
> consistency while the cluster is running pg_start_backup/pg_stop_backup 
> and WAL archiving will still be necessary.

Of course, but the point is that it would only be necessary while you're
running rsync. If you don't care about being able to actually roll
forward from that backup, you don't need any WAL files from after rsync
completes.

One possible way to accomplish this would be to allow specifying an
archiver command to pg_start_backup, which would then fire up an
archiver for the duration of your backup. Then you can:

SELECT pg_start_backup('label', 'cp -i %p /mnt/server/archivedir/%f
</dev/null');
rsync
SELECT pg_stop_backup();

No messing with postgresql.conf, no need to HUP the postmaster.

Perhaps the OP would be interested in coding this up, or sponsoring
someone to do so, since I think it provide what they were looking for.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: XLogArchivingActive

From
"Jim C. Nasby"
Date:
On Fri, May 26, 2006 at 10:59:37AM +0100, Simon Riggs wrote:
> Not sure what the edit commands are offhand, but we would need the
> following program:
> 
> - edit postgresql.conf
> - pg_reload_conf()
> - wait 30
> - pg_start_backup('blah')

Rather than 'wait 30', ISTM it would be better to just leave archiving
enabled, but not actually archiving WAL files.

Or, setup some mechanism so that you can tell if any commands who's
behavior would change based on archiving are running, and if any of
those that are running think archiving is disabled,
pg_start_backup_online blocks on them.

Also, regarding needing to place an archiver command in
pg_start_backup_online, another option would be to depend on the
filesystem backup to copy the WAL files, and just let them pile up in
pg_xlog until pg_stop_backup_online. Of course, that would require a
two-step filesystem copy, since you'd need to first copy everything in
$PGDATA, and then copy $PGDATA/pg_xlog after you have that.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: XLogArchivingActive

From
Andreas Pflug
Date:
Jim C. Nasby wrote:

> 
> Also, regarding needing to place an archiver command in
> pg_start_backup_online, another option would be to depend on the
> filesystem backup to copy the WAL files, and just let them pile up in
> pg_xlog until pg_stop_backup_online. Of course, that would require a
> two-step filesystem copy, since you'd need to first copy everything in
> $PGDATA, and then copy $PGDATA/pg_xlog after you have that.

Sounds fine. This solves the problem to insure that all required wal 
files are actually copied to the wal archive.

Regards,
Andreas