Thread: XLogArchivingActive
Currently, WAL files will be archived as soon as archive_command is set. IMHO, this is not desirable if no permanent backupis wanted, but only scheduled online backup because; it will flood the wal_archive destination with files that will never be used. I propose to introduce a GUC "permanent_archiving" or so, to select whether wal archiving happens permanently or only when a backup is in progress (i.e. between pg_start_backup and pg_stop_backup). Regards, Andreas
Andreas Pflug <pgadmin@pse-consulting.de> writes: > I propose to introduce a GUC "permanent_archiving" or so, to select > whether wal archiving happens permanently or only when a backup is in > progress (i.e. between pg_start_backup and pg_stop_backup). This is silly. Why not just turn archiving on and off? regards, tom lane
Tom Lane wrote: > Andreas Pflug <pgadmin@pse-consulting.de> writes: > >>I propose to introduce a GUC "permanent_archiving" or so, to select >>whether wal archiving happens permanently or only when a backup is in >>progress (i.e. between pg_start_backup and pg_stop_backup). > > > This is silly. Why not just turn archiving on and off? Not quite. I want online backup, but no archiving. Currently, I have to edit postgresql.conf and SIGHUP to "turn on archiving" configuring a (hopefully) writable directory, do the backup, edit postgresql.conf and SIGHUP again. Not too convenient... Regards, Andreas
Andreas Pflug <pgadmin@pse-consulting.de> writes: > Tom Lane wrote: >> This is silly. Why not just turn archiving on and off? > Not quite. I want online backup, but no archiving. Currently, I have to > edit postgresql.conf and SIGHUP to "turn on archiving" configuring a > (hopefully) writable directory, do the backup, edit postgresql.conf and > SIGHUP again. Not too convenient... You don't get to count the edit/SIGHUP steps, because those would be the same for any other GUC. AFAICS you could get the effect by setting up an archive_command scriptsleep 100exit 1 so that the archiver will do nothing. BTW, I don't actually understand why you want this at all. If you're not going to keep a continuing series of WAL files, you don't have any PITR capability. What you're proposing seems like a bulky, unportable, hard-to-use equivalent of pg_dump. Why not use pg_dump? regards, tom lane
Tom Lane wrote: > Andreas Pflug <pgadmin@pse-consulting.de> writes: > >>Tom Lane wrote: >> >>>This is silly. Why not just turn archiving on and off? > > >>Not quite. I want online backup, but no archiving. Currently, I have to >>edit postgresql.conf and SIGHUP to "turn on archiving" configuring a >>(hopefully) writable directory, do the backup, edit postgresql.conf and >>SIGHUP again. Not too convenient... > > > You don't get to count the edit/SIGHUP steps, because those would be the > same for any other GUC. That's right, but my proposal would implicitely switch on archiving while backup is in progress, thus explicitely enabling/disabling archiving wouldn't be necessary. > > AFAICS you could get the effect by setting up an archive_command script > sleep 100 > exit 1 > so that the archiver will do nothing. Doesn't WAL expect the WAL files already archived to be recyclable, so they could get overwritten in the pg_xlog dir while backup is running? Additionally, the doc recommends omitting pg_xlog from the file level backup, so a restart would need the archived wal files, no? > > BTW, I don't actually understand why you want this at all. If you're > not going to keep a continuing series of WAL files, you don't have any > PITR capability. What you're proposing seems like a bulky, unportable, > hard-to-use equivalent of pg_dump. Why not use pg_dump? Because pg_dump will take too long and create bloated dump files. All I need is a physical backup for disaster recovery purposes without bringing down the server. In my case, I'd expect a DB that uses 114GB on disk to consume 1.4TB when pg_dumped, too much for the available backup capacity (esp. compared to net content, about 290GB). See other post "inefficient bytea escaping" for details. Regards, Andreas
On May 25, 2006, at 11:24 AM, Andreas Pflug wrote: >> BTW, I don't actually understand why you want this at all. If you're >> not going to keep a continuing series of WAL files, you don't have >> any >> PITR capability. What you're proposing seems like a bulky, >> unportable, >> hard-to-use equivalent of pg_dump. Why not use pg_dump? > > Because pg_dump will take too long and create bloated dump files. > All I need is a physical backup for disaster recovery purposes > without bringing down the server. > > In my case, I'd expect a DB that uses 114GB on disk to consume > 1.4TB when pg_dumped, too much for the available backup capacity > (esp. compared to net content, about 290GB). See other post > "inefficient bytea escaping" for details. Another consideration is that you can use rsync to update a filesystem-level backup, but there's no pg_dump equivalent. On a large database that can make a sizable difference in the amount of time required for a backup. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim Nasby wrote: > On May 25, 2006, at 11:24 AM, Andreas Pflug wrote: >>> BTW, I don't actually understand why you want this at all. If you're >>> not going to keep a continuing series of WAL files, you don't have any >>> PITR capability. What you're proposing seems like a bulky, unportable, >>> hard-to-use equivalent of pg_dump. Why not use pg_dump? >> >> Because pg_dump will take too long and create bloated dump files. All >> I need is a physical backup for disaster recovery purposes without >> bringing down the server. >> >> In my case, I'd expect a DB that uses 114GB on disk to consume 1.4TB >> when pg_dumped, too much for the available backup capacity (esp. >> compared to net content, about 290GB). See other post "inefficient >> bytea escaping" for details. > > Another consideration is that you can use rsync to update a > filesystem-level backup, but there's no pg_dump equivalent. On a large > database that can make a sizable difference in the amount of time > required for a backup. That's fine to cut the backup execution time, but to guarantee consistency while the cluster is running pg_start_backup/pg_stop_backup and WAL archiving will still be necessary. Regards, Andreas
Andreas Pflug <pgadmin@pse-consulting.de> writes: > That's right, but my proposal would implicitely switch on archiving > while backup is in progress, thus explicitely enabling/disabling > archiving wouldn't be necessary. I'm not sure you can expect that to work. The system is not built to guarantee instantaneous response to mode changes like that. >> BTW, I don't actually understand why you want this at all. If you're >> not going to keep a continuing series of WAL files, you don't have any >> PITR capability. What you're proposing seems like a bulky, unportable, >> hard-to-use equivalent of pg_dump. Why not use pg_dump? > Because pg_dump will take too long and create bloated dump files. All I > need is a physical backup for disaster recovery purposes without > bringing down the server. > In my case, I'd expect a DB that uses 114GB on disk to consume 1.4TB > when pg_dumped, too much for the available backup capacity (esp. > compared to net content, about 290GB). See other post "inefficient bytea > escaping" for details. The conventional wisdom is that pg_dump files are substantially smaller than the on-disk footprint ... and that's even without compressing them. I think you are taking a corner case, ie bytea data, and presenting it as something that ought to be the design center. Something that might be worth considering is an option to allow pg_dump to use binary COPY. I don't think this'd work nicely for text dumps, but seems like custom- or tar-format dumps could be made to use it. This would probably be a win for many datatypes not only bytea, and it'd still be far more portable than a filesystem dump. regards, tom lane
On Thu, 2006-05-25 at 17:25 +0200, Andreas Pflug wrote: > Tom Lane wrote: > > Andreas Pflug <pgadmin@pse-consulting.de> writes: > > > >>I propose to introduce a GUC "permanent_archiving" or so, to select > >>whether wal archiving happens permanently or only when a backup is in > >>progress (i.e. between pg_start_backup and pg_stop_backup). > > > > > > This is silly. Why not just turn archiving on and off? > > Not quite. I want online backup, but no archiving. I can see what you want and why you want it. It's good to have the option of a physical online backup as opposed to the logical online backup that pg_dump offers. > Currently, I have to > edit postgresql.conf and SIGHUP to "turn on archiving" configuring a > (hopefully) writable directory, do the backup, edit postgresql.conf and > SIGHUP again. Not too convenient... You're doing this for pgAdmin right? My understanding was that we had the tools now to edit the postgresql.conf programmatically? Seems like its not too convenient to change the way the server operates to do this, as long as we solve the SIGHUP/postgresql.conf problem. I'm also not that happy about curtailing people's options on backup either: if people decided they wanted to have a mixture of isolated on-line backup (as you suggest), plus active archiving at other times they would still have the problems you suggest. Not sure what the edit commands are offhand, but we would need the following program: - edit postgresql.conf - pg_reload_conf() - wait 30 - pg_start_backup('blah') - backup - pg_stop_backup() - unedit postgresql.conf - pg_reload_conf() Which could then be wrapped even more simply as - pg_start_backup_online('blah') - backup - pg_stop_backup_online() Which overall seems lots easier than changing the server and adding another parameter. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote: > Andreas Pflug <pgadmin@pse-consulting.de> writes: > >>That's right, but my proposal would implicitely switch on archiving >>while backup is in progress, thus explicitely enabling/disabling >>archiving wouldn't be necessary. > > > I'm not sure you can expect that to work. The system is not built to > guarantee instantaneous response to mode changes like that. Um, as long as xlog writing stops immediate recycling when pg_start_backup is executed everything should be fine, since archived logs are not expected to be present until pg_stop_backup is done. > > The conventional wisdom is that pg_dump files are substantially smaller > than the on-disk footprint ... and that's even without compressing them. > I think you are taking a corner case, ie bytea data, and presenting it > as something that ought to be the design center. I certainly have an extreme cornercase, since data is highly compressible. I won't suggest to replace pg_dump by physical backup methods, but disaster recovery may take considerably longer from a dump than from filesystem level backup. > > Something that might be worth considering is an option to allow pg_dump > to use binary COPY. I don't think this'd work nicely for text dumps, > but seems like custom- or tar-format dumps could be made to use it. > This would probably be a win for many datatypes not only bytea, and it'd > still be far more portable than a filesystem dump. I'd really love a copy format that works for binary and text data as well, optimally compressed. Initial replication to a new slony cluster node uses COPY, and network bandwidth may become the restricting factor. Line protocol compression would be desirable for that too, but that's another story. Regards, Andreas
Simon Riggs wrote: > On Thu, 2006-05-25 at 17:25 +0200, Andreas Pflug wrote: > > >>Currently, I have to >>edit postgresql.conf and SIGHUP to "turn on archiving" configuring a >>(hopefully) writable directory, do the backup, edit postgresql.conf and >>SIGHUP again. Not too convenient... > > > You're doing this for pgAdmin right? Not yet, just trying to manage a server. > > My understanding was that we had the tools now to edit the > postgresql.conf programmatically? > > Seems like its not too convenient to change the way the server operates > to do this, as long as we solve the SIGHUP/postgresql.conf problem. I'm > also not that happy about curtailing people's options on backup either: > if people decided they wanted to have a mixture of isolated on-line > backup (as you suggest), plus active archiving at other times they would > still have the problems you suggest. Why? My suggestion is to redefine XLogArchivingActive. Currently, it tests for non-null archive_command. I propose bool XlogArchivingActive() { if (XLogArchiveCommand[0] == 0) return false; return (XLogPermanentArchive // from GUC || OnlineBackupRunning());// from pg_start_backup } The people you mention simply have XLogPermanentActive=true in postgresql.conf, delivering the current behaviour. > > Not sure what the edit commands are offhand, but we would need the > following program: > > - edit postgresql.conf > - pg_reload_conf() > - wait 30 > - pg_start_backup('blah') > - backup > - pg_stop_backup() > - unedit postgresql.conf > - pg_reload_conf() > > Which could then be wrapped even more simply as > > - pg_start_backup_online('blah') > - backup > - pg_stop_backup_online() Editing postgresql.conf for this is ugly. In addition, pg_start_backup_online would need an additional parameter, the (highly machine specific) archive_command string. I'd like to see that parameter untouched in postgresql.conf. Regards, Andreas
On Fri, 2006-05-26 at 12:38 +0200, Andreas Pflug wrote: > Editing postgresql.conf for this is ugly. That seems to be the real issue here, not archiving. All you need to do is to set and unset a parameter, that's all. We're agreed that your end goal is worthwhile, but not on the mechanism for achieving that. I don't think we should introduce complexity in one part of the code just to get around a general issue. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Andreas Pflug <pgadmin@pse-consulting.de> writes: > Tom Lane wrote: >> I'm not sure you can expect that to work. The system is not built to >> guarantee instantaneous response to mode changes like that. > Um, as long as xlog writing stops immediate recycling when > pg_start_backup is executed everything should be fine, since archived > logs are not expected to be present until pg_stop_backup is done. Wrong. You forgot about all the *other* behaviors that change depending on XLogArchivingActive, like whether CREATE INDEX gets archived or just fsync'd. I don't think it makes sense for CREATE INDEX to change that behavior in midstream, even assuming that it noticed the flag change instantly. regards, tom lane
Tom Lane wrote: > Andreas Pflug <pgadmin@pse-consulting.de> writes: > >>Tom Lane wrote: >> >>>I'm not sure you can expect that to work. The system is not built to >>>guarantee instantaneous response to mode changes like that. > > >>Um, as long as xlog writing stops immediate recycling when >>pg_start_backup is executed everything should be fine, since archived >>logs are not expected to be present until pg_stop_backup is done. > > > Wrong. You forgot about all the *other* behaviors that change depending > on XLogArchivingActive, like whether CREATE INDEX gets archived or > just fsync'd. I don't think it makes sense for CREATE INDEX to change > that behavior in midstream, even assuming that it noticed the flag > change instantly. Ok, but how can I recognize whether all running commands have safely switched to "archiving mode" after enabling it, to continue backing up? Thought a little about your proposal to use a non-copying archive_command, since I only want to have a backup of the state the cluster had when backup started, but this won't work because all write actions that are not appending (truncate, drop) would remove files needed for pre-backup state while possibly not backed up yet, thus the WAL archive is needed. Following your proposal, I could redirect archiving to /dev/null while not backing up, but how can I make sure that WAL files of transactions, open when starting the backup procedure, are written to the wal directory, not lost previously? When pg_start_backup() is executed, I'd need the archiver to write all "hot" xlog files again. Regards, Andreas
Originally I wanted the command to be a string, and archiving to be a boolean, but Tom wanted a single parameter, and others agreed. --------------------------------------------------------------------------- Andreas Pflug wrote: > Simon Riggs wrote: > > On Thu, 2006-05-25 at 17:25 +0200, Andreas Pflug wrote: > > > > > > >>Currently, I have to > >>edit postgresql.conf and SIGHUP to "turn on archiving" configuring a > >>(hopefully) writable directory, do the backup, edit postgresql.conf and > >>SIGHUP again. Not too convenient... > > > > > > You're doing this for pgAdmin right? > > Not yet, just trying to manage a server. > > > > > My understanding was that we had the tools now to edit the > > postgresql.conf programmatically? > > > > Seems like its not too convenient to change the way the server operates > > to do this, as long as we solve the SIGHUP/postgresql.conf problem. I'm > > also not that happy about curtailing people's options on backup either: > > if people decided they wanted to have a mixture of isolated on-line > > backup (as you suggest), plus active archiving at other times they would > > still have the problems you suggest. > > Why? > My suggestion is to redefine XLogArchivingActive. Currently, it tests > for non-null archive_command. I propose > bool XlogArchivingActive() > { > if (XLogArchiveCommand[0] == 0) > return false; > return (XLogPermanentArchive // from GUC > || OnlineBackupRunning()); // from pg_start_backup > } > > The people you mention simply have XLogPermanentActive=true in > postgresql.conf, delivering the current behaviour. > > > > > Not sure what the edit commands are offhand, but we would need the > > following program: > > > > - edit postgresql.conf > > - pg_reload_conf() > > - wait 30 > > - pg_start_backup('blah') > > - backup > > - pg_stop_backup() > > - unedit postgresql.conf > > - pg_reload_conf() > > > > Which could then be wrapped even more simply as > > > > - pg_start_backup_online('blah') > > - backup > > - pg_stop_backup_online() > > Editing postgresql.conf for this is ugly. In addition, > pg_start_backup_online would need an additional parameter, the (highly > machine specific) archive_command string. I'd like to see that parameter > untouched in postgresql.conf. > > Regards, > Andreas > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend > -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Fri, May 26, 2006 at 12:15:34AM +0200, Andreas Pflug wrote: > Jim Nasby wrote: > >Another consideration is that you can use rsync to update a > >filesystem-level backup, but there's no pg_dump equivalent. On a large > >database that can make a sizable difference in the amount of time > >required for a backup. > That's fine to cut the backup execution time, but to guarantee > consistency while the cluster is running pg_start_backup/pg_stop_backup > and WAL archiving will still be necessary. Of course, but the point is that it would only be necessary while you're running rsync. If you don't care about being able to actually roll forward from that backup, you don't need any WAL files from after rsync completes. One possible way to accomplish this would be to allow specifying an archiver command to pg_start_backup, which would then fire up an archiver for the duration of your backup. Then you can: SELECT pg_start_backup('label', 'cp -i %p /mnt/server/archivedir/%f </dev/null'); rsync SELECT pg_stop_backup(); No messing with postgresql.conf, no need to HUP the postmaster. Perhaps the OP would be interested in coding this up, or sponsoring someone to do so, since I think it provide what they were looking for. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Fri, May 26, 2006 at 10:59:37AM +0100, Simon Riggs wrote: > Not sure what the edit commands are offhand, but we would need the > following program: > > - edit postgresql.conf > - pg_reload_conf() > - wait 30 > - pg_start_backup('blah') Rather than 'wait 30', ISTM it would be better to just leave archiving enabled, but not actually archiving WAL files. Or, setup some mechanism so that you can tell if any commands who's behavior would change based on archiving are running, and if any of those that are running think archiving is disabled, pg_start_backup_online blocks on them. Also, regarding needing to place an archiver command in pg_start_backup_online, another option would be to depend on the filesystem backup to copy the WAL files, and just let them pile up in pg_xlog until pg_stop_backup_online. Of course, that would require a two-step filesystem copy, since you'd need to first copy everything in $PGDATA, and then copy $PGDATA/pg_xlog after you have that. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby wrote: > > Also, regarding needing to place an archiver command in > pg_start_backup_online, another option would be to depend on the > filesystem backup to copy the WAL files, and just let them pile up in > pg_xlog until pg_stop_backup_online. Of course, that would require a > two-step filesystem copy, since you'd need to first copy everything in > $PGDATA, and then copy $PGDATA/pg_xlog after you have that. Sounds fine. This solves the problem to insure that all required wal files are actually copied to the wal archive. Regards, Andreas