Thread: Incremental Backup Script
Hello, as far as I have understood, the WAL backup that you control via "archive_command" is the PostgreSQL equivalent to what otherdatabases let you do with an incremental backup. That is, if you don't forget to include the current WAL block. I have found a script to determine the current WAL on the admin mailing list. Based on this script, I intend to write two scripts that do the following (unless something like this already exists). basebackup -> basebackup.<bbd>.bz2 incrementalbackup -> incrementalbackup.<bbd>.<ibc>.<ibd>.bz2 (incremental backup relative to the last incremental backup) restore (a file produced by the above commands) -> restore database (either base, or base + 1..n incremental backups) <bbd>: base backup date (e.g. 2005-12-25-14-00) <ibc>: incremental backup counter (1..n) <ibd>: incremental backup date The central idea is that base backups are guaranteed to include all information up to <bbd> and incremental backups all dataup to <ibd>. I hope that this makes it easier for administrators. archive_command: copy the files to a local backup directory (LWB = local wal backup) basebackup: 1) tar the data directory 2) add any WALs that are produced while the backup is running. 3) delete all WAL that are included in the tar 4) I still wonder how <bbd> must be chosen (that of pg_start_backup?) incremental backup: 1) add all WAL that are currently in the LWB to the tar 2) add the current WAL to the tar 3) verify that all WALs prior to the current WAL are included (i.e. that no WAL is currently being copied to the LWB) 4) delete all WAL that are included in the tar restore: 1) if it's a base backup, just restore that 2) if it's an incremental backup, check that the corresponding base backup and all incremental backups with lower <ibc> areavailable. Then restore the base backup and all incremental backups up to the one specified Also, I was wondering whether it is always safe to copy the current WAL file, i.e. may the current WAL file be invalid inany circumstance? Is this a sensible idea? Regards, Gregor
""Gregor Zeitlinger"" <gregor.zeitlinger@torexretail.de> wrote > > Also, I was wondering whether it is always safe to copy the current WAL > file, i.e. may the current WAL file be invalid in any circumstance? > If you mean "current WAL file" is the xlog segment in use, then it is dangerous. We only backup the xlog segments that have been fully used up. Regards, Qingqing
On Sun, 2005-12-25 at 14:02 +0100, Gregor Zeitlinger wrote: > as far as I have understood, the WAL backup that you control via > "archive_command" is the PostgreSQL equivalent to what other databases > let you do with an incremental backup No it is not an incremental backup of changed data blocks, it is a transactional log archival. So, other parts of your thinking are slightly off - but not by much. The only way to do a partial recovery is to follow the PITR notes. Best Regards, Simon Riggs
On 12/26/05 11:04, Qingqing Zhou wrote: > ""Gregor Zeitlinger"" <gregor.zeitlinger@torexretail.de> wrote > >>Also, I was wondering whether it is always safe to copy the current WAL >>file, i.e. may the current WAL file be invalid in any circumstance? >> > > If you mean "current WAL file" is the xlog segment in use, then it is > dangerous. We only backup the xlog segments that have been fully used up. As per docs, if the databases are rarely updated it could take a long time for the WAL segment to "roll over". We need to backup the current segment to guarantee we have the latest trasactions archived at time of failure. http://www.postgresql.org/docs/8.1/interactive/backup-online.html "If you are concerned about being able to recover right up to the current instant, you may want to take additional steps to ensure that the current, partially-filled WAL segment is also copied someplace. This is particularly important if your server generates only little WAL traffic (or has slack periods where it does so), since it could take a long time before a WAL segment file is completely filled and ready to archive. One possible way to handle this is to set up a cron job that periodically (once a minute, perhaps) identifies the current WAL segment file and saves it someplace safe." Gregor: can you explain how to identify the current file? I had implemented a backup and restore script for PITR but stumbled at this point. The page above does not specify how this is to be done. I appreciate the addition of PITR - it's better than nothing (nothing being full dumps) in some respects. Ideally, we need to be able to dump deltas for a single database. In practice, restoration using the PITR method is awkward. I guess you would tarball the current data files, do a full restore, do a full dump of the database you are interested in, ditch the restored data files and replace them with the ones you tarballed, then do a database load from the full dump. The only way to avoid having the other databases on the server offline is to restore to a second postgresql instance. Not complaining, just saying :-) > Regards, > Qingqing Zach.
I would certainly like some instructions on this as well. On Jan 3, 2006, at 8:41 PM, Zach Bagnall wrote: > On 12/26/05 11:04, Qingqing Zhou wrote: >> ""Gregor Zeitlinger"" <gregor.zeitlinger@torexretail.de> wrote >>> Also, I was wondering whether it is always safe to copy the >>> current WAL file, i.e. may the current WAL file be invalid in any >>> circumstance? >>> >> If you mean "current WAL file" is the xlog segment in use, then it >> is dangerous. We only backup the xlog segments that have been >> fully used up. > > As per docs, if the databases are rarely updated it could take a > long time for the WAL segment to "roll over". We need to backup the > current segment to guarantee we have the latest trasactions > archived at time of failure. > > http://www.postgresql.org/docs/8.1/interactive/backup-online.html > "If you are concerned about being able to recover right up to the > current instant, you may want to take additional steps to ensure > that the current, partially-filled WAL segment is also copied > someplace. This is particularly important if your server generates > only little WAL traffic (or has slack periods where it does so), > since it could take a long time before a WAL segment file is > completely filled and ready to archive. One possible way to handle > this is to set up a cron job that periodically (once a minute, > perhaps) identifies the current WAL segment file and saves it > someplace safe." > > Gregor: can you explain how to identify the current file? I had > implemented a backup and restore script for PITR but stumbled at > this point. The page above does not specify how this is to be done. > > I appreciate the addition of PITR - it's better than nothing > (nothing being full dumps) in some respects. Ideally, we need to be > able to dump deltas for a single database. In practice, restoration > using the PITR method is awkward. I guess you would tarball the > current data files, do a full restore, do a full dump of the > database you are interested in, ditch the restored data files and > replace them with the ones you tarballed, then do a database load > from the full dump. The only way to avoid having the other > databases on the server offline is to restore to a second > postgresql instance. Not complaining, just saying :-) > > > >> Regards, >> Qingqing > > Zach. > > ---------------------------(end of > broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >
-----Original Message----- From: Zach Bagnall [mailto:zach.bagnall@bulletinwireless.com] Sent: Wednesday, January 04, 2006 4:42 AM To: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Incremental Backup Script >Gregor: can you explain how to identify the current file? I had >implemented a backup and restore script for PITR but stumbled at this >point. The page above does not specify how this is to be done. I have found a way from http://archives.postgresql.org/pgsql-admin/2005-10/msg00059.php. I have not tried it, but it seemsstraightforward. >I appreciate the addition of PITR - it's better than nothing (nothing >being full dumps) in some respects. Ideally, we need to be able to dump >deltas for a single database. Yes, it is not a replacement for an incremental backup, especially due to the fact that WALs may be orders of magnitude largerthan a delta. >In practice, restoration using the PITR >method is awkward. Yes, what I am planning to do: 1) drop the corrupted database 2) restore the base backup 3) replay all incremental backups (in the sense of my original mail) Gregor Zeitlinger LUCAS Product Development Torex Retail Solutions GmbH Schwedenstr. 9, D-13359 Berlin Tel. +49 (0) 30 49901-243 Fax +49 (0) 30 49901-139 Mailto:gregor.zeitlinger@torexretail.de http://www.torexretail.de
> As per docs, if the databases are rarely updated it could take a long > time for the WAL segment to "roll over". Yes, therefore I want to copy the current WAL (as I said earlier). When restoring, I also want to make sure that I restore exactely to the point when I copied the current WA segment. Hence I consider to do it as follows: 1) take the t = current time 2) copy the current WAL 3) when restoring, set recovery_target_time = t Maybe there is even a way to ask Postgres of its last commited x = xid. In that case, we could set recovery_target_xid = x Is that possible? Regards, Gregor