Thread: Incremental Backup Script

Incremental Backup Script

From
"Gregor Zeitlinger"
Date:
Hello,
as far as I have understood, the WAL backup that you control via "archive_command" is the PostgreSQL equivalent to what
otherdatabases let you do with an incremental backup. That is, if you don't forget to include the current WAL block.
 
I have found a script to determine the current WAL on the admin mailing list. 
Based on this script, I intend to write two scripts that do the following (unless something like this already exists).
basebackup -> basebackup.<bbd>.bz2
incrementalbackup -> incrementalbackup.<bbd>.<ibc>.<ibd>.bz2 (incremental backup relative to the last incremental
backup)
restore (a file produced by the above commands) -> restore database (either base, or base + 1..n incremental backups)
<bbd>: base backup date (e.g. 2005-12-25-14-00)
<ibc>: incremental backup counter (1..n)
<ibd>: incremental backup date
The central idea is that base backups are guaranteed to include all information up to <bbd> and incremental backups all
dataup to <ibd>. I hope that this makes it easier for administrators.
 
archive_command:
copy the files to a local backup directory (LWB = local wal backup)
basebackup:
1) tar the data directory 
2) add any WALs that are produced while the backup is running. 
3) delete all WAL that are included in the tar
4) I still wonder how <bbd> must be chosen (that of pg_start_backup?)
incremental backup: 
1) add all WAL that are currently in the LWB to the tar
2) add the current WAL to the tar
3) verify that all WALs prior to the current WAL are included (i.e. that no WAL is currently being copied to the LWB)
4) delete all WAL that are included in the tar
restore:
1) if it's a base backup, just restore that
2) if it's an incremental backup, check that the corresponding base backup and all incremental backups with lower <ibc>
areavailable. Then restore the base backup and all incremental backups up to the one specified
 
Also, I was wondering whether it is always safe to copy the current WAL file, i.e. may the current WAL file be invalid
inany circumstance?
 
Is this a sensible idea?
Regards,
Gregor

Re: Incremental Backup Script

From
"Qingqing Zhou"
Date:
""Gregor Zeitlinger"" <gregor.zeitlinger@torexretail.de> wrote
>
> Also, I was wondering whether it is always safe to copy the current WAL 
> file, i.e. may the current WAL file be invalid in any circumstance?
>

If you mean "current WAL file" is the xlog segment in use, then it is 
dangerous. We only backup the xlog segments that have been fully used up.

Regards,
Qingqing 




Re: Incremental Backup Script

From
Simon Riggs
Date:
On Sun, 2005-12-25 at 14:02 +0100, Gregor Zeitlinger wrote:
> as far as I have understood, the WAL backup that you control via
> "archive_command" is the PostgreSQL equivalent to what other databases
> let you do with an incremental backup

No it is not an incremental backup of changed data blocks, it is a
transactional log archival. So, other parts of your thinking are
slightly off - but not by much. The only way to do a partial recovery is
to follow the PITR notes.

Best Regards, Simon Riggs





Re: Incremental Backup Script

From
Zach Bagnall
Date:
On 12/26/05 11:04, Qingqing Zhou wrote:
> ""Gregor Zeitlinger"" <gregor.zeitlinger@torexretail.de> wrote
> 
>>Also, I was wondering whether it is always safe to copy the current WAL 
>>file, i.e. may the current WAL file be invalid in any circumstance?
>>
> 
> If you mean "current WAL file" is the xlog segment in use, then it is 
> dangerous. We only backup the xlog segments that have been fully used up.

As per docs, if the databases are rarely updated it could take a long 
time for the WAL segment to "roll over". We need to backup the current 
segment to guarantee we have the latest trasactions archived at time of 
failure.

http://www.postgresql.org/docs/8.1/interactive/backup-online.html
"If you are concerned about being able to recover right up to the 
current instant, you may want to take additional steps to ensure that 
the current, partially-filled WAL segment is also copied someplace. This 
is particularly important if your server generates only little WAL 
traffic (or has slack periods where it does so), since it could take a 
long time before a WAL segment file is completely filled and ready to 
archive. One possible way to handle this is to set up a cron job that 
periodically (once a minute, perhaps) identifies the current WAL segment 
file and saves it someplace safe."

Gregor: can you explain how to identify the current file? I had 
implemented a backup and restore script for PITR but stumbled at this 
point. The page above does not specify how this is to be done.

I appreciate the addition of PITR - it's better than nothing (nothing 
being full dumps) in some respects. Ideally, we need to be able to dump 
deltas for a single database. In practice, restoration using the PITR 
method is awkward. I guess you would tarball the current data files, do 
a full restore, do a full dump of the database you are interested in, 
ditch the restored data files and replace them with the ones you 
tarballed, then do a database load from the full dump. The only way to 
avoid having the other databases on the server offline is to restore to 
a second postgresql instance. Not complaining, just saying :-)



> Regards,
> Qingqing 

Zach.


Re: Incremental Backup Script

From
Rick Gigger
Date:
I would certainly like some instructions on this as well.

On Jan 3, 2006, at 8:41 PM, Zach Bagnall wrote:

> On 12/26/05 11:04, Qingqing Zhou wrote:
>> ""Gregor Zeitlinger"" <gregor.zeitlinger@torexretail.de> wrote
>>> Also, I was wondering whether it is always safe to copy the  
>>> current WAL file, i.e. may the current WAL file be invalid in any  
>>> circumstance?
>>>
>> If you mean "current WAL file" is the xlog segment in use, then it  
>> is dangerous. We only backup the xlog segments that have been  
>> fully used up.
>
> As per docs, if the databases are rarely updated it could take a  
> long time for the WAL segment to "roll over". We need to backup the  
> current segment to guarantee we have the latest trasactions  
> archived at time of failure.
>
> http://www.postgresql.org/docs/8.1/interactive/backup-online.html
> "If you are concerned about being able to recover right up to the  
> current instant, you may want to take additional steps to ensure  
> that the current, partially-filled WAL segment is also copied  
> someplace. This is particularly important if your server generates  
> only little WAL traffic (or has slack periods where it does so),  
> since it could take a long time before a WAL segment file is  
> completely filled and ready to archive. One possible way to handle  
> this is to set up a cron job that periodically (once a minute,  
> perhaps) identifies the current WAL segment file and saves it  
> someplace safe."
>
> Gregor: can you explain how to identify the current file? I had  
> implemented a backup and restore script for PITR but stumbled at  
> this point. The page above does not specify how this is to be done.
>
> I appreciate the addition of PITR - it's better than nothing  
> (nothing being full dumps) in some respects. Ideally, we need to be  
> able to dump deltas for a single database. In practice, restoration  
> using the PITR method is awkward. I guess you would tarball the  
> current data files, do a full restore, do a full dump of the  
> database you are interested in, ditch the restored data files and  
> replace them with the ones you tarballed, then do a database load  
> from the full dump. The only way to avoid having the other  
> databases on the server offline is to restore to a second  
> postgresql instance. Not complaining, just saying :-)
>
>
>
>> Regards,
>> Qingqing
>
> Zach.
>
> ---------------------------(end of  
> broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>



Re: Incremental Backup Script

From
"Gregor Zeitlinger"
Date:
-----Original Message-----
From: Zach Bagnall [mailto:zach.bagnall@bulletinwireless.com]
Sent: Wednesday, January 04, 2006 4:42 AM
To: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Incremental Backup Script


>Gregor: can you explain how to identify the current file? I had
>implemented a backup and restore script for PITR but stumbled at this
>point. The page above does not specify how this is to be done.
I have found a way from http://archives.postgresql.org/pgsql-admin/2005-10/msg00059.php. I have not tried it, but it
seemsstraightforward. 

>I appreciate the addition of PITR - it's better than nothing (nothing
>being full dumps) in some respects. Ideally, we need to be able to dump
>deltas for a single database.
Yes, it is not a replacement for an incremental backup, especially due to the fact that WALs may be orders of magnitude
largerthan a delta. 

>In practice, restoration using the PITR
>method is awkward.
Yes, what I am planning to do:
1) drop the corrupted database
2) restore the base backup
3) replay all incremental backups (in the sense of my original mail)

Gregor Zeitlinger
LUCAS Product Development

Torex Retail Solutions GmbH

Schwedenstr. 9, D-13359 Berlin
Tel. +49 (0) 30 49901-243
Fax +49 (0) 30 49901-139
Mailto:gregor.zeitlinger@torexretail.de
http://www.torexretail.de


Re: Incremental Backup Script

From
"Gregor Zeitlinger"
Date:
> As per docs, if the databases are rarely updated it could take a long
> time for the WAL segment to "roll over". 
Yes, therefore I want to copy the current WAL (as I said earlier).
When restoring, I also want to make sure that I restore exactely to the point when I copied the current WA segment.

Hence I consider to do it as follows:
1) take the t = current time
2) copy the current WAL
3) when restoring, set  recovery_target_time  = t
Maybe there is even a way to ask Postgres of its last commited x = xid.
In that case, we could set recovery_target_xid = x
Is that possible?
Regards,
Gregor