Thread: incremental backup of postgres database?

incremental backup of postgres database?

From: Palle Girgensohn
Hi!

What would be the best suggestion for incremental backup of a rather large
database, where the bulk of the data volume consists of large objects? Since
the backup will be transmitted over a 2 Mbit/s internet line, we need to
minimize the data flow for each nightly backup. The compressed database
dump file, when dumped with pg_dump -F c -b, is roughly 2.5 GB, whereas a
dump without large objects is only roughly 2% of that size. I can live with
having to transfer the BLOB-less dump every night, but not several
gigabytes of data...

So, I will either need to find a way to get the latest data (I have
timestamps for all LOBs) and somehow get it to a file in a restorable
format... One simple way would be to select all new blobs into a temp table
and copy that table to a backup file (a rough sketch of this follows below)

or

replicate the database in real time to the backup site, using one of the
replication projects? How robust are the replication systems today? What
will happen if the 2 Mbit/s line fails temporarily?
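
For the first option, something along these lines might do (completely
untested; the table "files" with an OID column and a timestamp is just a
made-up name for whatever table holds the timestamps of the large objects):

#!/bin/sh
# Untested sketch: export only the large objects changed since the last run.
DB=mydb
OUTDIR=/var/backups/lobs
SINCE=`cat /var/backups/last_run 2>/dev/null || echo '1970-01-01'`

mkdir -p $OUTDIR
# OIDs of large objects modified since the last backup
OIDS=`psql -t -A -d $DB -c "SELECT lo_oid FROM files WHERE updated_at > '$SINCE'"`
for oid in $OIDS
do
    # \lo_export writes one large object to a local file, named after its OID
    psql -d $DB -c "\lo_export $oid '$OUTDIR/$oid'"
done
date '+%Y-%m-%d %H:%M:%S' > /var/backups/last_run

The exported files could then be gzipped and shipped over the line, and
re-imported on the backup side with \lo_import (mapping the new OIDs back to
the referencing tables would need some extra bookkeeping).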

Perhaps there are other ideas for incremental backup of postgres databases?
Your input would be much appreciated.

Thanks
Palle


Re: incremental backup of postgres database?

From: Gary Stainburn
On Friday 07 Feb 2003 4:05 pm, Palle Girgensohn wrote:
> Hi!
>
> What would be the best suggestion for incremental backup of a rather large
> database, where the bulk data volume consists of large objects. Since
> backup will be transmitted over a 2 Mbit/s internet line, we need to
> minimize the data flow for each nightly backup. The compressed database
> dump file, when dumped with pg_dump -F c -b, is roughly 2.5 GB, whereas a
> dump without large objects is only roughly 2% of that size. I can live with
> having to transfer the BLOB-less dump every night, but not several
> gigabytes of data...
>
> So, I will either need to find a way to get the latest data (I have
> timestamps for all LOBS) and somehow get it to a file in a restorable
> format... One simple way would be to select all new blobs into a temp file
> and copy that table to a backup file
>
> or
>
> replicate the database in real time to the backup site, using one of the
> replication projects? How robust are the replication systems today? What
> will happen if the 2Mb/s line fails temporarily?
>
> Perhaps there are other ideas for incremental backup of postgres databases?
> Your input would be much appreciated.
>
> Thanks
> Palle

Just off the top of my head, have you looked at 'diff'ing the archive, e.g.

diff yesterday.sql today.sql |gzip -c >diff.sql.gz

to see what sort of file you'll be looking at. I'm looking at using a similar
system here, although I don't have LOBs.
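
For what it's worth, the other end could then rebuild today's dump from
yesterday's copy plus the shipped diff, something like this (untested):

gunzip -c diff.sql.gz | patch -o today.sql yesterday.sql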

Gary


--
Gary Stainburn

This email does not contain private or confidential material as it
may be snooped on by interested government parties for unknown
and undisclosed purposes - Regulation of Investigatory Powers Act, 2000


Re: incremental backup of postgres database?

From
Palle Girgensohn
Date:
--On Friday, February 07, 2003 16.49.38 +0000 Gary Stainburn
<gary.stainburn@ringways.co.uk> wrote:

> On Friday 07 Feb 2003 4:05 pm, Palle Girgensohn wrote:
>> Hi!
>>
>> What would be the best suggestion for incremental backup of a rather
>> large database, where the bulk data volume consists of large objects.
>> Since backup will be transmitted over a 2 Mbit/s internet line, we need
>> to minimize the data flow for each nightly backup. The compressed
>> database dump file, when dumped with pg_dump -F c -b, is roughly 2.5 GB,
>> whereas a dump without large objects is only roughly 2% of that size. I
>> can live with having to transfer the BLOB-less dump every night, but not
>> several gigabytes of data...
>>
>> So, I will either need to find a way to get the latest data (I have
>> timestamps for all LOBS) and somehow get it to a file in a restorable
>> format... One simple way would be to select all new blobs into a temp
>> file and copy that table to a backup file
>>
>> or
>>
>> replicate the database in real time to the backup site, using one of the
>> replication projects? How robust are the replication systems today? What
>> will happen if the 2Mb/s line fails temporarily?
>>
>> Perhaps there are other ideas for incremental backup of postgres
>> databases? Your input would be much appreciated.
>>
>> Thanks
>> Palle
>
> Just off the top of my head, have you looked at 'diff'ing the archive, e.g.
>
> diff yesterday.sql today.sql |gzip -c >diff.sql.gz
>
> to see what sort of file you'll be looking at. I'm looking at using a
> similar system here, although I don't have LOBs.

The thought has occurred to me, but the files are in binary format, so some
sort of binary diff program, like xdiff, would be needed. Also, I think the
tuples in the dumps are in a nondeterministic order, and that order might
differ after each vacuum. This will make the diffs rather large as well.

But hey, I'll check it, it might be worthwhile.
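
Another thought: instead of shipping explicit diffs, it might be simpler to
let rsync work out the binary delta of the dump itself. Roughly something
like this (untested, and the host name is just a placeholder):

# Let rsync transfer only the blocks of the dump that have changed.
# Note: the custom-format dump is compressed by default, which tends to
# defeat rsync's block matching; dumping with -Z 0 and compressing on the
# wire instead (-z) may work better.
pg_dump -F c -Z 0 -b mydb > /var/backups/dump.custom
rsync -z --partial /var/backups/dump.custom backup.example.org:/var/backups/

--partial would at least keep what has already been transferred if the line
drops mid-way.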

/Palle