Thread: Incremental backup with RSYNC or something?
Hi.

I currently have a cronjob to do a full pg_dump of the database every day, and then gzip it for saving to my backup drive.

However, my DB is now 60GB in size, so this daily operation is making less and less sense. (Some of you may think this is foolish to begin with.)

Question: what can I do to rsync only the new additions in every table starting 00:00:01 until 23:59:59 for each day?

Searching Google leads to complex things like "incremental WAL" and whatnot, or talk of stuff like pgcluster. I'm hoping there's a more straightforward core solution without additional software or PhD degrees.

Many thanks for any ideas!
PK
Hi,

Well, the 'complex' stuff is only there for larger or higher-traffic DBs. That said, at 60GB yours is a largish DB in itself, and you should start trying out a few other backup methods anyway. All the more so because, if you are taking full DB backups every day, you would save considerably on (backup) storage.

Anyway, as for pg_dump: we have a DB about 20x bigger than yours (1.3TB) and it takes only half a day to do a pg_dump + gzip (both). One thing that comes to mind: how are you compressing? I hope you are doing this in one operation (or at least piping pg_dump to gzip before writing to disk)?

--
Robins Tharakan

On 11/13/2011 05:08 PM, Phoenix Kiula wrote:
> Question: what can I do to rsync only the new additions in every table
> starting 00:00:01 until 23:59:59 for each day?
On Sun, Nov 13, 2011 at 8:42 PM, Robins Tharakan <robins.tharakan@comodo.com> wrote:
> Well, the 'complex' stuff is only there for larger or higher-traffic DBs.
> That said, at 60GB yours is a largish DB in itself, and you should start
> trying out a few other backup methods anyway.

Thanks. I usually keep only the last 6 days of backups, plus monthly backups as of day 1, so it's not piling up or anything.

What "other methods" do you recommend? That was in fact my question. Do I need to install some modules?

> Anyway, as for pg_dump: we have a DB about 20x bigger than yours (1.3TB)
> and it takes only half a day to do a pg_dump + gzip (both). One thing that
> comes to mind: how are you compressing? I hope you are doing this in one
> operation (or at least piping pg_dump to gzip before writing to disk)?

I'm gzipping with this command (this is my backup.sh):

    BKPFILE=/backup/pg/dbback-${DATA}.sql
    pg_dump MYDB -U MYDB_MYDB -f ${BKPFILE}
    gzip --fast ${BKPFILE}

Is this good enough? Sadly, this takes up over 97% of the CPU when it's running!
> What "other methods" do you recommend? That was in fact my question.
> Do I need to install some modules?

Well, depending on your PG version, you could read up on the various backup methods in the docs. I believe you'll be interested in section 24.3 there, since you ask about WAL archiving. The good thing is, it's useful for DBs much bigger than yours, and especially for those that can't go down for even a minute, but yes, it has its trade-offs. (It's not that bad actually, but it's a call you need to take.)

http://www.postgresql.org/docs/8.4/static/backup.html

> I'm gzipping with this command (this is my backup.sh):
>
>     BKPFILE=/backup/pg/dbback-${DATA}.sql
>     pg_dump MYDB -U MYDB_MYDB -f ${BKPFILE}
>     gzip --fast ${BKPFILE}

You could combine the pg_dump / gzip into one step, thereby avoiding extra writes to disk. The URL below should help you with that (pg_dump dbname | gzip > file.gz):

http://www.postgresql.org/docs/8.4/static/backup-dump.html#BACKUP-DUMP-LARGE

You could also do a

    pg_dump -Fc | gzip -1 -c > dumpfile.gz

at the cost of a slightly larger (but faster) backup.

--
Robins Tharakan
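A minimal sketch of that one-step approach, reusing the database and user names from the earlier backup.sh (the date-stamped filename is just an example, not the original ${DATA} variable):

    #!/bin/sh
    # One-step dump + compress: nothing is written to disk uncompressed.
    BKPFILE=/backup/pg/dbback-$(date +%Y%m%d).sql.gz
    pg_dump -U MYDB_MYDB MYDB | gzip --fast > ${BKPFILE}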
> You could also do a
>
>     pg_dump -Fc | gzip -1 -c > dumpfile.gz
>
> at the cost of a slightly larger (but faster) backup.

Actually, if you're going this route, you could skip the pg_dump compression as well:

    pg_dump db | gzip -1 -c > dumpfile.gz

--
Robins Tharakan
pg_dump -Fc already compresses, no need to pipe through gzip
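For reference, the custom format's built-in compression level can also simply be turned down rather than piped through gzip; -Z/--compress is a standard pg_dump option, and the output path here is just an example:

    # Custom-format dump with light built-in compression (no external gzip needed).
    pg_dump -Fc -Z 1 -U MYDB_MYDB MYDB > /backup/pg/dbback.dump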
On Nov 13, 2011 7:39 PM, "Phoenix Kiula" <phoenix.kiula@gmail.com> wrote:
>
> Question: what can I do to rsync only the new additions in every table
> starting 00:00:01 until 23:59:59 for each day?
You can't, really. You can rsync the whole data directory, and that can be faster than copying it outright, but you can't just copy the last day's changes as a diff.

That's because Pg writes all over its data files; it doesn't just append. There isn't any 'last changed' timestamp on records, and even if there were, Pg would have no way to know which records to delete in the replication target.

If you want differential backups you'll need to use a row-based replication system like Slony or Bucardo.

It'd be nice if Pg offered easier differential backups, but at this point there isn't really anything.
> Searching google leads to complex things like "incremental WAL" and
> whatnot, or talks of stuff like pgcluster. I'm hoping there's a more
> straightforward core solution without additional software or PHD
> degrees.
Nothing really basic. You'll need to use PITR (WAL shipping), streaming replication or a row level replication solution.
>
> Many thanks for any ideas!
> PK
>
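To give an idea of what the PITR/WAL-shipping route Craig mentions involves, a minimal continuous-archiving sketch for postgresql.conf might look like this (the archive directory is a placeholder; wal_level only exists on 9.0+, on 8.4 archive_mode/archive_command alone are enough):

    # postgresql.conf -- minimal continuous archiving (PITR) setup
    wal_level = archive                                       # 9.0+ only
    archive_mode = on
    archive_command = 'rsync -a %p /backup/wal_archive/%f'    # %p = WAL file path, %f = file name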
On 11/13/2011 07:51 AM, Gregg Jaskiewicz wrote:
> pg_dump -Fc already compresses, no need to pipe through gzip

I don't think that'll use two cores if you have 'em. The pipe method will use two cores, so it should be faster (assuming you are not IO bound).

-Andy
On Sun, Nov 13, 2011 at 10:45 PM, Andy Colson <andy@squeakycode.net> wrote:
> On 11/13/2011 07:51 AM, Gregg Jaskiewicz wrote:
>> pg_dump -Fc already compresses, no need to pipe through gzip
>
> I don't think that'll use two cores if you have 'em. The pipe method will
> use two cores, so it should be faster (assuming you are not IO bound).

I am likely IO bound. Anyway, what's the right code for the pipe method? I think the earlier recommendation had a problem, as "-Fc" already does compression.

Is this the right code for the FASTEST possible backup if I don't care about the size of the dump, and all I want is that it's not CPU-intensive (with the tables I wish excluded)?

    BKPFILE=/backup/pg/dbback-${DATA}.sql
    pg_dump MYDB -T excludetable1 -T excludetable2 -U MYDB_MYDB | gzip --fast > ${BKPFILE}

Thanks!
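Not from the thread, but one hedged variant of that pipe on Linux is to run the dump and the compression at low CPU/IO priority so they compete less with the database; nice and ionice are standard Linux utilities, and the names and paths are carried over from the script above:

    # Same pipe, but at low priority (ionice -c3 = idle IO class).
    BKPFILE=/backup/pg/dbback-${DATA}.sql.gz
    nice -n 19 ionice -c3 pg_dump -T excludetable1 -T excludetable2 -U MYDB_MYDB MYDB \
        | nice -n 19 ionice -c3 gzip --fast > ${BKPFILE}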
> Question: what can I do to rsync only the new additions in every table
> starting 00:00:01 until 23:59:59 for each day?

A table level replication (like Slony) should help here.

Or

A trigger based approach with dblink would be another (but a bit complex) option.

Venkat Balaji
On Mon, Nov 14, 2011 at 1:45 PM, Venkat Balaji <venkat.balaji@verse.in> wrote:
>> Question: what can I do to rsync only the new additions in every table
>> starting 00:00:01 until 23:59:59 for each day?
>
> A table level replication (like Slony) should help here.

Slony needs more than one physical server, right?
On Sun, Nov 13, 2011 at 7:01 AM, Craig Ringer <ringerc@ringerc.id.au> wrote:
>> Searching Google leads to complex things like "incremental WAL" and
>> whatnot, or talk of stuff like pgcluster. I'm hoping there's a more
>> straightforward core solution without additional software or PhD degrees.
>
> Nothing really basic. You'll need to use PITR (WAL shipping), streaming
> replication or a row level replication solution.

One method they could use would be to partition the data into old and new, back up only the new partition, then merge the partitions and start a new one, or something like that. But that's a huge pain. I would suggest PITR backups here as the likely best match.
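Not from the thread, but as a rough illustration of the partition-the-new-data idea, assuming a hypothetical current partition named newdata_2011_11:

    # Dump only the current (new-data) partition; the table name is hypothetical.
    pg_dump -t newdata_2011_11 -U MYDB_MYDB MYDB | gzip --fast > /backup/pg/newdata_2011_11.sql.gz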
On Mon, Nov 14, 2011 at 12:45 AM, Venkat Balaji <venkat.balaji@verse.in> wrote:
>> Question: what can I do to rsync only the new additions in every table
>> starting 00:00:01 until 23:59:59 for each day?
>
> A table level replication (like Slony) should help here.
>
> Or
>
> A trigger based approach with dblink would be another (but a bit complex)
> option.

If you don't actually care about the rows of data specifically, and just want an incremental data diff, you might look at what options your filesystem gives you. We often use incremental snapshots on ZFS to get smaller copies that can be shipped off to the backup server and used to reconstruct the server if needed.

Robert Treat
conjecture: xzilla.net
consulting: omniti.com
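A rough sketch of that ZFS approach, not from the thread; the pool/dataset names and backup host are placeholders, and the snapshot should be taken in a way PostgreSQL can recover from (e.g. with WAL archiving enabled, or between pg_start_backup() and pg_stop_backup()):

    # Snapshot the dataset holding the Postgres data directory.
    zfs snapshot tank/pgdata@2011-11-14
    # Ship only the delta since yesterday's snapshot to the backup host.
    zfs send -i tank/pgdata@2011-11-13 tank/pgdata@2011-11-14 \
        | ssh backuphost zfs receive -F backup/pgdata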
On Sun, Nov 13, 2011 at 5:38 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> However, my DB is now 60GB in size, so this daily operation is making
> less and less sense. (Some of you may think this is foolish to begin
> with.)
>
> Question: what can I do to rsync only the new additions in every table
> starting 00:00:01 until 23:59:59 for each day?

My standard advice for this problem is to do a HS/SR (hot standby / streaming replication) setup, which solves a number of problems simultaneously. It still makes sense to take a full snapshot once in a while, though -- but you can take it from the standby.

merlin
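For reference, a minimal sketch of the HS/SR setup Merlin describes, for 9.0/9.1; the host names, replication user, and retention settings are placeholders:

    # Primary, postgresql.conf:
    wal_level = hot_standby
    max_wal_senders = 3
    wal_keep_segments = 128        # WAL retained so the standby can catch up

    # Primary, pg_hba.conf -- allow the standby to connect for replication:
    host  replication  repuser  192.168.1.2/32  md5

    # Standby, recovery.conf:
    standby_mode = 'on'
    primary_conninfo = 'host=primary.example.com user=repuser password=secret'

    # Standby, postgresql.conf -- allow read-only queries (hot standby):
    hot_standby = on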
On 11/22/2011 3:28 PM, Merlin Moncure wrote:
> My standard advice for this problem is to do a HS/SR (hot standby /
> streaming replication) setup, which solves a number of problems
> simultaneously. It still makes sense to take a full snapshot once in a
> while, though -- but you can take it from the standby.

How long is this backup taking? I have a ~100GB database that I back up with pg_dump (which compresses as it dumps if you want it to), and that only takes 35 minutes. Granted, I have it on some fast SCSI drives in RAID 1, but even a single SATA drive should still finish in a decent amount of time.
On Wed, Nov 23, 2011 at 6:13 AM, Alex Thurlow <alex-reports@blastro.com> wrote:
> How long is this backup taking? I have a ~100GB database that I back up
> with pg_dump (which compresses as it dumps if you want it to), and that
> only takes 35 minutes. Granted, I have it on some fast SCSI drives in
> RAID 1, but even a single SATA drive should still finish in a decent
> amount of time.

Hi Alex, could you share what exact command you use? My drives are SCSI too, in RAID 10, but the dump takes over 2-3 hours (60GB database) and the CPU consumption during this time is huge.

Thanks!
On Thu, Nov 24, 2011 at 4:49 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> Hi Alex, could you share what exact command you use? My drives are SCSI
> too, in RAID 10, but the dump takes over 2-3 hours (60GB database) and
> the CPU consumption during this time is huge.

I wrote a bunch of shell-script tools to back up postgres 9.1 with rsync/ccollect (another hardlink tool); I might publish them on github once I find the time.

--
Benjamin Henrion <bhenrion at ffii.org>
FFII Brussels - +32-484-566109 - +32-2-4148403
On Thu, Nov 24, 2011 at 11:53 PM, Benjamin Henrion <bh@udev.org> wrote:
> I wrote a bunch of shell-script tools to back up postgres 9.1 with
> rsync/ccollect (another hardlink tool); I might publish them on github
> once I find the time.

Thanks Ben. Look forward to it. Will the script be different for version 9.0.5? Would love to have rsync working. Even without a script, just the commands will help.
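Not Benjamin's scripts, but the general shape of an rsync-based base backup on 9.0/9.1 looks roughly like this; the user, paths, and backup host are placeholders, and continuous WAL archiving must already be configured for the copy to be recoverable:

    # Tell Postgres a base backup is starting (second arg = take an immediate checkpoint).
    psql -U postgres -c "SELECT pg_start_backup('nightly', true);"
    # Copy the data directory; hardlink tools like ccollect can make the copies incremental.
    rsync -a --delete --exclude=pg_xlog /var/lib/pgsql/9.1/data/ backuphost:/backups/pgdata/
    # Tell Postgres the backup is done so it can archive the final WAL segment.
    psql -U postgres -c "SELECT pg_stop_backup();"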