Thread: Incremental backup with RSYNC or something?
Hi.

I currently have a cronjob to do a full pg_dump of the database every day, and then gzip it for saving to my backup drive.

However, my DB is now 60GB in size, so this daily operation is making less and less sense. (Some of you may think this is foolish to begin with.)

Question: what can I do to rsync only the new additions in every table starting 00:00:01 until 23:59:59 for each day?

Searching Google leads to complex things like "incremental WAL" and whatnot, or talk of stuff like pgcluster. I'm hoping there's a more straightforward core solution without additional software or PhD degrees.

Many thanks for any ideas!
PK
Hi,

Well, the 'complex' stuff is only there for larger or higher-traffic DBs. That said, at 60GB yours is a largish DB in itself, and you should start trying out a few other backup methods anyway. All the more so because, if you are taking full DB backups every day, you would save considerably on (backup) storage.

Anyway, as for pg_dump: we have a DB about 20x bigger than yours (1.3TB) and it takes only half a day to do a pg_dump + gzip (both). One thing that comes to mind: how are you compressing? I hope you are doing this in one operation (or at least piping pg_dump to gzip before writing to disk)?

--
Robins Tharakan

On 11/13/2011 05:08 PM, Phoenix Kiula wrote:
> Question: what can I do to rsync only the new additions in every table
> starting 00:00:01 until 23:59:59 for each day?
On Sun, Nov 13, 2011 at 8:42 PM, Robins Tharakan <robins.tharakan@comodo.com> wrote:
> Well, the 'complex' stuff is only there for larger or higher-traffic DBs.
> That said, at 60GB yours is a largish DB in itself, and you should start
> trying out a few other backup methods anyway.

Thanks. I usually keep only the last 6 days of backups, plus monthly backups as of day 1, so it's not piling up or anything.

What "other methods" do you recommend? That was in fact my question. Do I need to install some modules?

> Anyway, as for pg_dump: we have a DB about 20x bigger than yours (1.3TB)
> and it takes only half a day to do a pg_dump + gzip (both). One thing that
> comes to mind: how are you compressing? I hope you are doing this in one
> operation (or at least piping pg_dump to gzip before writing to disk)?

I'm gzipping with this command (this is my backup.sh):

    BKPFILE=/backup/pg/dbback-${DATA}.sql
    pg_dump MYDB -U MYDB_MYDB -f ${BKPFILE}
    gzip --fast ${BKPFILE}

Is this good enough? Sadly, this takes up over 97% of the CPU when it's running!
> What "other methods" do you recommend? That was in fact my question.
> Do I need to install some modules?

Well, depending on your PG version, you could read up on the various backup methods in the docs. I believe you'll be interested in section 24.3 there, since you ask about WAL archiving. The good thing is, it's useful for DBs much bigger than yours, and especially for those that can't go down for even a minute, but yes, it has its trade-offs. (It's not that bad actually, but it's a call you need to take.)

http://www.postgresql.org/docs/8.4/static/backup.html

> I'm gzipping with this command (this is my backup.sh):
>
>     BKPFILE=/backup/pg/dbback-${DATA}.sql
>     pg_dump MYDB -U MYDB_MYDB -f ${BKPFILE}
>     gzip --fast ${BKPFILE}

You could combine the pg_dump / gzip into one step, thereby avoiding extra writes to disk. The URL below should help you with that (pg_dump dbname | gzip > file.gz):

http://www.postgresql.org/docs/8.4/static/backup-dump.html#BACKUP-DUMP-LARGE

You could also do a

    pg_dump -Fc | gzip -1 -c > dumpfile.gz

at the cost of a slightly larger (but faster) backup.

--
Robins Tharakan
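A minimal sketch of that one-step approach, reusing the database and user names from the earlier backup.sh (the date-stamped filename is just an example, not the original ${DATA} variable):

    #!/bin/sh
    # One-step dump + compress: nothing is written to disk uncompressed.
    BKPFILE=/backup/pg/dbback-$(date +%Y%m%d).sql.gz
    pg_dump -U MYDB_MYDB MYDB | gzip --fast > ${BKPFILE}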
> You could also do a
>
>     pg_dump -Fc | gzip -1 -c > dumpfile.gz
>
> at the cost of a slightly larger (but faster) backup.

Actually, if you're going this route, you could skip the pg_dump compression as well:

    pg_dump db | gzip -1 -c > dumpfile.gz

--
Robins Tharakan
pg_dump -Fc already compresses, no need to pipe through gzip
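For reference, the custom format's built-in compression level can also simply be turned down rather than piped through gzip; -Z/--compress is a standard pg_dump option, and the output path here is just an example:

    # Custom-format dump with light built-in compression (no external gzip needed).
    pg_dump -Fc -Z 1 -U MYDB_MYDB MYDB > /backup/pg/dbback.dump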
On Nov 13, 2011 7:39 PM, "Phoenix Kiula" <phoenix.kiula@gmail.com> wrote:
>
> Question: what can I do to rsync only the new additions in every table
> starting 00:00:01 until 23:59:59 for each day?
You can't, really. You can rsync the whole data directory, and that can be faster than copying it outright, but you can't just copy the last day's changes as a diff.

That's because Pg writes all over its data files; it doesn't just append. There isn't any 'last changed' timestamp on records, and even if there were, Pg would have no way to know which records to delete in the replication target.

If you want differential backups you'll need to use a row-based replication system like Slony or Bucardo.

It'd be nice if Pg offered easier differential backups, but at this point there isn't really anything.
> Searching google leads to complex things like "incremental WAL" and
> whatnot, or talks of stuff like pgcluster. I'm hoping there's a more
> straightforward core solution without additional software or PHD
> degrees.
Nothing really basic. You'll need to use PITR (WAL shipping), streaming replication or a row level replication solution.
>
> Many thanks for any ideas!
> PK
>
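To give an idea of what the PITR/WAL-shipping route Craig mentions involves, a minimal continuous-archiving sketch for postgresql.conf might look like this (the archive directory is a placeholder; wal_level only exists on 9.0+, on 8.4 archive_mode/archive_command alone are enough):

    # postgresql.conf -- minimal continuous archiving (PITR) setup
    wal_level = archive                                       # 9.0+ only
    archive_mode = on
    archive_command = 'rsync -a %p /backup/wal_archive/%f'    # %p = WAL file path, %f = file name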
On 11/13/2011 07:51 AM, Gregg Jaskiewicz wrote:
> pg_dump -Fc already compresses, no need to pipe through gzip

I don't think that'll use two cores if you have 'em. The pipe method will use two cores, so it should be faster (assuming you are not IO bound).

-Andy
On Sun, Nov 13, 2011 at 10:45 PM, Andy Colson <andy@squeakycode.net> wrote:
> On 11/13/2011 07:51 AM, Gregg Jaskiewicz wrote:
>> pg_dump -Fc already compresses, no need to pipe through gzip
>
> I don't think that'll use two cores if you have 'em. The pipe method will
> use two cores, so it should be faster (assuming you are not IO bound).

I am likely IO bound. Anyway, what's the right code for the pipe method? I think the earlier recommendation had a problem, as "-Fc" already does compression.

Is this the right code for the FASTEST possible backup if I don't care about the size of the dump, and all I want is that it's not CPU-intensive (with the tables I wish excluded)?

    BKPFILE=/backup/pg/dbback-${DATA}.sql
    pg_dump MYDB -T excludetable1 -T excludetable2 -U MYDB_MYDB | gzip --fast > ${BKPFILE}

Thanks!
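Not from the thread, but one hedged variant of that pipe on Linux is to run the dump and the compression at low CPU/IO priority so they compete less with the database; nice and ionice are standard Linux utilities, and the names and paths are carried over from the script above:

    # Same pipe, but at low priority (ionice -c3 = idle IO class).
    BKPFILE=/backup/pg/dbback-${DATA}.sql.gz
    nice -n 19 ionice -c3 pg_dump -T excludetable1 -T excludetable2 -U MYDB_MYDB MYDB \
        | nice -n 19 ionice -c3 gzip --fast > ${BKPFILE}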
> Question: what can I do to rsync only the new additions in every table
> starting 00:00:01 until 23:59:59 for each day?

A table level replication (like Slony) should help here.

Or

A trigger based approach with dblink would be another (but a bit complex) option.

Venkat Balaji
On Mon, Nov 14, 2011 at 1:45 PM, Venkat Balaji <venkat.balaji@verse.in> wrote:
>> Question: what can I do to rsync only the new additions in every table
>> starting 00:00:01 until 23:59:59 for each day?
>
> A table level replication (like Slony) should help here.

Slony needs more than one physical server, right?
On Sun, Nov 13, 2011 at 7:01 AM, Craig Ringer <ringerc@ringerc.id.au> wrote:
>> Searching Google leads to complex things like "incremental WAL" and
>> whatnot, or talk of stuff like pgcluster. I'm hoping there's a more
>> straightforward core solution without additional software or PhD degrees.
>
> Nothing really basic. You'll need to use PITR (WAL shipping), streaming
> replication or a row level replication solution.

One method they could use would be to partition the data into old and new, back up only the new partition, then merge the partitions and start a new one, or something like that. But that's a huge pain. I would suggest PITR backups here as the likely best match.
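Not from the thread, but as a rough illustration of the partition-the-new-data idea, assuming a hypothetical current partition named newdata_2011_11:

    # Dump only the current (new-data) partition; the table name is hypothetical.
    pg_dump -t newdata_2011_11 -U MYDB_MYDB MYDB | gzip --fast > /backup/pg/newdata_2011_11.sql.gz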
On Mon, Nov 14, 2011 at 12:45 AM, Venkat Balaji <venkat.balaji@verse.in> wrote:
>> Question: what can I do to rsync only the new additions in every table
>> starting 00:00:01 until 23:59:59 for each day?
>
> A table level replication (like Slony) should help here.
>
> Or
>
> A trigger based approach with dblink would be another (but a bit complex)
> option.

If you don't actually care about the rows of data specifically, and just want an incremental data diff, you might look at what options your filesystem gives you. We often use incremental snapshots on ZFS to get smaller copies that can be shipped off to the backup server and used to reconstruct the server if needed.

Robert Treat
conjecture: xzilla.net
consulting: omniti.com
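A rough sketch of that ZFS approach, not from the thread; the pool/dataset names and backup host are placeholders, and the snapshot should be taken in a way PostgreSQL can recover from (e.g. with WAL archiving enabled, or between pg_start_backup() and pg_stop_backup()):

    # Snapshot the dataset holding the Postgres data directory.
    zfs snapshot tank/pgdata@2011-11-14
    # Ship only the delta since yesterday's snapshot to the backup host.
    zfs send -i tank/pgdata@2011-11-13 tank/pgdata@2011-11-14 \
        | ssh backuphost zfs receive -F backup/pgdata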
On Sun, Nov 13, 2011 at 5:38 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> However, my DB is now 60GB in size, so this daily operation is making
> less and less sense. (Some of you may think this is foolish to begin
> with.)
>
> Question: what can I do to rsync only the new additions in every table
> starting 00:00:01 until 23:59:59 for each day?

My standard advice for this problem is to do a HS/SR (hot standby / streaming replication) setup, which solves a number of problems simultaneously. It still makes sense to take a full snapshot once in a while, though -- but you can take it from the standby.

merlin
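For reference, a minimal sketch of the HS/SR setup Merlin describes, for 9.0/9.1; the host names, replication user, and retention settings are placeholders:

    # Primary, postgresql.conf:
    wal_level = hot_standby
    max_wal_senders = 3
    wal_keep_segments = 128        # WAL retained so the standby can catch up

    # Primary, pg_hba.conf -- allow the standby to connect for replication:
    host  replication  repuser  192.168.1.2/32  md5

    # Standby, recovery.conf:
    standby_mode = 'on'
    primary_conninfo = 'host=primary.example.com user=repuser password=secret'

    # Standby, postgresql.conf -- allow read-only queries (hot standby):
    hot_standby = on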
On 11/22/2011 3:28 PM, Merlin Moncure wrote:
> My standard advice for this problem is to do a HS/SR (hot standby /
> streaming replication) setup, which solves a number of problems
> simultaneously. It still makes sense to take a full snapshot once in a
> while, though -- but you can take it from the standby.

How long is this backup taking? I have a ~100GB database that I back up with pg_dump (which compresses as it dumps if you want it to), and that only takes 35 minutes. Granted, I have it on some fast SCSI drives in RAID 1, but even a single SATA drive should still finish in a decent amount of time.
On Wed, Nov 23, 2011 at 6:13 AM, Alex Thurlow <alex-reports@blastro.com> wrote:
> How long is this backup taking? I have a ~100GB database that I back up
> with pg_dump (which compresses as it dumps if you want it to), and that
> only takes 35 minutes. Granted, I have it on some fast SCSI drives in
> RAID 1, but even a single SATA drive should still finish in a decent
> amount of time.

Hi Alex, could you share what exact command you use? My drives are SCSI too, in RAID 10, but the dump takes over 2-3 hours (60GB database) and the CPU consumption during this time is huge.

Thanks!
On Thu, Nov 24, 2011 at 4:49 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> Hi Alex, could you share what exact command you use? My drives are SCSI
> too, in RAID 10, but the dump takes over 2-3 hours (60GB database) and
> the CPU consumption during this time is huge.

I wrote a bunch of shell-script tools to back up postgres 9.1 with rsync/ccollect (another hardlink tool); I might publish them on github once I find the time.

--
Benjamin Henrion <bhenrion at ffii.org>
FFII Brussels - +32-484-566109 - +32-2-4148403
On Thu, Nov 24, 2011 at 11:53 PM, Benjamin Henrion <bh@udev.org> wrote:
> I wrote a bunch of shell-script tools to back up postgres 9.1 with
> rsync/ccollect (another hardlink tool); I might publish them on github
> once I find the time.

Thanks Ben. Look forward to it. Will the script be different for version 9.0.5? Would love to have rsync working. Even without a script, just the commands will help.
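Not Benjamin's scripts, but the general shape of an rsync-based base backup on 9.0/9.1 looks roughly like this; the user, paths, and backup host are placeholders, and continuous WAL archiving must already be configured for the copy to be recoverable:

    # Tell Postgres a base backup is starting (second arg = take an immediate checkpoint).
    psql -U postgres -c "SELECT pg_start_backup('nightly', true);"
    # Copy the data directory; hardlink tools like ccollect can make the copies incremental.
    rsync -a --delete --exclude=pg_xlog /var/lib/pgsql/9.1/data/ backuphost:/backups/pgdata/
    # Tell Postgres the backup is done so it can archive the final WAL segment.
    psql -U postgres -c "SELECT pg_stop_backup();"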