Thread: Backing up large databases
I am struggling to find an acceptable way of backing up a PostgreSQL 7.4 database.

The database is quite large: it currently occupies about 180GB, divided into two parts, a set of active tables and a set of archive tables which are only used for insertions.
I ran pg_dump -Fc recently; it took 23.5 hours to run and produced a single file of 126GB. Obviously, as the database continues to grow, it will soon be too large to pg_dump within a day. Running rsync to make a complete fresh copy of the pgsql file structure took 4 hours, but a second rsync pass later that day (which should only have copied changed files) still took 3 hours, and I cannot afford to have the db down that long.
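For reference, the commands were along these lines (the database name and paths below are just placeholders, not the real ones):

# custom-format compressed dump -- this is the run that took 23.5 hours
pg_dump -Fc maildb > /backup/maildb.dump

# file-level copy of the data directory, taken with the database shut down;
# the first full pass took 4 hours, a later "changed files only" pass still took 3
rsync -a /var/lib/pgsql/data/ /backup/pgsql-data/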
Anybody with any ideas? The database is the backend for a mail server, so it has transactions 24 hours a day but is quieter at night. I want to be able to back it up or replicate it on a daily basis with minimum downtime so that the mail backlog doesn't get too large. Ideally I want the first generation of backup/replica to go onto the same machine as the original, because the volume of data is such that any attempt at network or tape backup of the live files would require too much downtime; once I've got a backup I can copy it out to other NAS or tape at leisure.

If anyone has experience of safeguarding a similarly large PostgreSQL database with minimal downtime, I'd be delighted to hear it. The machine is running 2 Xeons, 4GB RAM and a half-terabyte RAID10 array on a Dell PERC SCSI subsystem, with a load average of around 0.5 - 0.6, so it's not exactly overstretched.
Thanks,
Steve
Hi Steve,
Very interested to hear about your setup, as I have a similar setup (backend to a mail server/SPAM scanner) although on a much lighter load at the moment.
My database is only just touching a GB so nothing near the scale of yours! I use a file-system level backup, and am currently testing a PITR continuous recovery onto a hot-standby server.
Tar-ing the database directory currently takes about a minute (at 1GB), so scaled linearly to 180GB you'd be looking at roughly 3 hours for yours.
My future plan for when my database grows larger, is with the use of WAL logging – have a base backup taken on a Sunday morning (our quietest time), ship this to the hot-standby once a week, and start it off in a recovery mode (using my rolling-WAL script I’m testing now.) Then throughout the week, send the WAL logs from the live box as they become available down to the standby, which then get processed on arrival – these files are 16MB in size (I believe this can be changed).
The beauty of all this is it doesn’t require the database to be taken off-line, or slowed down.
This is coming from an 8.1 server, I believe it’d be okay for 7.4 but don’t quote me on it.
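As a rough sketch of the moving parts (hostnames, paths and the spool directory are made up for the example; the waiting-for-the-next-file logic is what my rolling-WAL script handles):

# postgresql.conf on the live server: copy each completed 16MB WAL segment
# into a spool directory (%p is the segment's path, %f its file name)
archive_command = 'cp %p /var/lib/pgsql/wal_archive/%f'

# on the live server, push archived segments down to the standby as they appear
rsync -a /var/lib/pgsql/wal_archive/ standby:/var/lib/pgsql/wal_archive/

# recovery.conf on the standby: feed each segment into the recovering server
# (a real setup wraps this in a script that waits for the next file to arrive)
restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'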
Regards
Andy
Hi Steve,

If you can afford to move forwards to 8.x then you can benefit from PITR (Point In Time Recovery). See http://www.postgresql.org/docs/8.0/interactive/backup-online.html

You can do this without making a PITR live replica (as I understand it, you're happy to trust your hardware, so you don't need a hot-standby machine ready to take over when you have a hardware failure). This mechanism allows the base snapshot(s) to be taken while the database is online. In fact, apart from recovery itself, there is never a need to take the database offline.

So you would then need to (a) copy to tape/remote disk/... your periodic base backup and (b) copy to tape/remote disk/... each archived WAL file. Overall you will be writing much more data this way than running pg_dump now and then, but provided you can afford the storage, you can recover to any point in time covered by your archived WAL files.

Sooner or later you are going to hit the magic 250GB point, at which your 0.5TB array isn't big enough to store the live data and a base backup. At that point you will need to look at making the base backup happen across the network (or add more disks). I don't think it matters how long it takes to make the base backup, provided you are always copying the WAL files too.

Hope this helps,

Robin
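P.S. The base backup itself is just a file-level copy taken between pg_start_backup() and pg_stop_backup(), so something like this (label, database name and paths are only examples):

# online base backup on 8.x -- no need to stop the server
psql -d maildb -c "SELECT pg_start_backup('weekly_base');"
tar -czf /backup/base-$(date +%Y%m%d).tar.gz /var/lib/pgsql/data
psql -d maildb -c "SELECT pg_stop_backup();"

WAL archiving (archive_command) must already be switched on for this to be usable, as described in the docs above.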
On Fri, 2006-04-28 at 15:57 +0000, Steve Burrows wrote:
> Anybody with any ideas? [...] I want to be able to back it up or
> replicate it on a daily basis with minimum downtime [...]

What about LVM2 snapshot functionality? We use snapshots to get a replica of the data partition for PITR purposes (I have not tested this outside the PITR system).

To be 100% sure that the snapshot of the database is consistent you can:

1) stop the database
2) take the snapshot
3) start the database
4) mount the snapshot block device created
5) move the data in the snapshot block device (a consistent copy of your database) to another location outside the server, without any downtime for your database
6) umount the snapshot block device
7) remove the snapshot block device (so you can recycle the space used)

A snapshot of a 25GB logical volume (with a 15GB database) takes 0.4 sec on our system (RHEL AS4/amd64linux); I don't think it will take much longer with a 180GB LV, so we are talking about a few seconds of downtime.

Just an idea :)

More information about LVM:
http://www.tldp.org/HOWTO/LVM-HOWTO/
http://www.tldp.org/HOWTO/LVM-HOWTO/snapshotintro.html
http://www.tldp.org/HOWTO/LVM-HOWTO/snapshots_backup.html

with regards
--
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
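P.S. In shell terms the cycle is roughly this (volume group, LV names and sizes are just examples from our setup):

# stop the database, snapshot the data LV, start the database again
pg_ctl -D /var/lib/pgsql/data stop -m fast
lvcreate --snapshot --size 10G --name pgsnap /dev/vg00/pgdata
pg_ctl -D /var/lib/pgsql/data start

# copy the consistent snapshot away while the live database keeps running
mount /dev/vg00/pgsnap /mnt/pgsnap
rsync -a /mnt/pgsnap/ backuphost:/backup/pgdata/
umount /mnt/pgsnap
lvremove -f /dev/vg00/pgsnap

The --size only has to cover the blocks that change on the origin LV while the snapshot exists, not the whole 180GB.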
You could compress the data as you perform the dump:

pg_dump <your options> | bzip2 > filename.bz2

bzip2 offers excellent compression, and you could end up with an archive of less than 10GB.

Later you can reload that into another db on a different server.
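For example, to reload a plain-format dump compressed this way (the database name is just an example; with a -Fc dump you would pipe into pg_restore instead):

# recreate the database on the other server and feed the dump back in
createdb maildb_copy
bunzip2 -c filename.bz2 | psql maildb_copy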
Why don't you set up a second server with an identical database and use Slony to replicate the primary one there? You can then do whatever you want on the mirror: dump it to tape, etc.

--
UC

Open Source Solutions 4U, LLC
1618 Kelly St, Santa Rosa, CA 95401, United States
Phone: +1 707 568 3056   Cell: +1 650 302 2405   Fax: +1 707 568 6416
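P.S. Once the subscriber is in sync, the nightly dump can run against it instead of the primary, e.g. (hostname and database name are just examples):

# dump from the Slony replica so the live server never sees the load
pg_dump -h standby -Fc maildb > /backup/maildb-$(date +%Y%m%d).dump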
On Apr 28, 2006, at 10:34 AM, Andy Shellam wrote:

> My future plan for when my database grows larger, is with the use of
> WAL logging [...] these files are 16MB in size (I believe this can be
> changed).

Theoretically the size of WAL files can be changed, but in practice no one ever does this. In other words, it's probably not a great idea.

> This is coming from an 8.1 server, I believe it’d be okay for 7.4
> but don’t quote me on it.

PITR was introduced in 8.0. There are a number of other reasons to upgrade as well. In the meantime, Slony might be a viable alternative.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
On Fri, 2006-04-28 at 15:57 +0000, Steve Burrows wrote:

> Anybody with any ideas?

You need not back up the whole database in one go.

You can copy only changed data out of the tables, using your knowledge of the mail store. That way you won't need such a long-running backup, and it won't be so large either. You can then reassemble the pieces into a new table later in case of recovery.

--
Simon Riggs
EnterpriseDB    http://www.enterprisedb.com
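P.S. As a sketch, assuming the archive tables are created per month (the table and column names here are invented):

# dump only the newest archive table -- the older ones never change
pg_dump -Fc -t archive_2006_04 maildb > /backup/archive_2006_04.dump

# and pull just the last day's rows out of an active table
psql -A -t -c "SELECT * FROM messages WHERE received > now() - interval '1 day'" maildb > /backup/messages-delta.txt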
Simon Riggs wrote:
> On Fri, 2006-04-28 at 15:57 +0000, Steve Burrows wrote:
>> I am struggling to find an acceptable way of backing up a PostgreSQL
>> 7.4 database.

Depending on your OS and hardware setup, you may have other options, such as Veritas snapshotting. It is expensive, but saves time.

--
Naomi Walker
Chief Information Officer
Eldorado Computing, Inc.
nwalker@eldocomp.com
602-604-3100

What lies behind us and what lies before us are tiny matters compared to what lies within us.
- William Morrow