Thread: Backup Policy & Disk Space Issues
Hi,

In the company we're facing serious disk space problems, caused not by
PostgreSQL but by the nature of our data. Database sizes are around
200-300GB, which is relatively not that much, but the databases require
strict backup policies:

- Incremental backup for each day. (250GB)
- Full backup for each week of the last month. (4 x 250GB)
- Full backup for each month of the last year. (12 x 250GB)

As a result, we require a space of size (roughly)

  250 + 4x250 + 12x250 = 17x250 = 4250GB = 4.15TB

for each server per year. Considering we have ~15 servers,

  15x4250 = 63750GB = 62.25TB

As can be seen, the growth of the backed-up data has almost no relation
to the actual data sizes. At the moment we're using tape drive
cartridges for the weekly and monthly backups, but the incremental
backups, plus the database itself, require a constant space of ~500GB.

To summarize: as a DBA, most of my time is wasted validating that the
backup policies were performed correctly, that cartridges are labeled
correctly, etc. What are your experiences with similar sizes of data?
How do you cope with backups? Do you recommend any other
hardware/software solutions?

Regards.
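The storage arithmetic above can be checked with a quick shell sketch (all sizes in GB, taken straight from the policy; the 15-server count is from the post):

```shell
# Recompute the per-server and fleet-wide backup storage estimate.
daily=250                          # one incremental per day (rotating)
weekly=$((4 * 250))                # four weekly fulls of the last month
monthly=$((12 * 250))              # twelve monthly fulls of the last year
per_server=$((daily + weekly + monthly))
total=$((15 * per_server))         # ~15 servers
echo "${per_server} GB per server, ${total} GB total"
# → 4250 GB per server, 63750 GB total
```

(63750GB is about 62.25TB using binary units, matching the figure in the post.)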
On Mon, Dec 22, 2008 at 10:07:21AM +0200, Volkan YAZICI wrote:
> Hi,
>
> In the company, we're facing with serious disk space problems which
> is not caused by PostgreSQL, but the nature of our data. Database
> sizes are around 200-300GB, which is relatively not that much, but
> databases require strict backup policies:
>
> - Incremental backup for each day. (250GB)

What exactly does this mean in the context of PostgreSQL? We don't, as
far as I've been able to determine, support this in either the
community branch or even in any proprietary one.

> - Full backup for each week of the last month. (4 x 250GB)
> - Full backup for each month of the last year. (12 x 250GB)
>
> As a result, we require a space of size (roughly)
>
> 250 + 4x250 + 12x250 = 17x250 = 4250GB = 4.15TB
>
> for each server per year. Considering we have ~15 servers,
>
> 15x4250 = 63750 = 62.25TB

SATA disk space is quite cheap these days, so unless something is very
badly wrong with your funding model, this is not really a problem.

Here's one outfit that will build and configure storage hardware for
you:

http://www.capricorn-tech.com/

Cheers,
David.

> as can be seen, growth of the backed up data sizes have almost no
> relations with the actual data sizes. At the moment, we're using tape
> drive cartridges for weekly and monthly backups. But the incremental
> backups, plus the database itself requires a constant space of size
> ~500GB.
>
> To summarize, as a DBA most of my time is wasting with validating if the
> backup policies performed right, cartridges captioned correctly, etc.
> What are your experiences with similar sizes of data? How do you cope
> with backups? Do you recommend any other hardware/software solutions?
>
> Regards.
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Mon, 22 Dec 2008, David Fetter <david@fetter.org> writes:
> On Mon, Dec 22, 2008 at 10:07:21AM +0200, Volkan YAZICI wrote:
>> Hi,
>>
>> In the company, we're facing with serious disk space problems which
>> is not caused by PostgreSQL, but the nature of our data. Database
>> sizes are around 200-300GB, which is relatively not that much, but
>> databases require strict backup policies:
>>
>> - Incremental backup for each day. (250GB)
>
> What exactly does this mean in the context of PostgreSQL? We don't,
> as far as I've been able to determine, support this in either the
> community branch or even in any proprietary one.

I meant WAL shipping here. (You know, "business terminology" for n00b
boss staff.)

>> - Full backup for each week of the last month. (4 x 250GB)
>> - Full backup for each month of the last year. (12 x 250GB)
>>
>> As a result, we require a space of size (roughly)
>>
>> 250 + 4x250 + 12x250 = 17x250 = 4250GB = 4.15TB
>>
>> for each server per year. Considering we have ~15 servers,
>>
>> 15x4250 = 63750 = 62.25TB
>
> SATA disk space is quite cheap these days, so unless something is very
> badly wrong with your funding model, this is not really a problem.

Umm... A minority of the servers have a SATA interface. (Most of 'em
use SAS drives and SAN systems.)

> Here's one outfit that will build and configure storage hardware for
> you:
>
> http://www.capricorn-tech.com/

Interesting, I'll check it out.

Regards.
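[For readers of the archive: "WAL shipping" here refers to PostgreSQL's continuous-archiving mechanism, where `archive_command` in postgresql.conf hands each completed WAL segment to a script. A minimal sketch of such a script follows; the script name, archive path, and demonstration file name are illustrative, not from the thread:

```shell
# postgresql.conf would reference this script roughly as:
#   archive_command = '/usr/local/bin/archive_wal.sh %p %f'
# where %p expands to the segment's full path and %f to its file name.

archive_wal() {
    src="$1"; fname="$2"; dest="$3"
    # archive_command must FAIL rather than overwrite an existing
    # segment, so refuse if the target already exists.
    [ -e "$dest/$fname" ] && return 1
    cp "$src" "$dest/$fname"
}

# Demonstration using temporary directories instead of a live cluster:
wal=$(mktemp -d); arch=$(mktemp -d)
echo "segment-data" > "$wal/000000010000000000000001"
archive_wal "$wal/000000010000000000000001" \
    000000010000000000000001 "$arch"
ls "$arch"
```

A second call for the same segment returns nonzero, which PostgreSQL treats as an archiving failure and retries.]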
Volkan YAZICI wrote:
> On Mon, 22 Dec 2008, David Fetter <david@fetter.org> writes:
>> On Mon, Dec 22, 2008 at 10:07:21AM +0200, Volkan YAZICI wrote:
>>> 15x4250 = 63750 = 62.25TB
>>
>> SATA disk space is quite cheap these days, so unless something is very
>> badly wrong with your funding model, this is not really a problem.
>
> Umm... A minority of the servers have SATA interface. (Most of 'em use
> SAS drives and SAN systems.)

Yes... but you can buy new SATA storage enclosures or storage servers.
SATA storage enclosures with SAS and/or Fibre Channel interfaces to the
host exist, and are suitable for exactly this sort of bulk
low-performance archival storage role. You have enough data that you
should be expecting to spend a bit on backups, I'm afraid.

SATA storage arrays might not be ideal for your long-term full backups,
but they're perfect for storing WAL archives and snapshots, and for the
shorter-lived backups that you'll periodically rotate out.

I built an 8TB storage server for the (small) company I work for at a
pitiful cost, to ensure that we always had at least two versions of all
backups in storage that was reliable*, immediately accessible, and
encrypted in case of theft. It's not the only backup mechanism, but it's
the main one, and by itself it is adequate for all but the most critical
data. It's hard to overemphasise the benefits it's had in terms of
improved backup reliability and quick access to backups.

About 100TB, which is about what I'd plan for in your case, is ... more
expensive. That said, with redundancy within each enclosure and between
them, it'd be a pretty solid way to store your backups. It helps that
you may not want to store your long-term archival backups on SATA arrays
anyway, and it's also not clear to what extent you've investigated
options for reducing your backup sizes in the first place. 40-50TB is
not an unreasonable amount of storage to pick up in the form of arrays
of large external SATA enclosures.
In particular, if you're backing up the database cluster at the file
system level, you might want to look into using dumps for your
longer-lived backups instead. For one thing, a compressed dump tends to
be a LOT smaller than a filesystem-level backup of a Pg cluster, and for
another you protect yourself against most forms of undiscovered
corruption in the cluster.

If you do go for SATA storage, avoid systems that rely on SATA
multiplexers if possible. They're REALLY slow, and are particularly
awful in RAID environments. Given that alternatives with many SATA
interfaces and a single SAS port for the host interface exist, as do
internally RAID-ed Fibre Channel options, multiplexer-based systems
don't seem worth it.

* With RAID and proper array scrubbing on a server attached to a UPS,
it's WAY more reliable than the previous DDS-4 DAT backups. It also has
the advantage of not needing five or six tapes per day and of operating
completely unattended, so the risk of human error is drastically
reduced.

--
Craig Ringer
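[Archive note: the dump approach suggested above can be sketched as a small shell job. The database name and output path are placeholders; only the `pg_dump`/`pg_restore` flags (`-Fc` for the compressed custom format, `-l` to list an archive's contents) are standard:

```shell
# Sketch of a periodic logical backup using pg_dump's custom format,
# which is compressed by default and restorable with pg_restore.
DB=mydb                                    # placeholder database name
STAMP=$(date +%Y%m%d)
OUT="/backup/dumps/${DB}-${STAMP}.dump"    # placeholder archive path
CMD="pg_dump -Fc -f $OUT $DB"
echo "$CMD"
# To run against a live server:
#   eval "$CMD"
# and sanity-check the resulting archive with:
#   pg_restore -l "$OUT" > /dev/null
```

Unlike a filesystem-level copy, a dump taken this way also exercises a full read of the tables, which is what surfaces the undiscovered corruption mentioned above.]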