Thread: Is It Too Big? - Customer Data Warehouse Growth Projection
I have a customer using PostgreSQL for a data warehouse. Their current setup is: a 5GB database with a 50MB data load each week. They are running PostgreSQL 7.1.3 on Red Hat 7.2; the platform is a dual P1000 with 768MB RAM and 5 x 18GB 15,000rpm SCSI drives in RAID 5.

Their growth projections are:

1 month:  30GB database with a 300MB weekly load
3 months: 60GB database with a 600MB weekly load
4 months: 300GB database with a 3000MB weekly load

I would appreciate any suggestions on how to plan for this growth.

- Clearly hardware will need to be looked at (can we cope if we
  continue to use Intel-based platforms?).
- Also software: are we okay using PostgreSQL for a 300GB database (is
  7.2 aimed at an undertaking of this size?). And how about the OS -
  can we keep using Linux or FreeBSD?

I am informed that the expected number of users is low, so the major challenges are big queries and the weekly loads. Most of the space used by the database will be in one very big table and one big table.

Thanks in advance

Mark
Mark kirkwood <markir@slingshot.co.nz> writes:
> I would appreciate any suggestions on how to plan for this growth.
> - Clearly hardware will need to be looked at (can we cope if we
>   continue to use Intel-based platforms?).
> - Also software: are we okay using PostgreSQL for a 300GB database (is
>   7.2 aimed at an undertaking of this size?). And how about the OS -
>   can we keep using Linux or FreeBSD?

7.2 is aimed more at 24x7 operation; as far as database size goes, I wouldn't expect it to be much better than 7.1. Probably the major issue for you is that PG doesn't have any provision for spreading a database across multiple filesystems --- so you will have to put the whole DB on one honkin' big RAID array, or use an LVM layer to spread the filesystem across multiple drives in software.

I seem to recall having heard that Linux has some ~100GB restriction on the size of a single filesystem, which could be a problem. That might be obsolete information, but be sure to check the maximum filesystem size for whichever kernel you select.

> I am informed that the expected number of users is low, so the major
> challenges are big queries and the weekly loads. Most of the space used
> by the database will be in one very big table and one big table.

PG's hard limit on the size of a single table is 2 billion pages, which is 16TB with the default 8K page size and 64TB if you compile with BLCKSZ set to 32K. The nonstandard choice might be a good option for this DB.

The prospect of having to dump this database for backup seems a tad daunting. What are you using for a backup device?

			regards, tom lane
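To make the LVM suggestion concrete: on a 2.4 kernel with the LVM tools installed, striping one big filesystem across several drives might look something like the sketch below. The device names, volume group name, and sizes here are illustrative placeholders, not details from the thread.

    # Register the extra drives as LVM physical volumes
    pvcreate /dev/sdb1 /dev/sdc1 /dev/sdd1
    # Pool them into one volume group
    vgcreate pgvg /dev/sdb1 /dev/sdc1 /dev/sdd1
    # Carve out a logical volume striped across all three drives
    # (-i = number of stripes, -I = stripe size in KB)
    lvcreate -i 3 -I 64 -L 200G -n pgdata pgvg
    # Build a filesystem on it and point PGDATA at the mount
    mke2fs /dev/pgvg/pgdata
    mount /dev/pgvg/pgdata /var/lib/pgsql/data

The point of the stripe is that a single sequential scan of the one very big table gets spread across all spindles, which is exactly the access pattern a low-user-count warehouse with big queries will generate.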
Tom Lane wrote:
> I seem to recall having heard that Linux has some ~100GB restriction on
> the size of a single filesystem, which could be a problem. That might
> be obsolete information, but be sure to check max filesystem size for
> whichever kernel you select.

I don't think Linux does, though some filesystems might:

->uname -a
Linux tapycer 2.4.5 #2 SMP Mon Jun 18 16:09:41 MST 2001 i686 unknown
->df
Filesystem           1k-blocks      Used Available Use% Mounted on
...
/dev/pool/pool0      366529628   1318912 365210716   0% /mnt/san1
/dev/pool/pool1       96442588   7028276  89414312   7% /mnt/san2
/dev/pool/pool2      145972564  21280320 124692244  15% /mnt/san3

These are all GFS filesystems (www.sistina.com) located on a SAN.

--
Steve Wampler -- swampler@noao.edu
O sibile, si ergo. Fortibus es enaro.
Nobile, demis trux. Demis phulla causan dux.
On Thu, 2002-01-17 at 09:33, Tom Lane wrote:
> I seem to recall having heard that Linux has some ~100GB restriction on
> the size of a single filesystem, which could be a problem. That might
> be obsolete information, but be sure to check max filesystem size for
> whichever kernel you select.

No, ext2 (Linux's default filesystem) can use partitions up to 4TB in size. For a system this large, something like XFS might be a good choice (it has been ported to Linux by SGI) -- IIRC, it supports filesystems larger than a petabyte.

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC
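If XFS turns out to be the choice, putting it on the hypothetical striped volume from the earlier LVM sketch is short - assuming a kernel built with SGI's XFS patches and the xfsprogs userland tools installed:

    # Create and mount an XFS filesystem on the (hypothetical) volume
    mkfs.xfs /dev/pgvg/pgdata
    mount -t xfs /dev/pgvg/pgdata /var/lib/pgsql/data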
On Fri, 2002-01-18 at 03:33, Tom Lane wrote:
> The prospect of having to dump this database for backup seems a tad
> daunting. What are you using for a backup device?

The backup scenario is currently: pg_dump to local disk, then copy the archive to another machine and put it on a DLT (alt-? have to check) tape.

I don't think the dumping to disk + copying is going to be sustainable, but the customer likes the idea of the backup being immediately available.

regards

Mark
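Spelled out, the current pipeline is roughly the following (the hostnames, paths, and tape device here are illustrative, not the real names):

    # 1. Dump to local disk on the database server
    pg_dump warehouse > /backup/warehouse.dump
    # 2. Copy the archive to the backup machine
    scp /backup/warehouse.dump backuphost:/backup/
    # 3. On the backup machine, write it out to the tape drive
    ssh backuphost 'dd if=/backup/warehouse.dump of=/dev/nst0 bs=64k'

At 300GB the problem is obvious: the dump lands on the database server's disks in full before the copy even starts, so we pay for the space twice and for the I/O three times.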
On Fri, 2002-01-18 at 03:57, Jim Buttafuoco wrote:
> Mark/All,
>
> I currently have 4 databases at ~800G running Red Hat Linux 6.2 and
> Postgres 7.0.X (yes, this is an old version, but the customer is
> happy). The application is a data warehouse. We moved from a 3-node
> (16 Alpha CPUs, 12G memory) Oracle Parallel Server to the 4 Intel
> (dual 733MHz each) systems. Our customer is very happy with both the
> cost of the Linux/Postgres system and the performance. Each system has
> an exact copy of the database (using a custom rserv to keep the
> systems in sync).
>
> Jim

Jim,

Thanks for the information - very encouraging; I hoped we could keep using Intel + PostgreSQL. I have a couple more questions :-)

What do you use to back these DBs up?
What sort of I/O system and RAID level are you using?

Thanks in advance

Mark
Hello,

Use pg_dump over the wire. Set up a machine that has a 100Mbit card in it and pull the dump straight to another machine for processing.

J

On 19 Jan 2002, Mark kirkwood wrote:
> On Fri, 2002-01-18 at 03:33, Tom Lane wrote:
> > The prospect of having to dump this database for backup seems a tad
> > daunting. What are you using for a backup device?
>
> The backup scenario is currently: pg_dump to local disk, then copy the
> archive to another machine and put it on a DLT (alt-? have to check)
> tape.
>
> I don't think the dumping to disk + copying is going to be
> sustainable, but the customer likes the idea of the backup being
> immediately available.
>
> regards
>
> Mark

--
-- by way of pgsql-general@commandprompt.com
http://www.postgresql.info/
http://www.commandprompt.com/
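Roughly, run from the backup machine - the hostname and database name below are placeholders:

    # Pull the dump over the 100Mbit link so nothing extra lands on
    # the database server's disks; compress on the receiving side
    pg_dump -h dbserver warehouse | gzip > /backup/warehouse.dump.gz

This keeps the "immediately available" copy the customer likes (on the backup machine's disk) while cutting the database server out of the disk-space and copy-time costs.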
On Saturday 19 January 2002 03:44, Mark kirkwood wrote:
> On Fri, 2002-01-18 at 03:33, Tom Lane wrote:
> > The prospect of having to dump this database for backup seems a tad
> > daunting. What are you using for a backup device?
>
> The backup scenario is currently: pg_dump to local disk, then copy the
> archive to another machine and put it on a DLT (alt-? have to check)
> tape.

DLT tape - very good choice :)

Install pg_dump on the backup machine so you don't have to copy from machine A to machine B.
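Taken one step further, with the client tools on the backup machine the intermediate disk copy can disappear entirely - the dump can stream straight onto the DLT drive (the tape device name here is a guess):

    # Run on the backup machine: dump over the network, compress,
    # and write directly to the tape drive
    pg_dump -h dbserver warehouse | gzip | dd of=/dev/nst0 bs=64k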