To all,
We are building a data warehouse composed essentially of clickstream
data. The DB is growing fairly quickly, as is to be expected, and is
currently at 90GB for one month's data. The idea is to keep six months
of detailed data online (roughly 540GB at the current rate) and then
start aggregating older data into summary tables. We currently have two
fact tables, one with about 68 million rows and the other with about
210 million rows, plus numerous dimension tables ranging from a dozen
rows to millions.
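For what it's worth, the kind of rollup we have in mind is along the
lines of the sketch below (the table and column names are purely
illustrative, not our actual schema): aggregate a month of detail into
a daily summary table, then delete the detail rows.

  -- Illustrative only: roll one month of clickstream detail into a
  -- daily summary table, then remove the detail rows.
  BEGIN;

  INSERT INTO click_summary_daily (day, page_id, hits)
  SELECT date_trunc('day', click_time), page_id, count(*)
  FROM click_fact
  WHERE click_time >= '2003-04-01' AND click_time < '2003-05-01'
  GROUP BY date_trunc('day', click_time), page_id;

  DELETE FROM click_fact
  WHERE click_time >= '2003-04-01' AND click_time < '2003-05-01';

  COMMIT;

  -- A VACUUM on click_fact afterwards makes the freed space reusable.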
We are currently running on a Dell 2650 with two 2.8GHz Xeon processors
in hyper-threading mode, 4GB of RAM, and five SCSI drives (10k RPM, I
believe) in a RAID 0 configuration on an Adaptec PERC3/Di controller.
The filesystem is ext3. We are running RH9 with the 2.4.20-20.9 SMP
kernel and bigmem turned on. This box is dedicated to the warehouse;
all of the ETL work is done on it as well. The DB version is
PostgreSQL 7.4.
Not surprisingly, we are running into I/O saturation issues. Since this
thing is only going to get bigger, we are looking for some advice on how
to accommodate DBs of this size.
First question: do we gain anything, performance-wise, by moving to the
RH Enterprise version of Linux? The concern is mainly in the I/O realm,
as we are not CPU bound at all. Second, and more radical: has anyone
run PostgreSQL on the new Apple G5 with an Xserve RAID system? This
seems like a great value combination: fast CPU, wide bus, Fibre Channel
I/O, and 2.5TB of storage, all for ~$17k.
I keep seeing references to terabyte-scale PostgreSQL installations,
and I was wondering if anyone on this list is in charge of one of those
and could offer some advice on how to position ourselves hardware-wise.
Thanks.
--sean