Data warehousing requirements - Mailing list pgsql-performance

From Gabriele Bartolini
Subject Data warehousing requirements
Date
Msg-id 6.1.2.0.2.20041006230239.0201bd40@box.tin.it
Whole thread Raw
List pgsql-performance
Hi guys,

    I just discussed about my problem on IRC. I am building a Web usage
mining system based on Linux, PostgreSQL and C++ made up of an OLTP
database which feeds several and multi-purpose data warehouses about users'
behaviour on HTTP servers.

    I modelled every warehouse using the star schema, with a fact table and
then 'n' dimension tables linked using a surrogate ID.

    Discussing with the guys of the chat, I came up with these conclusions,
regarding the warehouse's performance:

1) don't use referential integrity in the facts table
2) use INTEGER and avoid SMALLINT and NUMERIC types for dimensions' IDs
3) use an index for every dimension's ID in the fact table

    As far as administration is concerned: run VACUUM ANALYSE daily and
VACUUM FULL periodically.

    Is there anything else I should keep in mind?

    Also, I was looking for advice regarding hardware requirements for a
data warehouse system that needs to satisfy online queries. I have indeed
no idea at the moment. I can only predict 4 million about records a month
in the fact table, does it make sense or not? is it too much?

    Data needs to be easily backed up and eventually replicated.

    Having this in mind, what hardware architecture should I look for? How
many hard disks do I need, what kind and what RAID solution do you suggest
me to adopt (5 or 10 - I think)?

Thank you so much,
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Prato, Toscana, Italia
angusgb@tin.it | http://www.prato.linux.it/~gbartolini | ICQ#129221447
 > "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno

---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.773 / Virus Database: 520 - Release Date: 05/10/2004

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: sequential scan on select distinct
Next
From: Paul Ramsey
Date:
Subject: Re: The never ending quest for clarity on shared_buffers