Re: General data warehousing questions - Mailing list pgsql-general

From Shane Ambler
Subject Re: General data warehousing questions
Date
Msg-id 48EA314D.80609@Sheeky.Biz
Whole thread Raw
In response to Re: General data warehousing questions  ("Scott Marlowe" <scott.marlowe@gmail.com>)
List pgsql-general
Scott Marlowe wrote:
> On Sun, Oct 5, 2008 at 7:48 PM, Sean Davis <sdavis2@mail.nih.gov> wrote:
>> I am looking at the prospect of building a data warehouse of genomic
>> sequence data.  The machine that produces the data adds about
>> 300million rows per month in a central fact table and we will
>> generally want the data to be "online".  We don't need instantaneous
>> queries, but we would be using the data for data mining purposes and
>> running some "real-time" queries for reporting and research purposes.
>> I have had the pleasure of working on an Netezza box where this type
>> of thing is quite standard, but we don't have that access anymore, so
>> I'm looking for hints on using postgres in a data warehousing/mining
>> environment.  Any suggestions on how DDL, loading, backup, indexing,
>> or (to a certain extent) hardware?
>
> I assume you're familiar with stuff like star schemas.
>
> For loading you might want to look at things like pg_bulkloader, copy,
>
> For indexing remember that you have partial and fuctional indexes in
> postgresql and they can come in quite handy.
>
> For backup of large changing databases look into PITR.
>
> As for hardware, you need enough CPU horsepower and memory to handle
> however many users you're gonna have running simultaneous queries, but
> more important is usually the drive subsystem.  Throwing drives,
> battery backed cache and a good RAID controller can make a big
> difference.  Usual RAID-10 is preferred, as writes are much faster. If
> you're really squeezed for space and money then you can use RAID-5 but
> it has some seriously negative performance implications for parallel
> load handling and write speed.
>

You may also be interested in looking at BioPostgres
http://phenomics.cs.ucla.edu/



--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz

pgsql-general by date:

Previous
From: Markus Wanner
Date:
Subject: Re: [Pkg-postgresql-public] Postgres major version support policy on Debian
Next
From: Alvaro Herrera
Date:
Subject: Re: Re: [Pkg-postgresql-public] Postgres major version support policy on Debian