Scott Marlowe wrote:
> On Sun, Oct 5, 2008 at 7:48 PM, Sean Davis <sdavis2@mail.nih.gov> wrote:
>> I am looking at the prospect of building a data warehouse of genomic
>> sequence data. The machine that produces the data adds about
>> 300 million rows per month in a central fact table and we will
>> generally want the data to be "online". We don't need instantaneous
>> queries, but we would be using the data for data mining purposes and
>> running some "real-time" queries for reporting and research purposes.
>> I have had the pleasure of working on a Netezza box where this type
>> of thing is quite standard, but we don't have that access anymore, so
>> I'm looking for hints on using postgres in a data warehousing/mining
>> environment. Any suggestions on how DDL, loading, backup, indexing,
>> or (to a certain extent) hardware?
>
> I assume you're familiar with stuff like star schemas.
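As a sketch of what a star schema for this workload might look like (table and column names here are hypothetical, assuming a per-run fact table with date and sample dimensions; at ~300M rows/month, partitioning the fact table by month via table inheritance is also worth a look):

```sql
-- Hypothetical dimension tables
CREATE TABLE dim_date (
    date_key    integer PRIMARY KEY,
    full_date   date NOT NULL
);

CREATE TABLE dim_sample (
    sample_key  serial PRIMARY KEY,
    sample_name text NOT NULL
);

-- Central fact table: narrow rows, foreign keys into the dimensions
CREATE TABLE fact_sequence (
    date_key    integer NOT NULL REFERENCES dim_date,
    sample_key  integer NOT NULL REFERENCES dim_sample,
    read_count  bigint  NOT NULL
);
```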
>
> For loading you might want to look at things like pg_bulkload or COPY.
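For example, a COPY-based load (file path and table/column names are just placeholders) avoids the per-row overhead of individual INSERTs:

```sql
-- Bulk-load a tab-delimited file server-side
COPY fact_sequence (date_key, sample_key, read_count)
    FROM '/data/run_2008_10.tsv';

-- Or from the client side with psql:
-- \copy fact_sequence FROM 'run_2008_10.tsv'
```

Dropping or deferring indexes during the load and rebuilding afterwards usually speeds things up further.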
>
> For indexing remember that you have partial and functional indexes in
> postgresql and they can come in quite handy.
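To illustrate both (table and column names are hypothetical):

```sql
-- Partial index: only index the slice of rows the reporting
-- queries actually touch, keeping the index small
CREATE INDEX idx_fact_recent ON fact_sequence (sample_key)
    WHERE date_key >= 20081001;

-- Functional (expression) index: matches queries that filter
-- on a computed value such as lower(sample_name)
CREATE INDEX idx_sample_lower ON dim_sample (lower(sample_name));
```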
>
> For backup of large changing databases look into PITR.
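PITR boils down to WAL archiving plus periodic base backups; a minimal postgresql.conf sketch (the archive destination is a placeholder):

```
# postgresql.conf -- ship each completed WAL segment to an archive area
archive_mode = on
archive_command = 'cp %p /mnt/archive/%f'
```

Recovery then replays archived WAL from the last base backup via restore_command in recovery.conf, optionally stopping at a recovery_target_time.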
>
> As for hardware, you need enough CPU horsepower and memory to handle
> however many users you're gonna have running simultaneous queries, but
> more important is usually the drive subsystem. Throwing more drives, a
> battery-backed cache, and a good RAID controller at the problem can
> make a big difference. Usually RAID-10 is preferred, as writes are
> much faster. If
> you're really squeezed for space and money then you can use RAID-5 but
> it has some seriously negative performance implications for parallel
> load handling and write speed.
>
You may also be interested in looking at BioPostgres
http://phenomics.cs.ucla.edu/
--
Shane Ambler
pgSQL (at) Sheeky (dot) Biz
Get Sheeky @ http://Sheeky.Biz