I am looking at the prospect of building a data warehouse of genomic
sequence data. The machine that produces the data adds about
300 million rows per month to a central fact table, and we will
generally want the data to be "online". We don't need instantaneous
queries, but we would be using the data for data mining purposes and
running some "real-time" queries for reporting and research purposes.
I have had the pleasure of working on a Netezza box where this type
of thing is quite standard, but we don't have that access anymore, so
I'm looking for hints on using PostgreSQL in a data warehousing/mining
environment. Any suggestions on DDL, loading, backup, indexing,
or (to a certain extent) hardware?
Thanks,
Sean