Re: Database storage - Mailing list pgsql-general

From Scott Marlowe
Subject Re: Database storage
Date
Msg-id dcc563d10907091728l216ef97ck645b328d0ba06d9a@mail.gmail.com
Whole thread Raw
In response to Database storage  (nabble.30.miller_2555@spamgourmet.com)
Responses Re: Database storage
List pgsql-general
On Thu, Jul 9, 2009 at 5:40 PM, <nabble.30.miller_2555@spamgourmet.com> wrote:
> Hi -
>
> I'm running Postgres v8.3 on Fedora 10 (home machine - x86_64
> architecture). After realizing that the storage requirements of one of
> my databases will exceed 4Tb, I wanted to see if anyone had any
> suggestions as to hardware setup that works well with Postgres running
> on Linux. I have partitioned most of the database, so older data can
> go on slower devices with only 400-500Gb needed on faster devices.
> Redundancy is a requirement. General database usage is generally low,
> with burst input. For the curious, here's the general profile:
>    a) ~19,000,000 INSERT statements/ day (190 INSERTs/ commit blocks)
> for a total of ~10 Gb INSERT/day
>    b) INSERTs occur sequentially (daily batch)
>    c) Handful of SELECT statements run per day to generate a few
> dozen reports, but these are not resource-intensive.
>    d) Only 3-5 database users configured.
>
> Initial thought: Use an external multi-bay eSATA case and inserting
> some 7200RPM+, 32Mb cache 1T+ SATA drives running RAID 5.

While RAID-5 is generally not the best database setup, considering
that your inputs will be bulk you're probably ok.  Also, given that
2TB server class drives (5400rpm) are now out, and that they have
about the same throughput as 1TB 7200 RPM server class drives, you
might want to look into them as you can get by on fewer drives for the
same storage.

> A few specific questions:
>    1) Which components generally lead to bottlenecks (controller
> card, individual HD cache, HD rotational speed, other???)

IO is almost always your bottleneck on queries over large data sets.

>    2) Are there better solutions (broad question in terms of price/
> data availability)? Budget: $500 - $750 for the storage medium. Better
> question: Which setup was worked well with a Postgres implementation
> running on a similar budget?

$750 is about what a decent RAID controller would cost you, but again
it's likely that given your bulk import scenario,  you're probably ok
without one.  In this instance, you're probably best off with software
RAID than a cheap RAID card which will cost extra and probably be
slower than linux software RAID.

pgsql-general by date:

Previous
From: nabble.30.miller_2555@spamgourmet.com
Date:
Subject: Database storage
Next
From: Greg Stark
Date:
Subject: Re: SELECT DISTINCT very slow