Physical sites handling large data - Mailing list pgsql-general

From scott.marlowe
Subject Physical sites handling large data
Date
Msg-id Pine.LNX.4.33.0209131528250.21251-100000@css120.ihs.com
List pgsql-general
I moved this over to general, where it's more on topic...

On Fri, 13 Sep 2002, Shridhar Daithankar wrote:

> Hi all,
>
> One of my friends is evaluating postgres for large databases. This is a select
> intensive application which is something similar to data-warehousing as far as
> I can see.
>
> The data is 150GB in flat files so would swell to 200GB+ with indexes.
>
> Is anybody running that kind of site? Any url? Any performance numbers/tuning
> tips for random selects?
>
> I would hate to put mysql there but we are evaluating that too. I would hate
> it if postgres loses this to mysql because I didn't know a few things about
> postgres.
>
> Secondly, would it make a difference if I host that database on, say, an HP-UX
> box? From some tests I have done for my job, a single-CPU HP-UX box trounces a
> 4-way Xeon box. Any suggestions in this direction?

Often the real limiter for database performance is the IO subsystem and
its bandwidth, not the CPUs.  After that, memory access speed and bandwidth
are very important too, so I can see a big HP-UX box beating the pants off
of a Xeon.

Honestly, I'd put a dual 1GHz PIII with 1 gig of RAM up against a quad
Xeon with 2 gigs of RAM if I got to spend the difference in cost on a very
fast RAID array for the PIII.  Since a quad Xeon with 2 gigs of RAM and a
pair of 18 gig SCSI drives goes for ~$27,500 on Dell, and a dual 1GHz PIII
with five 15kRPM 18 gig drives goes for ~$6,700, that leaves me with about
$20,000 to spend on an external RAID array on top of the five 15kRPM
drives I've already got configured.  An external RAID array with 144 gigs
of 15kRPM 18 gig drives runs ~$7,700, so you could get three if you got
the dual PIII without all those drives built into it.  That makes for 24
15kRPM drives and about 430 gigs of storage, all in a four-unit
rack-mounted setup.
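
As a rough check of the arithmetic above, here is a small Python sketch.
The prices and sizes are the approximate figures quoted in this message;
the names are made up for illustration, and it assumes each external array
is filled entirely with 18 gig drives and counts raw capacity only (no
RAID overhead or hot spares).

```python
# Approximate figures quoted in the message (assumed, not verified prices).
QUAD_XEON_PRICE = 27_500        # quad Xeon, 2 gigs RAM, two 18 gig drives
DUAL_PIII_PRICE = 6_700         # dual 1GHz PIII with five 15kRPM drives
RAID_ARRAY_PRICE = 7_700        # one external array
RAID_ARRAY_CAPACITY_GB = 144    # raw capacity of one external array
DRIVE_SIZE_GB = 18              # size of each 15kRPM drive


def budget_summary(num_arrays=3):
    """Return the price gap and the raw storage bought with num_arrays arrays."""
    drives_per_array = RAID_ARRAY_CAPACITY_GB // DRIVE_SIZE_GB
    return {
        "price_gap": QUAD_XEON_PRICE - DUAL_PIII_PRICE,
        "drives_per_array": drives_per_array,
        "total_drives": num_arrays * drives_per_array,
        "total_capacity_gb": num_arrays * RAID_ARRAY_CAPACITY_GB,
        "arrays_cost": num_arrays * RAID_ARRAY_PRICE,
    }


summary = budget_summary()
for name, value in summary.items():
    print(f"{name}: {value}")
```

Three arrays cost a bit more than the raw price gap, which is why the
third one only fits if the PIII is bought without its internal drives.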

My point being, spend more money on the drive subsystem than on anything
else and you'll probably be fine, but PostgreSQL may or may not be your
best answer.  It may be better to use something like Berkeley DB to handle
this job than a SQL database.

