Re: Hardware recommendations to scale to silly load - Mailing list pgsql-performance

From scott.marlowe
Subject Re: Hardware recommendations to scale to silly load
Date
Msg-id Pine.LNX.4.33.0308270909500.1338-100000@css120.ihs.com
Whole thread Raw
In response to Hardware recommendations to scale to silly load  (matt <matt@ymogen.net>)
List pgsql-performance
On 27 Aug 2003, matt wrote:

> I'm wondering if the good people out there could perhaps give me some
> pointers on suitable hardware to solve an upcoming performance issue.
> I've never really dealt with these kinds of loads before, so any
> experience you guys have would be invaluable.  Apologies in advance for
> the amount of info below...
>
> My app is likely to come under some serious load in the next 6 months,
> but the increase will be broadly predictable, so there is time to throw
> hardware at the problem.
>
> Currently I have a ~1GB DB, with the largest (and most commonly accessed
> and updated) two tables having 150,000 and 50,000 rows.
>
> A typical user interaction with the system involves about 15
> single-table selects, 5 selects with joins or subqueries, 3 inserts, and
> 3 updates.  The current hardware probably (based on benchmarking and
> profiling) tops out at about 300 inserts/updates *or* 2500 selects per
> second.
>
> There are multiple indexes on each table that updates & inserts happen
> on.  These indexes are necessary to provide adequate select performance.
>
> Current hardware/software:
> Quad 700MHz PIII Xeon/1MB cache
> 3GB RAM
> RAID 10 over 4 18GB/10,000rpm drives
> 128MB battery backed controller cache with write-back enabled
> Redhat 7.3, kernel 2.4.20
> Postgres 7.2.3 (stock redhat issue)
>
> I need to increase the overall performance by a factor of 10, while at
> the same time the DB size increases by a factor of 50.  e.g. 3000
> inserts/updates or 25,000 selects per second, over a 25GB database with
> most used tables of 5,000,000 and 1,000,000 rows.

It will likely take a combination of optimizing your database structure /
methods and increasing your hardware / OS performance.

You probably, more than anything, should look at some kind of
superfast, external storage array that has dozens of drives, and a large
battery backed cache.  You may be able to approximate this yourself with
just a few dual channel Ultra 320 SCSI cards and a couple dozen hard
drives.  The more spindles you throw at a database, generally speaking,
the more parallel load it can handle.

You may find that once you get to 10 or 20 drives, RAID 5 or 5+0 or 0+5
will be outrunning 1+0/0+1 due to fewer writes.

You likely want to look at the fastest CPUs with the fastest memory you
can afford.  those 700MHz xeons are likely using PC133 memory, which is
painfully slow compared to the stuff pumping data out at 4 to 8 times the
rate of the older stuff.

Maybe an SGI Altix could do this?  Have you looked at them?  They're not
cheap, but they do look to be quite fast, and can scale to 64 CPUs if need
be.  They're interbox communication fabric is faster than most CPU's front
side busses.

> Notably, the data is very time-sensitive, so the active dataset at any
> hour is almost certainly going to be more on the order of 5GB than 25GB
> (plus I'll want all the indexes in RAM of course).
>
> Also, and importantly, the load comes but one hour per week, so buying a
> Starfire isn't a real option, as it'd just sit idle the rest of the
> time.  I'm particularly interested in keeping the cost down, as I'm a
> shareholder in the company!

Interesting.  If you can't spread the load out, can you batch some parts
of it?  Or is the whole thing interactive therefore needing to all be
done in real time at once?

> So what do I need?

whether you like it or not, you're gonna need heavy iron if you need to do
this all in one hour once a week.

> Can anyone who has (or has ever had) that kind of
> load in production offer any pointers, anecdotes, etc?  Any theoretical
> musings also more than welcome.  Comments upon my sanity will be
> referred to my doctor.
>
> If the best price/performance option is a second hand 32-cpu Alpha
> running VMS I'd be happy to go that way ;-)

Actually, I've seen stuff like that going on Ebay pretty cheap lately.  I
saw a 64 CPU E10k (366 MHz CPUs) with 64 gigs ram and 20 hard drives going
for $24,000 a month ago.  Put Linux or BSD on it and Postgresql should
fly.


pgsql-performance by date:

Previous
From: Jeff
Date:
Subject: Re: Sun vs a P2. Interesting results.
Next
From: matt
Date:
Subject: Re: Hardware recommendations to scale to silly load