Re: Postgres hardware configuration, clustering and - Mailing list pgsql-general

From scott.marlowe
Subject Re: Postgres hardware configuration, clustering and
Date
Msg-id Pine.LNX.4.33.0209120911150.10850-100000@css120.ihs.com
Whole thread Raw
In response to Postgres hardware configuration, clustering and replication  ("Peter Alberer" <h9351252@obelix.wu-wien.ac.at>)
List pgsql-general
On Thu, 12 Sep 2002, Peter Alberer wrote:

> Hi there,
>
> I have to make up a plan for the hardware configuration of a server that
> is going to be used only for hosting postgresql. It will be part of a
> web application and accessed by several web servers. The server should
> therefore be as powerful as possible :-) Unfortunately I could not find
> any documentation on the "ideal hardware configuration".

That's because the "ideal hardware configuration" changes based on how
many users and what type of queries you're going to run.  Is all your data
well indexed and normalized, and you use almost not CPU intensive
functions?  Then spend your money on RAM and hard drives.  Is your data
more fuzzy, and are you going to use a lot of high-CPU load functions
(regex searches, likes, things like that?) then spend your money on RAM
and CPU horse power.  Are you gonna have one very high load user, and many
small load users, (i.e. batch processing) then you don't need lots of CPUs
or memory.  Are you gonna have 100 writers and 100 readers?  You'll likely
need multiple CPUs for that.

> So I am not quite sure about a few things:
>
> -        should I go for one or more CPUs? Usually MP CPUs are somewhat
> slower than the currently available CPUs for single-cpu-systems.

I have a dual CPU machine for production and a single CPU machine for
testing.  The dually is an older PII 750 w/ 512 Meg ram, while the single
CPU machine is a Celeron 1.1 (soon to be a 1.7) w/ 512 Meg ram.  The dual
PIII 750 feels faster, even though the numbers for it aren't that much
better.

> -        I thought about using an external raid subsystem utilizing
> raid5 (10x73GB 4MB Cache). Would it be faster to go for several raid10
> disk combinations, and moving some data files to these combinations?

I have found that except for RAID-0, RAID 1+0, 0+1, 1, and 5 are all about
equivalent under Linux running Postgresql.  1+0 and 0+1 are a little
quicker than 1, and 1 is a little quicker than 5, but the speed of the
drives and the interface they're on means way more than the configuration.
I.e. a pair of 15krpm Ultra 160 drives in RAID1 will beat any combination
of 4 7200 RPM 2 Gig Ultra 20 Narrow SCSI drives, be it RAID 5, 1+0 or 0+1,
and even raid0 will be slower than the dual 15k U160 drives.

> -        Are there clustering solutions available for postgres? The best
> configuration would be somehow extendable if performance increased. But
> adding CPUs is not so easy usually. Adding a new server to a cluster
> would probably be easier.

Not really.

> -        I had a look at several load-balancing and replication apps.
> (DBBalancer, PGreplicator, .) Some efforts seem to be discontinued. Did
> anyone use some of these systems in a production environment?

rserv works, but is a fairly limited synchronous design.  Postgres-R is
underway and showing great promise.  There are a few other replication
packages out there, and Postgresql inc. sells a hopped up version of rserv
that's supposed to be quite speedy.

We've NEVER had postgresql go down in production with tested code, and
we've only brought it to it's knees a half dozen times in 3 years on our
build system (things like unconstrained joins and gobble up memory pretty
quick, DOSing the dbms until the kernel kills the offending backend.)

So, my recommendations are to first give the machine LOTS of memory (2 gig
is a good starting point).  Use a 64 bit box if you need lots of memory,
like a Sun Sparc box.  Linux on Sparc is probably one of the better
OS/hardware combos I've seen postgresql run on, but any 64 bit platform
will offer more memory for postgresql to use than a 32 bit system.
Solaris seems to be rather slow for postgresql to live on compared to
Linux in all my testing.

After memory, look at having two CPUs.  You will likely never peg your
CPUs if your database is properly indexed and normalized, but having the
second CPU greatly increases responsiveness.

After that, look at adding as many drives on as many interfaces as you
can.  I've found that any modern drive, whether it be SCSI or IDE is
likely to be much faster than a drive that's a few years old.  I.e.
putting promise Ultra 133 cards in a machine and adding 8 80 Gig 7200 RPM
IDE drives in a RAID 5 with a spare provides good speed, plently of
storage, and good reliability.  If your storage needs are modest (i.e. a
small database that isn't gonna grow into a terabyte monster) then a pair
of 15kRPM U160 SCSI drive each on their own channel is the fastest thing
I've used for most postgresql servers.

If you're gonna have HUNDREDS of readers and only one or two writers, then
you may wanna look at running more than 2 drives in a mirror, something
that linux kernel level raid drivers allow.  This allows the OS to
interleave read access across many drives.  While writes are somewhat more
expensive, the >2 drive mirror allows for many readers to run quite
quickly.

Hope that helps.


pgsql-general by date:

Previous
From: "Peter Alberer"
Date:
Subject: Postgres hardware configuration, clustering and replication
Next
From: "Roderick A. Anderson"
Date:
Subject: Re: max | last INET in table