choosing the right platform - Mailing list pgsql-performance

From Matthew Nuzum
Subject choosing the right platform
Date
Msg-id 003101c2fe49$86a9fc30$6900a8c0@mattspc
Responses Re: choosing the right platform  (Josh Berkus <josh@agliodbs.com>)
Re: choosing the right platform  ("scott.marlowe" <scott.marlowe@ihs.com>)
List pgsql-performance
Hello all,

I've been the lead developer of a successful (so far) web application.
Currently we run the database and the web application on the same server,
but it will soon be necessary to split them and add some more web servers.

I'm not counting on using replication any time soon, so I want to choose the
hardware platform that will meet my needs for some time.  Maybe you can give
some suggestions...

My application is written in PHP and relies on the Apache web server.  We
use persistent db connections, so it appears (correct me if I'm wrong) that
every Apache child process gets one connection to the database.  If my
MaxClients is 150, each web server will be able to make up to 150 db
connections.  I'd like to play with that number a little bit once I get the
web server off of the db server.  I feel that I could handle a greater number
of clients, so let us say that I have up to 200 connections per server.

I'd like to have room to grow, so let's also say that I go to 20 web servers
for a total of 4000 connections.  (I'd probably like to go even higher, so
consider this our starting point)
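
As a rough sketch of the connection arithmetic above: the function name is made up for illustration, and it assumes the worst case where every Apache child process holds exactly one persistent connection at the same time.

```python
def max_db_connections(web_servers: int, max_clients: int) -> int:
    """Worst-case connection count.

    Assumption: one persistent db connection per Apache child process,
    and every child is alive on every web server at once.
    """
    return web_servers * max_clients


# Figures taken from the paragraphs above.
current = max_db_connections(web_servers=1, max_clients=150)
planned = max_db_connections(web_servers=20, max_clients=200)

print(f"one server at MaxClients=150: {current} connections")
print(f"20 servers at 200 each:       {planned} connections")
```

Whatever number comes out for the planned case, PostgreSQL's max_connections setting would have to be at least that high for every child to get its connection.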

With a couple dozen active accounts and a lot of test data, my current
database is equivalent to about 100 active accounts.  Its current disk space
consumption is:
data # du --max-depth=2
3656    ./base/1
3656    ./base/16975
4292    ./base/95378
177824  ./base/200371
189432  ./base
144     ./global
82024   ./pg_xlog
2192    ./pg_clog
273836  .

This is not optimized and there is a lot of old data, but to be safe, maybe
we should assume that each account uses 4 MB of disk space in the db,
counting indexes, tables, etc.  I'd like to scale to 15,000 - 25,000
accounts, but I doubt that will be feasible at my budget level.  (Also,
there is a lot of optimizing to do, so it won't surprise me if this 4 MB
number is more like 2 MB or even less.)
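
To put numbers on that estimate, here is a small sketch.  It assumes the du listing above is in 1 KB blocks (the GNU du default), and the function names are only illustrative.  By that reading the whole data directory works out to roughly 2.7 MB per account today, so 4 MB is a padded figure.

```python
KB_PER_MB = 1024
MB_PER_GB = 1024


def mb_per_account(du_total_kb: int, accounts: int) -> float:
    """Average footprint per account, from a du total in 1 KB blocks (assumed unit)."""
    return du_total_kb / KB_PER_MB / accounts


def projected_gb(accounts: int, mb_each: float) -> float:
    """Projected database size in GB for a given per-account footprint."""
    return accounts * mb_each / MB_PER_GB


# 273836 is the "." total from the du listing; 100 is the account equivalent.
measured = mb_per_account(du_total_kb=273836, accounts=100)
print(f"measured today: {measured:.1f} MB per account")

# Target range from the text, at the padded (4 MB) and optimistic (2 MB) figures.
for accounts in (15_000, 25_000):
    for mb_each in (2, 4):
        size = projected_gb(accounts, mb_each)
        print(f"{accounts:>6} accounts at {mb_each} MB each: {size:6.1f} GB")
```

Even the high end of that range stays under 100 GB, which is why I'm more worried about connections and RAM than about raw disk capacity.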

I'm not as concerned with disk subsystem or layout at the moment.  I've seen
a lot of good documentation (especially from Bruce Momjian, thanks!) on this
subject.  I'm mostly concerned with choosing the platform that's going to
allow the scalability I need.

Currently I'm most experienced in Linux, especially Red Hat.  I'm "certified"
on SCO OpenServer (5.x) and I've played with Irix, OSF/1 (I don't think it's
called that anymore), FreeBSD (3.x) and Solaris (2.x).  I'm most
comfortable with Linux, but I'm willing to use a different platform if it
will be beneficial.  I've heard that Solaris on the SPARC platform is
capable of addressing larger amounts of RAM than Linux on Intel.  I
don't know if that's true or if it has any bearing here, but I'd like to
hear your opinions.

My budget is going to be between (US) $5,000 and $10,000 and I'd like to
stay under $7,000.  I'm a major bargain hunter, so I shop eBay a lot, and
here are some samples that I think may be relevant for discussion:

SUN (I'm not an expert in this, advice is requested)
----------------------------------------------------
Sun Enterprise 4500 8x400 MHz 4MB Cache CPUs 8GB RAM no hard drives ~$6,000
Sun E3500 8x336 MHz 4MB Cache CPUs 4GB RAM 8 x 9.1GB FC disks ~$600.00
Any other suggestions?

INTEL (I'm much more familiar with this area)
----------------------------------------------------
Compaq DL580 4x700 MHz 2MB Cache CPUs 4GB RAM (16GB Max) HW RAID w/ 64MB Cache ~$6,000
IBM Netfinity 7100 4x500 MHz 1MB Cache CPUs up to (16GB Max) HW RAID
Dell PowerEdge 8450 8x550 MHz 2MB Cache CPUs 4GB RAM (32GB Max) HS RAID w/ 16MB Cache ~$4,500
Any other suggestions?

Any other hardware platforms I should consider?

Finally, and I know this sounds silly, but I don't have my own data center,
so size is something I need to take into consideration.  I pay for data
center space by the physical size of my servers.  My priorities, in order,
are performance, a reasonable amount of scalability (as outlined above) and
finally physical size.

Thanks for taking the time to read this and for any assistance you can give,

Matthew Nuzum
www.bearfruit.org

