Re: choosing the right platform - Mailing list pgsql-performance

From: scott.marlowe
Subject: Re: choosing the right platform
Msg-id: Pine.LNX.4.33.0304091042320.22107-100000@css120.ihs.com
In response to: choosing the right platform ("Matthew Nuzum" <cobalt@bearfruit.org>)
Responses: Re: choosing the right platform
List: pgsql-performance

I would say up front that both Linux and BSD are probably your two best
choices.  If you're more familiar with one than the other, that
familiarity may matter more than the underlying differences between the
two OSes, as both are good platforms to run PostgreSQL on.

Secondly, think carefully before using persistent connections in large numbers.

While persistent connections DO represent a big saving in connect time,
that saving is lost in the noise for many PHP applications.

For example, my dual PIII swiss army knife server can initiate single
persistent connections at 1,000,000 a second (reusing the old ones, of
course), while non-persistent connects happen at 1,000 a second.  Most of
my scripts run in 1/10th of a second or so, so the 1/1000th of a second
used to connect is noise to me.
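
To put the numbers above in perspective, here is a small sketch of the arithmetic.  It assumes, as a simplification, that connect time simply adds to script time; the function name is made up for illustration.

```python
def connect_overhead_fraction(connects_per_second: float, script_seconds: float) -> float:
    """Fraction of a request's total time spent opening a fresh connection,
    assuming connect time simply adds to the script's own run time."""
    connect_seconds = 1.0 / connects_per_second
    return connect_seconds / (connect_seconds + script_seconds)

# Figures from the post: ~1,000 non-persistent connects/second, ~0.1 s per script.
frac = connect_overhead_fraction(connects_per_second=1_000, script_seconds=0.1)
print(f"connect overhead: {frac:.1%}")
```

With those figures the connect cost comes out to roughly 1% of each request, which is the "noise" referred to above.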

If you are going to use persistent connections, it might work better to
let Apache have only 20 or 40 children, which will force the Apache
children to serve incoming requests "round robin".
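
A rough sketch of why the child count matters: if, as the original question describes, every Apache child holds one persistent connection, the number of PostgreSQL backends is bounded by servers times children.  The 20-server and 200-children figures come from the quoted message; the function name is hypothetical.

```python
def max_backends(web_servers: int, children_per_server: int) -> int:
    """Upper bound on PostgreSQL backends, assuming each Apache child
    holds exactly one persistent database connection."""
    return web_servers * children_per_server

# 20 web servers at 200 children each, versus the same servers capped at 40.
print(max_backends(20, 200))  # 4000
print(max_backends(20, 40))   # 800
```

Capping children at 40 cuts the worst-case backend count by a factor of five for the same number of web servers.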

This will usually work fine, since keeping the number of Apache children
down keeps the number of PostgreSQL backends down, which keeps response
times short.  Turn keep-alive down to something short like 10 seconds, or
just turn it off, as keep-alive doesn't really save all that much time in
Apache.
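
One way to see the keep-alive cost when children are capped is a worst-case estimate.  It assumes a child stays tied to a keep-alive connection for the full timeout after each page because the client never sends a follow-up request.  The 0.1 s page time is taken from the figures above; the function name is made up for illustration.

```python
def worst_case_pages_per_second(children: int, service_seconds: float,
                                keepalive_timeout: float) -> float:
    """Pages/second a fixed pool of children can serve if every child sits idle
    for the whole keep-alive timeout after each page (worst case)."""
    return children / (service_seconds + keepalive_timeout)

# 40 children, ~0.1 s per page, with keep-alive off, short, and longer.
for timeout in (0, 5, 10):
    print(timeout, round(worst_case_pages_per_second(40, 0.1, timeout), 2))
```

Real clients often do reuse the connection, so actual throughput lands somewhere between these worst-case numbers and the keep-alive-off figure.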

Note that testing a machine with 100 simultaneous connections doesn't
translate directly to 100 users.  Generally, x simultaneous connections
represent about 10 to 20 times x users, since users don't click buttons all
that fast.  So an Apache configured with 40 max children should handle 100
to 200 users with no problem.
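
Taking the 10 to 20 times rule of thumb literally gives the arithmetic below.  The ratios are the post's own estimate, not a measured figure, and the function name is hypothetical.

```python
def estimated_users(simultaneous: int, low_ratio: int = 10, high_ratio: int = 20) -> tuple[int, int]:
    """Rule of thumb: x simultaneous connections serve roughly 10 to 20 times x users."""
    return simultaneous * low_ratio, simultaneous * high_ratio

low, high = estimated_users(40)
print(low, high)  # 400 800
```

By that estimate 40 children cover roughly 400 to 800 users, so the 100 to 200 user figure leaves a comfortable safety margin.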

On Tue, 8 Apr 2003, Matthew Nuzum wrote:

> Hello all,
>
> I've been the lead developer of a successful (so-far) web application.
> Currently we run the database and the web application on the same server,
> but it will soon be necessary to split them and add some more web servers.
>
> I'm not counting on using replication any time soon, so I want to choose the
> hardware platform that will meet my needs for some time.  Maybe you can give
> some suggestions...
>
> My application is written in PHP and relies on the apache web server.  We
> use persistent db connections, so it appears (correct me if I'm wrong) that
> every apache child process gets one connection to the database.  If my
> MaxClients is 150, each web server will be able to make up to 150 db
> connections.  I'd like to play with that number a little bit once I get the
> webserver off of the db server.  I feel that I could handle a greater number
> of Clients, so let us say that I have up to 200 connections per server.
>
> I'd like to have room to grow, so let's also say that I go to 20 web servers
> for a total of 4000 connections.  (I'd probably like to go even higher, so
> consider this our starting point)
>
> With a couple dozen active accounts and a lot of test data, my current
> database is equivalent to about 100 active accounts.  Its current disk space
> consumption is:
> data # du --max-depth=2
> 3656    ./base/1
> 3656    ./base/16975
> 4292    ./base/95378
> 177824  ./base/200371
> 189432  ./base
> 144     ./global
> 82024   ./pg_xlog
> 2192    ./pg_clog
> 273836  .
>
> This is not optimized and there is a lot of old data, but to be safe, maybe
> we should assume that each account uses 4 MB of disk space in the db,
> counting indexes, tables and etc.  I'd like to scale to 15,000 - 25,000
> accounts, but I doubt that will be feasible at my budget level.  (Also,
> there is a lot of optimizing to do, so it won't surprise me if this 4MB
> number is more like 2MB or even less)
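
The sizing in the paragraph above works out as follows.  This rough sketch treats the per-account figure as the only input, ignores WAL (pg_xlog) and other overhead, and uses 1 GB = 1000 MB.  The function name is made up for illustration.

```python
def db_size_gb(accounts: int, mb_per_account: float) -> float:
    """Estimated database size in GB (1 GB = 1000 MB), tables and indexes only."""
    return accounts * mb_per_account / 1000

# 15,000 to 25,000 accounts at 4 MB (conservative) and 2 MB (after optimizing) each.
for mb in (4, 2):
    print(mb, db_size_gb(15_000, mb), db_size_gb(25_000, mb))
```

That gives roughly 60 to 100 GB at 4 MB per account, or 30 to 50 GB at 2 MB per account.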
>
> I'm not as concerned with disk subsystem or layout at the moment.  I've seen
> a lot of good documentation (especially from Bruce Momjian, thanks!) on this
> subject.  I'm mostly concerned with choosing the platform that's going to
> allow the scalability I need.
>
> Currently I'm most experienced in Linux, especially RedHat.  I'm "certified"
> on SCO Openserver (5.x) and I've played with Irix, OSF/1 (I don't think it's
> called that anymore), FreeBSD (3.x) and Solaris (2.x).  I'm most
> comfortable with Linux, but I'm willing to use a different platform if it
> will be beneficial.  I've heard that Solaris on the SPARC platform is
> capable of addressing larger amounts of RAM than Linux on Intel does.  I
> don't know if that's true or if that has bearing, but I'd like to hear your
> opinions.
>
> My budget is going to be between (US) $5,000 and $10,000 and I'd like to
> stay under $7,000.  I'm a major bargain hunter, so I shop eBay a lot and
> here are some samplings that I think may be relevant for discussion:
>
> SUN (I'm not an expert in this, advice is requested)
> ----------------------------------------------------
> SUN ENTERPRISE 4500 8x400 MHz 4MB Cache CPUs 8GB RAM no hard drives ~$6,000
> Sun E3500 - 8 x 336MHz 4MB Cache CPUs 4GB RAM 8 x 9.1GB FC disks ~$600.00
> Any other suggestions?
>
> INTEL (I'm much more familiar with this area)
> ----------------------------------------------------
> Compaq DL580 4x700 MHz 2MB Cache CPUs 4GB RAM (16GB Max) HW Raid w/ 64MB
> Cache ~$6000
> IBM Netfinity 7100 4x500 MHz 1MB Cache CPUs up to (16GB Max) HW Raid
> Dell PowerEdge 8450 8x550 2MB Cache CPUs 4GB (32GB Max) HS RAID w/ 16MB Cache
> ~$4,500
> Any other suggestions?
>
> Any other hardware platforms I should consider?
>
> Finally, and I know this sounds silly, but I don't have my own data center,
> so size is something I need to take into consideration.  I pay for data
> center space by the physical size of my servers.  My priorities are
> Performance, Reasonable amount of scalability (as outlined above) and
> finally physical size.
>
> Thanks for taking the time to read this and for any assistance you can give,
>
> Matthew Nuzum
> www.bearfruit.org