Re: Reliability recommendations - Mailing list pgsql-performance

From Craig A. James
Subject Re: Reliability recommendations
Date
Msg-id 43F36288.3010302@modgraph-usa.com
In response to Reliability recommendations  ("Jeremy Haile" <jhaile@fastmail.fm>)
Responses Re: Reliability recommendations
List pgsql-performance
Jeremy Haile wrote:
> We are a small company looking to put together the most cost effective
> solution for our production database environment.  Currently in
> production Postgres 8.1 is running on this machine:
>
> Dell 2850
> 2 x 3.0 Ghz Xeon 800Mhz FSB 2MB Cache
> 4 GB DDR2 400 Mhz
> 2 x 73 GB 10K SCSI RAID 1 (for xlog and OS)
> 4 x 146 GB 10K SCSI RAID 10 (for postgres data)
> Perc4ei controller
>
> ... I sent our scenario to our sales team at Dell and they came back with
> all manner of SAN, DAS, and configuration costing as much as $50k.

Given what you've told us, a $50K machine is not appropriate.

Instead, think about a simple system with several clones of the database and a load-balancing web server, even if one
machine could handle your load.  If a machine goes down, the load balancer automatically switches to the other.

Look at the MTBF figures of two hypothetical machines:

 Machine 1: Costs $2,000, MTBF of 2 years, takes two days to fix on average.
 Machine 2: Costs $50,000, MTBF of 100 years (!), takes one hour to fix on average.

Now go out and buy three of the $2,000 machines.  Use a load-balancing front-end web server that can send requests in
round-robin fashion to a "server farm".  Clone your database.  In fact, clone the load balancer too, so that all three
machines have all software and databases installed.  Call these machines A, B, and C.

At any given time, your Machine A is your web front end, serving requests to databases on A, B and C.  If B or C goes
down, no problem - the system keeps running.  If A goes down, you switch the IP address of B or C and make it your web
front end, and you're back in business in a few minutes.
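
As a rough illustration of the round-robin-with-failover idea described above, here is a minimal sketch. The class and method names (RoundRobinPool, mark_down, next_backend) are made up for this example and do not come from any particular load balancer; a real front end would also need health checks to decide when a machine is down.

```python
from itertools import cycle


class RoundRobinPool:
    """Hand out backends in round-robin order, skipping any marked down.

    Hypothetical helper for illustration only.
    """

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        self.down.add(backend)

    def mark_up(self, backend):
        self.down.discard(backend)

    def next_backend(self):
        # Look at each backend at most once per call, so that a fully
        # failed farm raises an error instead of looping forever.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("all backends are down")


pool = RoundRobinPool(["A", "B", "C"])
pool.mark_down("B")  # B has failed; requests keep flowing to A and C
picks = [pool.next_backend() for _ in range(4)]
print(picks)
```

With B marked down, requests simply alternate between A and C until B is repaired and marked up again.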

Now compare the reliability -- in order for this system to be disabled, you'd have to have ALL THREE computers down at
the same time.  With an MTBF of two years and a repair time of two days, each machine has a 99.726% uptime.  The "MTBF",
that is, the expected time until all three machines are down simultaneously, is well over 100,000 years!  Of course,
this is silly, machines don't last that long, but it illustrates the point:  Redundancy beats reliability (which is why
RAID is so useful).
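
The arithmetic behind that comparison can be checked with a short sketch. It assumes that failures are independent and uses the usual steady-state formula, availability = MTBF / (MTBF + repair time), which gives a per-machine uptime of about 99.73%, in line with the figure above. The function names are only for this example.

```python
# Back-of-the-envelope availability check for the two hypothetical machines.
# Assumption: machines fail independently of one another.

HOURS_PER_YEAR = 365 * 24


def availability(mtbf_hours, repair_hours):
    """Steady-state fraction of time a single machine is up."""
    return mtbf_hours / (mtbf_hours + repair_hours)


def all_down_probability(single_availability, machines):
    """Probability that all of the independent machines are down at once."""
    return (1.0 - single_availability) ** machines


cheap = availability(2 * HOURS_PER_YEAR, 48)        # $2,000 machine
expensive = availability(100 * HOURS_PER_YEAR, 1)   # $50,000 machine

cheap_cluster_down = all_down_probability(cheap, 3)
expensive_down = 1.0 - expensive

print(f"one $2,000 machine up:      {cheap:.5f}")
print(f"three $2,000 machines down: {cheap_cluster_down:.3e}")
print(f"one $50,000 machine down:   {expensive_down:.3e}")
```

Under these assumptions the three cheap machines are all down at once for a smaller fraction of the time than the single expensive machine is down, which is the point of the comparison. Correlated failures (shared power, shared network, a bad software update) would make the real number worse.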

All for $6,000.

Craig
