Re: Postgres hardware configuration, clustering and - Mailing list pgsql-general
From | scott.marlowe |
---|---|
Subject | Re: Postgres hardware configuration, clustering and |
Date | |
Msg-id | Pine.LNX.4.33.0209120911150.10850-100000@css120.ihs.com Whole thread Raw |
In response to | Postgres hardware configuration, clustering and replication ("Peter Alberer" <h9351252@obelix.wu-wien.ac.at>) |
List | pgsql-general |
On Thu, 12 Sep 2002, Peter Alberer wrote: > Hi there, > > I have to make up a plan for the hardware configuration of a server that > is going to be used only for hosting postgresql. It will be part of a > web application and accessed by several web servers. The server should > therefore be as powerful as possible :-) Unfortunately I could not find > any documentation on the "ideal hardware configuration". That's because the "ideal hardware configuration" changes based on how many users and what type of queries you're going to run. Is all your data well indexed and normalized, and you use almost not CPU intensive functions? Then spend your money on RAM and hard drives. Is your data more fuzzy, and are you going to use a lot of high-CPU load functions (regex searches, likes, things like that?) then spend your money on RAM and CPU horse power. Are you gonna have one very high load user, and many small load users, (i.e. batch processing) then you don't need lots of CPUs or memory. Are you gonna have 100 writers and 100 readers? You'll likely need multiple CPUs for that. > So I am not quite sure about a few things: > > - should I go for one or more CPUs? Usually MP CPUs are somewhat > slower than the currently available CPUs for single-cpu-systems. I have a dual CPU machine for production and a single CPU machine for testing. The dually is an older PII 750 w/ 512 Meg ram, while the single CPU machine is a Celeron 1.1 (soon to be a 1.7) w/ 512 Meg ram. The dual PIII 750 feels faster, even though the numbers for it aren't that much better. > - I thought about using an external raid subsystem utilizing > raid5 (10x73GB 4MB Cache). Would it be faster to go for several raid10 > disk combinations, and moving some data files to these combinations? I have found that except for RAID-0, RAID 1+0, 0+1, 1, and 5 are all about equivalent under Linux running Postgresql. 1+0 and 0+1 are a little quicker than 1, and 1 is a little quicker than 5, but the speed of the drives and the interface they're on means way more than the configuration. I.e. a pair of 15krpm Ultra 160 drives in RAID1 will beat any combination of 4 7200 RPM 2 Gig Ultra 20 Narrow SCSI drives, be it RAID 5, 1+0 or 0+1, and even raid0 will be slower than the dual 15k U160 drives. > - Are there clustering solutions available for postgres? The best > configuration would be somehow extendable if performance increased. But > adding CPUs is not so easy usually. Adding a new server to a cluster > would probably be easier. Not really. > - I had a look at several load-balancing and replication apps. > (DBBalancer, PGreplicator, .) Some efforts seem to be discontinued. Did > anyone use some of these systems in a production environment? rserv works, but is a fairly limited synchronous design. Postgres-R is underway and showing great promise. There are a few other replication packages out there, and Postgresql inc. sells a hopped up version of rserv that's supposed to be quite speedy. We've NEVER had postgresql go down in production with tested code, and we've only brought it to it's knees a half dozen times in 3 years on our build system (things like unconstrained joins and gobble up memory pretty quick, DOSing the dbms until the kernel kills the offending backend.) So, my recommendations are to first give the machine LOTS of memory (2 gig is a good starting point). Use a 64 bit box if you need lots of memory, like a Sun Sparc box. Linux on Sparc is probably one of the better OS/hardware combos I've seen postgresql run on, but any 64 bit platform will offer more memory for postgresql to use than a 32 bit system. Solaris seems to be rather slow for postgresql to live on compared to Linux in all my testing. After memory, look at having two CPUs. You will likely never peg your CPUs if your database is properly indexed and normalized, but having the second CPU greatly increases responsiveness. After that, look at adding as many drives on as many interfaces as you can. I've found that any modern drive, whether it be SCSI or IDE is likely to be much faster than a drive that's a few years old. I.e. putting promise Ultra 133 cards in a machine and adding 8 80 Gig 7200 RPM IDE drives in a RAID 5 with a spare provides good speed, plently of storage, and good reliability. If your storage needs are modest (i.e. a small database that isn't gonna grow into a terabyte monster) then a pair of 15kRPM U160 SCSI drive each on their own channel is the fastest thing I've used for most postgresql servers. If you're gonna have HUNDREDS of readers and only one or two writers, then you may wanna look at running more than 2 drives in a mirror, something that linux kernel level raid drivers allow. This allows the OS to interleave read access across many drives. While writes are somewhat more expensive, the >2 drive mirror allows for many readers to run quite quickly. Hope that helps.
pgsql-general by date: