Thread: Ideal Hardware?
We have an opportunity to purchase a new, top-notch database server, and I am wondering what kind of hardware is recommended. We're on Linux platforms and kernels, though. I remember a comment from Tom about how he was spending a lot of time debugging problems that turned out to be hardware-related; I would of course like to avoid that.

In terms of numbers, we expect to have an average of 100 active connections (most of which are idle 9/10ths of the time), with about 85% read traffic. I expect the database to move an average of 10-20 kBps under moderate load. I hope to have one server host about 1000-2000 active databases, with the largest being about 60 MB (no blobs). Inactive databases will be kept only for reading (archival) purposes and will seldom be accessed.

Does any of this represent a problem for Postgres? The datasets are typically not that large; only a few queries on a few databases ever return over 1000 rows. I'm mostly worried about being able to handle the times when there are spikes in the traffic.

The configuration that is going on in my head is:

RAID 1, 200 GB
1 server, 4 GB RAM
Linux 2.6

I was also wondering about storage units (IBM FAStT200) with gigabit Ethernet to rack-mount computer(s)... But would I need more than 1 CPU? If I did, how would I handle the file system? We only do a few joins, so I think most of it would be I/O latency.

Thanks!

Jason Hihn
Paytime Payroll
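(A rough back-of-envelope check, taking the numbers above at face value: 2,000 databases at 60 MB each comes to about 120 GB even if every database were at its maximum, so the proposed 200 GB mirror leaves headroom before indexes, WAL, and growth are counted. Likewise, 100 connections that are each idle 90% of the time average out to roughly 10 genuinely concurrent queries, which is what the disks and CPUs actually have to absorb outside of traffic spikes.)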
Jason,

Your question is really suited to the PERFORMANCE list, not NOVICE, so I have cross-posted it there. I recommend that you subscribe to PERFORMANCE and drop NOVICE from your replies. There are lots of hardware geeks on PERFORMANCE, but few on NOVICE.

> We have an opportunity to purchase a new, top-notch database server, and I am
> wondering what kind of hardware is recommended. We're on Linux platforms
> and kernels, though. I remember a comment from Tom about how he was spending
> a lot of time debugging problems that turned out to be hardware-related;
> I would of course like to avoid that.
>
> In terms of numbers, we expect to have an average of 100 active connections
> (most of which are idle 9/10ths of the time), with about 85% read traffic.
> I expect the database to move an average of 10-20 kBps under moderate load.
> I hope to have one server host about 1000-2000 active databases, with the
> largest being about 60 MB (no blobs). Inactive databases will be kept only
> for reading (archival) purposes and will seldom be accessed.

Is that 100 concurrent connections *total*, or per database? If the connections are idle 90% of the time, do they stay open, or do they get re-established with each query? Have you considered connection pooling for the read-only queries?

> Does any of this represent a problem for Postgres? The datasets are
> typically not that large; only a few queries on a few databases ever return
> over 1000 rows. I'm mostly worried about being able to handle the times
> when there are spikes in the traffic.

It's all possible; it just requires careful application design and lots of hardware. You should also cost things out: sometimes it's cheaper to have several good servers instead of one uber-server. The latter also helps with hardware replacement.

> The configuration that is going on in my head is:
> RAID 1, 200 GB

RAID 1+0 can be good for Postgres. However, if you have the budget, RAID 5 with 6 or more disks can be better some of the time, particularly when read queries are the vast majority of the load. There are, as yet, no definitive statistics, but OSDL is working on it!

More important than the RAID configuration is the RAID card; once again, given the money, multi-channel RAID cards with a battery-backed write cache are your best bet. Some cards even allow you to span RAID 1 between cards of the same model. See the discussion about the LSI MegaRAID in the PERFORMANCE list archives over the last two weeks.

> 1 server, 4 GB RAM
> Linux 2.6

You're very brave. Me, I'm not adopting 2.6 in production until 2.6.3 is out, at least.

> I was also wondering about storage units (IBM FAStT200) with gigabit
> Ethernet to rack-mount computer(s)... But would I need more than 1 CPU? If
> I did, how would I handle the file system? We only do a few joins, so I
> think most of it would be I/O latency.

PostgreSQL will make use of multiple processors. If you are worried about peak-time loads, having 2-4 processors to distribute queries across would be very useful.

Also, I'm concerned about the "we only do a few joins." What that says to me is "we don't really know how to write complex queries, so we pull a lot of redundant data." Simple queries can be far less efficient than complex ones if they result in you pulling entire tables across to the client.

--
Josh Berkus
Aglio Database Solutions
San Francisco
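To make the pooling suggestion concrete, here is a minimal sketch of what pooling the read-only connections could look like on the client side. It uses psycopg2's pool module purely as an illustration; the DSN, pool sizes, table, and query are made-up placeholders, not anything specified in this thread.

    # Illustrative only: a small client-side pool for read-mostly traffic,
    # so 100 mostly-idle application threads share a handful of backends
    # instead of each holding its own PostgreSQL connection open.
    from psycopg2 import pool

    ro_pool = pool.ThreadedConnectionPool(
        2,                      # minconn: keep a couple of backends warm
        10,                     # maxconn: cap concurrent backends well below 100
        dsn="dbname=payroll_example user=app host=dbserver",  # hypothetical DSN
    )

    def fetch_report(employee_id):
        conn = ro_pool.getconn()        # borrow a connection instead of opening one
        try:
            with conn.cursor() as cur:
                cur.execute(
                    "SELECT pay_date, gross FROM paychecks WHERE employee_id = %s",
                    (employee_id,),
                )
                return cur.fetchall()
        finally:
            ro_pool.putconn(conn)       # hand it back for the next caller

The same effect can also be had outside the application with a middleware pooler; either way, the point is that the 90%-idle connections stop tying up one backend process apiece.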
On Wed, 1 Oct 2003, Jason Hihn wrote:

> We have an opportunity to purchase a new, top-notch database server, and I am
> wondering what kind of hardware is recommended. We're on Linux platforms and
> kernels, though.
[...]
> The configuration that is going on in my head is:
> RAID 1, 200 GB
> 1 server, 4 GB RAM
> Linux 2.6

I vaguely remember someone (Tom?) mentioning that one of the log files might want to go on its own partition -- sometime in the last two weeks.

I'm not pushing database advice here, but my system is about your size. About 120 GB of my disk is RAID on a Promise card, using the kernel's software RAID (apparently software RAID on Linux is faster than the Promise card doing it in hardware). I have a bunch of different things using software RAID:

/tmp is a RAID 0 with ext2
/home is a RAID 5 with ext3
/usr, /var, /usr/local are RAID 10 with ext3
/var/lib/postgres is on a real 10k RPM SCSI disk, on ext3 with noatime

So my postgres isn't on the RAIDs. I just finished rebuilding my RAIDs for the second time (failed disk). I ended up rebuilding things in single-user mode, so I couldn't run the rebuilds in parallel; I don't know whether you can do this in multi-user mode and/or in parallel -- I'm being paranoid.

Rebuilding RAID 5 is fast; rebuilding RAID 1 is a pain in the butt! My biggest RAID 10 is about 10 GB: bundling the new partition from the new disk into the RAID 0 is fast, but rebuilding the mirror (the RAID 1 part) takes 10 hours -- and that's with dual Athlon 1.6s and 1 GB of RAM, so I have lots of horsepower. Maybe you are going with better RAID than I have, but it seems to me that RAID 5 (with spares) is going to be better if you ever have to rebuild.

Gord
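For what it's worth, the noatime part of that layout is just a mount option; a hypothetical /etc/fstab entry for a dedicated postgres partition might look like the line below (the device name is invented for illustration). It keeps the kernel from rewriting access-time metadata on every read of a data file.

    /dev/sdb1   /var/lib/postgres   ext3   defaults,noatime   0   2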
Gord,

> I vaguely remember someone (Tom?) mentioning that one of the log
> files might want to go on its own partition.

That's general knowledge, but it's not really applicable to a fast RAID system. It's more important on regular-disk systems; with a 4+ disk RAID, nobody has been able to demonstrate a gain from separating the disks.

> fast, rebuilding RAID 1 is a pain in the butt! My biggest RAID 10
> is about 10 GB: bundling the new partition from the new disk into
> the RAID 0 is fast, but rebuilding the mirror (the RAID 1 part) takes
> 10 hours -- and that's with dual Athlon 1.6s and 1 GB of RAM, so I
> have lots of horsepower. Maybe you are going with better RAID than
> I have, but it seems to me that RAID 5 (with spares) is going to be
> better if you ever have to rebuild.

It also depends on the number of disks, the controller, and the balance of read vs. write activity. I've found RAID 5 with no cache to be dog-slow for OLTP (heavy write transaction) databases, and I use RAID 1 for those.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
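(The usual arithmetic behind that observation -- standard RAID behavior, not something measured in this thread: a small random write on RAID 5 without a write cache costs roughly four disk operations, since the controller must read the old data block, read the old parity block, write the new data, and write the new parity, while the same write on RAID 1 costs two, one write to each mirror. For a write-heavy OLTP load that is about a 2x per-write I/O penalty before any cache can hide it, which is why the battery-backed cache matters so much for RAID 5.)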
On Wed, 2003-10-01 at 10:13, Jason Hihn wrote:

> We have an opportunity to purchase a new, top-notch database server, and I am
> wondering what kind of hardware is recommended. We're on Linux platforms and
> kernels, though. I remember a comment from Tom about how he was spending a
> lot of time debugging problems that turned out to be hardware-related; I
> would of course like to avoid that.
>
> In terms of numbers, we expect to have an average of 100 active connections
> (most of which are idle 9/10ths of the time), with about 85% read traffic.
> I expect the database to move an average of 10-20 kBps under moderate load.
> I hope to have one server host about 1000-2000 active databases, with the
> largest being about 60 MB (no blobs). Inactive databases will be kept only
> for reading (archival) purposes and will seldom be accessed.

Whoever mentioned using multiple servers instead of one uber-server is very right. You're putting all your eggs in one basket that way, and unless that "basket" has hot-swap CPUs, memory boards, etc., then if you have a hardware problem, your whole business goes down. Buy 3 or 4 smaller systems, and distribute any possible pain from downtime.

It seems like I'm going to contravene what I just said about eggs in a basket when I suggest that the disks could possibly be concentrated into a NAS, so that you could get one big, honkin' fast *hot-swappable* disk subsystem (dual-redundant U320 storage controllers with 512 MB of battery-backed cache each, for a total of 1 GB of cache, are easily available) for however many smaller CPU boxes you get. (The disks could be kept un-shared by making separate partitions, with each machine mounting only one partition.)

--
-----------------------------------------------------------------
Ron Johnson, Jr.     ron.l.johnson@cox.net
Jefferson, LA  USA

"Adventure is a sign of incompetence"
Stephanson, great polar explorer