Thread: Hardware advice
Hi, I am in the process of pricing up boxes for our database, and I was wondering if anyone had any recommendations or comments.

The database itself will have around 100-150 users, mostly accessing through a PHP/Apache interface. I don't expect lots of simultaneous activity; however, users will often be doing multi-table joins (up to 10-15 tables in one query). They will also often be pulling out on the order of 250,000 rows (5-10 numeric fields per row), processing the data (I may split this to a second box) and then writing back ~20,000 rows of data (2-3 numeric fields per row). Estimating the total amount of data is quite tricky, but it could grow to 100-250Gb over the next 3 years.

I have priced one box from the Dell web site as follows:

Single Intel Xeon 2.8GHz with 512kb L2 cache
2GB RAM
36Gb 10,000rpm Ultra 3 160 SCSI
36Gb 10,000rpm Ultra 3 160 SCSI
146Gb 10,000rpm U320 SCSI
146Gb 10,000rpm U320 SCSI
146Gb 10,000rpm U320 SCSI
PERC 3/DC RAID Controller (128MB Cache)
RAID1 for 2x 36Gb drives
RAID5 for 3x 146Gb drives
Running RedHat Linux 8.0

This configuration would be pretty much the top of our budget (~£5k). I was planning on having the RAID1 set up for the OS and then the RAID5 for the db files.

Would it be better to have a dual 2.4GHz setup rather than a single 2.8GHz, or would it not make much difference? Does the RAID setup look OK, or would anyone foresee problems in this context? (This machine can take a maximum of 5 internal drives.) Am I overdoing any particular component at the expense of another?

Any other comments would be most welcome.

Thanks for any help

Adam
On Fri, May 30, 2003 at 03:23:28PM +0100, Adam Witney wrote:

> RAID5 for 3x 146Gb drives

I find the RAID5 on the PERC to be painfully slow. It's _really_ bad if you don't put WAL on its own drive.

Also, you don't mention it, but check to make sure you're getting ECC memory on these boxes. Random memory errors which go undetected will make you very unhappy. ECC lowers (but doesn't eliminate, apparently) your chances.

A

--
Andrew Sullivan                        204-4141 Yonge Street
Liberty RMS                            Toronto, Ontario Canada
<andrew@libertyrms.info>               M2P 2A8
                                       +1 416 646 3304 x110
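For anyone wanting to try Andrew's suggestion of putting WAL on its own drive: with PostgreSQL of that era the usual approach is to relocate the pg_xlog directory and symlink it back into the data directory. A minimal sketch, assuming the data directory is /var/lib/pgsql/data and the dedicated drive is mounted at /mnt/wal (both paths are only examples):

    # stop the postmaster before touching pg_xlog
    pg_ctl stop -D /var/lib/pgsql/data
    # move the WAL directory onto its own spindle and symlink it back
    mv /var/lib/pgsql/data/pg_xlog /mnt/wal/pg_xlog
    ln -s /mnt/wal/pg_xlog /var/lib/pgsql/data/pg_xlog
    pg_ctl start -D /var/lib/pgsql/data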
On Fri, 30 May 2003, Adam Witney wrote:

> 250,000 rows (5-10 numeric fields per row), processing the data (I may split
> this to a second box) and then writing back ~20,000 rows of data (2-3
> numeric fields per row).

Make sure to vacuum often, and crank up your fsm values so vacuum can actually reclaim the disk space lost to all those updated rows.

> 36Gb 10,000rpm Ultra 3 160 SCSI
> 36Gb 10,000rpm Ultra 3 160 SCSI
> 146Gb 10,000rpm U320 SCSI
> 146Gb 10,000rpm U320 SCSI
> 146Gb 10,000rpm U320 SCSI
>
> PERC 3/DC RAID Controller (128MB Cache)

If that box has a built-in U320 controller, or you can bypass the PERC, give the Linux kernel-level RAID1 and RAID5 drivers a try. On a dual-CPU box of that speed they may well outrun many hardware controllers. Contrary to popular opinion, software RAID is not slow in Linux.

> RAID1 for 2x 36Gb drives
> RAID5 for 3x 146Gb drives

You might wanna do something like go to all 146 gig drives, put a mirror set on the first 20 or so gigs for the OS, and then use the remainder (5x120 gig or so) to make your RAID5. The more drives in a RAID5 the better, generally, up to about 8 or 12 as the optimum for most setups. But that setup of a RAID1 and a RAID5 set is fine as is. By running software RAID you may be able to afford to upgrade the 36 gig drives...

> Would it be better to have a dual 2.4GHz setup rather than a single 2.8GHz
> or would it not make much difference?

Yes, it would be better. Linux servers running databases are much more responsive with dual CPUs.

> Am I overdoing any particular component at the expense of another?

Maybe the RAID controller cost versus having more big hard drives.
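For reference, the fsm settings Scott mentions live in postgresql.conf, and a cron job is one way to "vacuum often". The numbers below are only a sketch to show which knobs are involved, not tuned values for this workload:

    # postgresql.conf (illustrative values)
    max_fsm_pages = 500000        # free-space map big enough to track dead rows between vacuums
    max_fsm_relations = 1000

    # crontab entry: nightly vacuum + analyze of all databases
    0 3 * * * vacuumdb --all --analyze --quiet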
Hi Scott,

Thanks for the info.

> You might wanna do something like go to all 146 gig drives, put a mirror
> set on the first 20 or so gigs for the OS, and then use the remainder
> (5x120 gig or so) to make your RAID5. The more drives in a RAID5 the
> better, generally, up to about 8 or 12 as the optimal for most setups.

I am not quite sure I understand what you mean here... Do you mean take 20Gb from each of the 5 drives to set up a 20Gb RAID1 device? Or just from the first 2 drives?

Thanks again for your help

adam
On Fri, 30 May 2003, Adam Witney wrote:

> I am not quite sure I understand what you mean here... Do you mean take 20Gb
> from each of the 5 drives to set up a 20Gb RAID1 device? Or just from the
> first 2 drives?

You could do it either way, since the Linux kernel supports more than two drives in a mirror. But this costs on writes, so don't do it for things like /var or the pg_xlog directory.

There are a few ways you could arrange five 146 gig drives.

One might be to make the first 20 gig on each drive part of a mirror set where the first two drives are the live mirror and the next three are hot spares. Then you could set up your RAID5 to have 4 live drives and 1 hot spare.

Hot spares are nice to have because they provide for the shortest period of time during which your machine is running with a degraded RAID array.

Note that in Linux you can set the kernel parameters dev.raid.speed_limit_max and dev.raid.speed_limit_min to control the rebuild bandwidth used, so that when a disk dies you can strike a compromise between fast rebuilds and lowering the demands on the I/O subsystem during the rebuild. The max limit default is 100k/second, which is quite slow. On a machine with Ultra320 gear you could set that to 10 or 20 megs a second and still not saturate your SCSI bus.

Now that I think of it, you could probably set it up so that you have a mirror set for the OS, one for pg_xlog, and then use the rest of the drives as RAID5. Then grab space on the fifth drive to make a hot spare for both the pg_xlog and the OS mirrors:

    Drive 0  [OS RAID1 20 Gig D0][big data drive RAID5 106 Gig D0]
    Drive 1  [OS RAID1 20 Gig D1][big data drive RAID5 106 Gig D1]
    Drive 2  [pg_xlog RAID1 20 Gig D0][big data drive RAID5 106 Gig D2]
    Drive 3  [pg_xlog RAID1 20 Gig D1][big data drive RAID5 106 Gig D3]
    Drive 4  [OS hot spare 20 Gig][pg_xlog hot spare 20 Gig][big data drive RAID5 106 Gig hot spare]

That would give you ~300 gigs of storage. Of course, there will likely be slightly less performance than you might get from dedicated RAID arrays for each RAID1/RAID5 set, but my guess is that by having 4 (or 5 if you don't want a hot spare) drives in the RAID5 it'll still be faster than a dedicated 3-drive RAID array.
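For reference, those rebuild limits are ordinary sysctls (values in KB/s). A small example of raising them toward the 10-20 megs/second Scott suggests; the exact numbers are illustrative:

    # raise the md rebuild bandwidth limits (KB/s)
    sysctl -w dev.raid.speed_limit_min=10000
    sysctl -w dev.raid.speed_limit_max=20000
    # equivalently, via /proc
    echo 20000 > /proc/sys/dev/raid/speed_limit_max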
Based on what you've said, I would guess you are considering the Dell PowerEdge 2650 since it has 5 drive bays. If you could afford the rack space and just a bit more money, I'd get the tower configuration 2600 with 6 drive bays (and rack rails if needed - Dell even gives you a special rackmount faceplate if you order a tower with rack rails). This would allow you to have this configuration, which I think would be about ideal for the price range you are looking at:

* Linux kernel RAID
* Dual processors - better than a single faster processor, especially with concurrent user load and software RAID on top of that
* 2x36GB in RAID-1 (for OS and WAL)
* 4x146GB in RAID-10 (for data) (alternative: 4-disk RAID-5)

The RAID-10 array gives you the same amount of space you would have with a 3-disk RAID-5 and improved fault tolerance. Although I'm pretty sure your drives won't be hot-swappable with the software RAID - I've never actually had to do it.

I can't say I like Scott's idea much, because the WAL and OS are competing for disk time with the data since they are on the same physical disks. In a database that is mainly reads with few writes, this wouldn't be such a problem though.

Just my inexpert opinion,

Roman
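A rough sketch of how Roman's layout could be built with Linux software RAID, assuming mdadm is available on the box (device names are made up, and since kernels of that era have no native RAID-10 personality, the RAID-10 is done as a RAID-0 stripe across two RAID-1 pairs):

    # 2x36GB mirror for OS and WAL
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    # two 146GB mirror pairs
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
    mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sde1 /dev/sdf1
    # stripe across the two mirrors = the RAID-10 data volume
    mdadm --create /dev/md3 --level=0 --raid-devices=2 /dev/md1 /dev/md2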
On Fri, 2003-05-30 at 07:44, Andrew Sullivan wrote:

> On Fri, May 30, 2003 at 03:23:28PM +0100, Adam Witney wrote:
> > RAID5 for 3x 146Gb drives
>
> I find the RAID5 on the PERC to be painfully slow. It's _really_ bad
> if you don't put WAL on its own drive.

This seems to be an issue with the Dell firmware. The megaraid devel list has been tracking this issue on and off for some time now. People have had good luck with a couple of different fixes. The PERC cards -can- be made not to suck, and the LSI cards simply don't have the problem. (Since they are effectively the same card, the opinion is that it's the firmware.)

> Also, you don't mention it, but check to make sure you're getting ECC
> memory on these boxes. Random memory errors which go undetected will
> make you very unhappy. ECC lowers (but doesn't eliminate,
> apparently) your chances.

100% agree with this note.
On Fri, 30 May 2003, Roman Fail wrote:

> Based on what you've said, I would guess you are considering the Dell
> PowerEdge 2650 since it has 5 drive bays. If you could afford the rack
> space and just a bit more money, I'd get the tower configuration 2600
> with 6 drive bays. This would allow you to have this configuration, which
> I think would be about ideal for the price range you are looking at:
>
> * Linux kernel RAID

Actually, I think he was looking at hardware RAID, but I was recommending software RAID as at least an option. I've found that on modern hardware with late-model kernels, Linux is pretty fast with straight RAID, but not as good with layering it, FYI. I haven't tested since 2.4.9 though, so things may well have changed, hopefully for the better, in relation to running fast in layered RAID. They both would likely work well, but going with a sub-par HW RAID card will make the system slower than the kernel SW RAID.

> * Dual processors - better than a single faster processor, especially
>   with concurrent user load and software RAID on top of that
> * 2x36GB in RAID-1 (for OS and WAL)
> * 4x146GB in RAID-10 (for data) (alternative: 4-disk RAID-5)
>
> The RAID-10 array gives you the same amount of space you would have
> with a 3-disk RAID-5 and improved fault tolerance. Although I'm pretty
> sure your drives won't be hot-swappable with the software RAID - I've
> never actually had to do it.

I agree that 6 drives makes this a much better option.

Actually, hot swapping can only be accomplished in Linux kernel SW RAID by using multiple controllers. It's not really "hot swap" because you have to basically reset that card and its information about which drives are on it. Using two controllers, where one runs one RAID0 set and the other runs another RAID0 set, with a RAID1 on top, you can then use hot-swap shoes and replace failed drives.

The improved fault tolerance of the RAID 1+0 is minimal over the RAID5 if the RAID5 has a hot spare, but it is there.

I've removed and added drives to running arrays, and the raidhotadd program used to do it is quite easy to drive. It all seemed to work quite well. The biggest problem you'll notice when a drive fails is that the kernel / SCSI driver will keep resetting the bus and timing out the device, so with a failed device Linux kernel RAID can be a bit doggish until you restart the SCSI driver so it KNOWS the drive's not there and quits asking for it over and over.

> I can't say I like Scott's idea much because the WAL and OS are
> competing for disk time with the data since they are on the same
> physical disk. In a database that is mainly reads with few writes,
> this wouldn't be such a problem though.

You'd be surprised how often this is a non-issue. If you're writing 20,000 records every 10 minutes or so, the location of the WAL file is not that important. The machine will lug for a few seconds, insert, and be done. The speed increase averaged out over time is almost nothing. Now, transactional systems are a whole 'nother enchilada. I got the feeling from the original post this was more a batch processing kind of thing.

I knew the solution I was giving was suboptimal on performance (I might have even alluded to that...). I was going more for maximizing use of rack space and getting the most storage.
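For what it's worth, replacing a failed member with the raidtools commands Scott mentions looks roughly like this; the md device and partition names are made up for illustration:

    # drop the dead disk out of the array, swap the hardware, then re-add it
    raidhotremove /dev/md1 /dev/sdc1
    raidhotadd /dev/md1 /dev/sdc1
    # watch the rebuild progress
    cat /proc/mdstat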
I think the user said that this project might well grow to 250 or 300 gig, so size is probably as important as speed for this system, or more so.

RAID5 is pretty much the compromise RAID set. It's not necessarily the fastest, and it certainly isn't the sexiest, but it provides a lot of storage for very little redundancy cost, and with a hot spare it's pretty much 24/7 with a couple of days off a year for scheduled maintenance. Combine that with having n-1 platters for each read to be spread across, and it's a nice choice for data warehousing or report serving.

Whatever he does, he should make sure he turns off atime on the data partition. That can utterly kill a PostgreSQL / Linux box by a factor of right at two for someone doing small reads.
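Turning off atime is a one-word change in /etc/fstab for the data filesystem; the device and mount point below are examples only:

    # /etc/fstab: add noatime to the data filesystem's mount options
    /dev/md3   /var/lib/pgsql   ext3   defaults,noatime   1 2

    # or for an already-mounted filesystem, without rebooting:
    mount -o remount,noatime /var/lib/pgsql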
On 30 May 2003, Will LaShell wrote:

> This seems to be an issue with the Dell firmware. The megaraid devel
> list has been tracking this issue on and off for some time now. People
> have had good luck with a couple of different fixes. The PERC cards
> -can- be made not to suck, and the LSI cards simply don't have the
> problem. (Since they are effectively the same card, the opinion is that
> it's the firmware.)

I've used the LSI/MegaRAID cards in the past. They're not super fast, but they're not slow either. Very solid operation. Sometimes the firmware makes you feel like you're wearing handcuffs compared to the relative freedom of the kernel SW drivers (i.e. you can force the kernel to take back a failed drive, whereas the MegaRAID just won't take it back until it's been formatted, that kind of thing).

The LSI plain SCSI cards in general are great cards; I got a UW SCSI card by them with gigabit ethernet thrown in off eBay a couple of years back and it's VERY fast and stable.

Also, if you're getting cache memory on the MegaRAID/PERC card, make sure you get the battery backup module.
On 30/5/03 6:17 pm, "scott.marlowe" <scott.marlowe@ihs.com> wrote:

> You could do it either way, since the linux kernel supports more than 2
> drives in a mirror. But, this costs on writes, so don't do it for things
> like /var or the pg_xlog directory.
>
> [rest of the drive layout suggestion snipped]

Hi Scott,

Just following up a post from a few months back... I have now purchased the hardware. Do you have a recommended/preferred Linux distro that is easy to configure for software RAID?

Thanks again

Adam