Thread: Opinions on RAID
We have been running Postgres on a 2U server with 2 disks configured in RAID 1 for the OS and logs, and 4 disks configured in RAID 10 for the data. I have since been told RAID 5 would have been a better option given our usage of Dell equipment and the way they handle RAID 10.

I have just a few general questions about RAID with respect to Postgres:

[1] What is the performance penalty of software RAID over hardware RAID? Is it truly significant? We will be working with 100s of GB to 1-2 TB of data eventually.

[2] How do people on this list monitor their hardware RAID? Thus far we have used Dell and the only way to easily monitor disk status is to use their OpenManage application. Do other controllers offer easier means of monitoring individual disks in a RAID configuration? It seems one advantage software RAID has is the ease of monitoring.

I truly appreciate any assistance or input.

As an additional question, does anyone have any strong recommendations for vendors that offer both consulting/training and support? We are currently speaking with Command Prompt, EnterpriseDB, and Greenplum but I am certainly open to hearing any other recommendations.

Thanks,
Joe
Joe Uhl wrote:
> We have been running Postgres on a 2U server with 2 disks configured in
> raid 1 for the os and logs and 4 disks configured in raid 10 for the
> data. I have since been told raid 5 would have been a better option
> given our usage of Dell equipment and the way they handle raid 10. I
> have just a few general questions about raid with respect to Postgres:
>
> [1] What is the performance penalty of software raid over hardware raid?
> Is it truly significant? We will be working with 100s of GB to 1-2 TB
> of data eventually.

This depends a lot on the RAID controller (whether or not it has a battery-backed write cache, for example) - for some use cases software RAID is actually faster, especially for sequential-I/O tests.

> [2] How do people on this list monitor their hardware raid? Thus far we
> have used Dell and the only way to easily monitor disk status is to use
> their openmanage application. Do other controllers offer easier means
> of monitoring individual disks in a raid configuration? It seems one
> advantage software raid has is the ease of monitoring.

Well, the answer to that question depends on what you are using for your network monitoring as a whole, as well as your platform of choice. If you use, say, Nagios and Linux, it makes sense to use a Nagios plugin (we do that here with a unified check script that checks everything from LSI-MPT based RAID cards, to IBM's ServeRAID, HP's SmartArray and LSI MegaRAID cards, and also Linux/Solaris software RAID). If you are using another monitoring solution (OpenView, IBM Director, ...) your solution might look different.

Stefan
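For the Linux software-RAID part of such a check, one nice property is that no vendor tool is needed at all: /proc/mdstat already shows a failed member as an underscore inside the status brackets ([U_] instead of [UU]). A minimal sketch of what a Nagios-style check could look like - the check_mdstat function name and message texts are made up for illustration, not taken from any real plugin:

```shell
#!/bin/sh
# Minimal sketch of a Nagios-style check for Linux md software RAID.
# Reads /proc/mdstat-formatted text on stdin; a failed member appears as
# an underscore inside the status brackets, e.g. [U_] instead of [UU].
# Function name and messages are illustrative, not from a real plugin.
check_mdstat() {
    if grep -q '\[[U_]*_[U_]*\]'; then
        echo "CRITICAL: md array degraded"
        return 2        # Nagios convention: 2 = CRITICAL
    else
        echo "OK: all md arrays clean"
        return 0        # Nagios convention: 0 = OK
    fi
}

# Typical invocation on a live system:
# check_mdstat < /proc/mdstat
```

On the hardware-RAID side the idea is the same, just with the vendor CLI's output piped in instead of /proc/mdstat.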
At 08:12 AM 2/27/2007, Joe Uhl wrote:
>We have been running Postgres on a 2U server with 2 disks configured in
>raid 1 for the os and logs and 4 disks configured in raid 10 for the
>data. I have since been told raid 5 would have been a better option
>given our usage of Dell equipment and the way they handle raid 10. I
>have just a few general questions about raid with respect to Postgres:
>
>[1] What is the performance penalty of software raid over hardware raid?
> Is it truly significant? We will be working with 100s of GB to 1-2 TB
>of data eventually.

The real CPU overhead when using SW RAID comes with any form of SW RAID that does XOR operations as part of writes (RAID 5, 6, 50, ..., etc). At that point, you are essentially hammering on the CPU just as hard as you would on a dedicated RAID controller... ...and the dedicated RAID controller probably has custom HW helping it do this sort of thing more efficiently.

That being said, SW RAID 5 in this sort of scenario can be reasonable if you =dedicate= a CPU core to it. So in such a system, your "n" core box is essentially an "n-1" core box because you have to lock a core to doing nothing but RAID management. Religious wars aside, this actually can work well. You just have to understand and accept what needs to be done.

SW RAID 1, or 10, etc should not impose a great deal of CPU overhead, and often can be =faster= than a dedicated RAID controller.

SW RAID 5 etc in usage scenarios involving far more reads than writes and light write loads can work quite well even if you don't dedicate a core to RAID management, but you must be careful about workloads that are, or that contain parts that are, examples of the first scenario I gave. If you have any doubts about whether you are doing too many writes, dedicate a core to RAID stuff as in the first scenario.

>[2] How do people on this list monitor their hardware raid?
>Thus far we
>have used Dell and the only way to easily monitor disk status is to use
>their openmanage application. Do other controllers offer easier means
>of monitoring individual disks in a raid configuration? It seems one
>advantage software raid has is the ease of monitoring.

Many RAID controller manufacturers and storage product companies offer reasonable monitoring / management tools. 3ware AKA AMCC has a good reputation in this area for their cards. So does Areca. I personally do not like Adaptec's SW for this purpose, but YMMV. LSI Logic has had both good and bad SW in this area over the years. Dell, HP, IBM, etc.'s offerings in this area tend to be product-line specific.

I'd insist on some sort of "try before you buy" if the ease of use / quality of the SW matters to your overall purchase decision.

Then there are the various CSSW and OSSW packages that contain this functionality or are dedicated to it. Go find some reputable reviews. (HEY LURKERS FROM Tweakers.net: ^^^ THAT'S AN ARTICLE IDEA ;-) )

Cheers,
Ron
Hope you don't mind, Ron. This might be splitting hairs.

On Tue, Feb 27, 2007 at 11:05:39AM -0500, Ron wrote:
> The real CPU overhead when using SW RAID is when using any form of SW
> RAID that does XOR operations as part of writes (RAID 5, 6, 50, ...,
> etc). At that point, you are essentially hammering on the CPU just
> as hard as you would on a dedicated RAID controller... ...and the
> dedicated RAID controller probably has custom HW helping it do this
> sort of thing more efficiently.
>
> That being said, SW RAID 5 in this sort of scenario can be reasonable
> if you =dedicate= a CPU core to it. So in such a system, your "n"
> core box is essentially a "n-1" core box because you have to lock a
> core to doing nothing but RAID management.

I have an issue with the above explanation. XOR is cheap. It's one of the cheapest CPU instructions available. Even with high bandwidth, the CPU should always be able to XOR very fast. This leads me to the belief that the RAID 5 problem has to do with getting the data ready to XOR. With RAID 5, the L1/L2 cache is never large enough to hold multiple stripes of data under regular load, and the system may not have the blocks in RAM. Reading from RAM to find the missing blocks shows up as CPU load. Reading from disk to find the missing blocks shows up as system load.

Dedicating a core to RAID 5 focuses on the CPU - which I believe to be mostly idle, waiting for memory reads. Dedicating a core reduces the impact, but can't eliminate it, and the cost of a whole core sitting mostly idle waiting for memory reads is high. Also, any reads scheduled by this core will affect the bandwidth/latency for the other cores.

Hardware RAID 5 solves this by using its own memory modules - like a video card using its own memory modules. The hardware RAID can read from its own memory or disk all day and not affect system performance. Hopefully it has plenty of memory dedicated to holding the most frequently required blocks.
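To make concrete just how cheap the XOR itself is: RAID 5 parity is simply the bytewise XOR of the data blocks in a stripe, and any single missing block can be rebuilt by XORing the parity with the surviving blocks. A toy sketch with three one-byte "blocks" (the values are arbitrary; a real array does this bytewise across whole stripes):

```shell
#!/bin/sh
# Toy illustration of RAID 5 parity math on three one-byte "data blocks".
# Values are arbitrary; real arrays do this bytewise over entire stripes.
d1=$(( 0xA5 )); d2=$(( 0x0F )); d3=$(( 0x3C ))

# Parity is the XOR of all data blocks in the stripe.
p=$(( d1 ^ d2 ^ d3 ))

# If the disk holding d1 dies, XORing parity with the survivors rebuilds it.
rebuilt=$(( p ^ d2 ^ d3 ))

printf 'parity=0x%02X rebuilt=0x%02X original=0x%02X\n' "$p" "$rebuilt" "$d1"
# prints: parity=0x96 rebuilt=0xA5 original=0xA5
```

The arithmetic is trivial; the expensive part, as described above, is moving the rest of the stripe through the memory hierarchy so there is something to XOR against.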
> SW RAID 5 etc in usage scenarios involving far more reads than writes
> and light write loads can work quite well even if you don't dedicate
> a core to RAID management, but you must be careful about workloads
> that are, or that contain parts that are, examples of the first
> scenario I gave. If you have any doubts about whether you are doing
> too many writes, dedicate a core to RAID stuff as in the first scenario.

I found software RAID 5 to suck badly enough that I only use it for backups now. It seemed that Linux didn't care to read ahead or hold blocks in memory for long, and preferred to read and then write. It was awful.

RAID 5 doesn't seem like a good option even with hardware RAID. The dedicated hardware masks the issues behind a black box, but the issues still exist.

Most of my system is RAID 1+0 now. I have it broken up: rarely read or written files (long-term storage) on RAID 5, the main system data on RAID 1+0, the main system on RAID 1, and a larger build partition on RAID 0. For a crappy server in my basement, I'm very happy with my software RAID performance now. :-)

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com | Neighbourhood Coder | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them...

http://mark.mielke.cc/
On Tue, 2007-02-27 at 07:12, Joe Uhl wrote:
> We have been running Postgres on a 2U server with 2 disks configured in
> raid 1 for the os and logs and 4 disks configured in raid 10 for the
> data. I have since been told raid 5 would have been a better option
> given our usage of Dell equipment and the way they handle raid 10.

Some controllers do not layer RAID effectively. Generally speaking, the cheaper the controller, the worse it's gonna perform. Also, some controllers are optimized more for RAID 5 than RAID 1 or 0. Which controller does your Dell have, btw?

> I
> have just a few general questions about raid with respect to Postgres:
>
> [1] What is the performance penalty of software raid over hardware raid?
> Is it truly significant? We will be working with 100s of GB to 1-2 TB
> of data eventually.

For a mostly read system, the performance is generally pretty good. Older linux kernels ran layered RAID pretty slowly, i.e. RAID 1+0 was no faster than RAID 1. The best-performing software RAID I found in older linux kernels (2.2, 2.4) was plain old RAID-1. RAID-5 was good at reading, but slow at writing.
On Tue, 2007-02-27 at 12:28, Joe Uhl wrote:
> Really appreciate all of the valuable input. The current server has the
> Perc4ei controller.
>
> The impression I am taking from the responses is that we may be okay with
> software raid, especially if raid 1 and 10 are what we intend to use.
>
> I think we can collect enough information from the archives of this list to
> help make decisions for the new machine(s), was just very interested in
> hearing feedback on software vs. hardware raid.
>
> We will likely be using the 2.6.18 kernel.

Well, whatever you do, benchmark it with what you think will be your typical load. You can do some simple initial tests to see if you're in the ballpark with bonnie++, dd, etc. Then move on to real database tests after that.
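A sketch of what the simple dd-style initial test could look like. The sizes below are deliberately tiny for illustration; for a meaningful number the file must be much larger than RAM so the page cache can't hide the disks, and GNU dd's conv=fdatasync makes it flush to disk before reporting a rate. The file name and size choices are just placeholders:

```shell
#!/bin/sh
# Rough sequential-write smoke test with dd. For real numbers, point
# TESTFILE at the array under test and raise COUNT until the file
# dwarfs RAM (page cache would otherwise mask the disks entirely).
TESTFILE=${TESTFILE:-/tmp/dd-seq-test.bin}
BS=8k           # 8 kB blocks, matching Postgres's block size
COUNT=128       # tiny on purpose; e.g. COUNT=2000000 gives a 16 GB file

# conv=fdatasync (GNU dd) forces a flush before the rate is printed,
# so the reported MB/s reflects the disks rather than the cache.
dd if=/dev/zero of="$TESTFILE" bs=$BS count=$COUNT conv=fdatasync 2>&1 | tail -n 1
rm -f "$TESTFILE"
```

bonnie++ then adds seek and rewrite patterns on top of this, which are closer to what a database actually does.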
Joe Uhl wrote:
> [1] What is the performance penalty of software raid over hardware raid?
> Is it truly significant? We will be working with 100s of GB to 1-2 TB
> of data eventually.

One thing you should appreciate about hw vs sw raid is that with the former you can battery-back it and enable controller write caching in order to make disk write latency largely disappear. How much of a performance difference that makes depends on what you're doing with it, of course. See the current thread "Two hard drives --- what to do with them?" for some discussion of the virtues of battery-backed raid.

> [2] How do people on this list monitor their hardware raid? Thus far we
> have used Dell and the only way to easily monitor disk status is to use
> their openmanage application. Do other controllers offer easier means
> of monitoring individual disks in a raid configuration? It seems one
> advantage software raid has is the ease of monitoring.

Personally I use nagios with nrpe for most of the monitoring, and write a little wrapper around the cli monitoring tool from the controller manufacturer to grok whether it's in a good/degraded/bad state.

Dell PERC controllers I think are mostly just derivatives of Adaptec/LSI controllers, so you might be able to get a more convenient monitoring tool from one of them that might work. See if you can find your PERC version in http://pciids.sourceforge.net/pci.ids, or if you're using Linux then check which hw raid module is loaded for it, to get an idea of which place to start looking for that.

- Geoff
On 28-2-2007 0:42 Geoff Tolley wrote:
>> [2] How do people on this list monitor their hardware raid? Thus far we
>> have used Dell and the only way to easily monitor disk status is to use
>> their openmanage application. Do other controllers offer easier means
>> of monitoring individual disks in a raid configuration? It seems one
>> advantage software raid has is the ease of monitoring.

Recent Dell raid-controllers are based on LSI chips, although they are not exactly the same as similar LSI-controllers (anymore). Our Dell Perc5/e and 5/i work with the MegaCLI tool from LSI. But that tool has really limited documentation from LSI itself. Luckily Fujitsu-Siemens offers a nice PDF: http://manuals.fujitsu-siemens.com/serverbooks/content/manuals/english/mr-sas-sw-ug-en.pdf

Besides that, there are several Dell Linux resources popping up, including on their own site: http://linux.dell.com/

> Personally I use nagios with nrpe for most of the monitoring, and write
> a little wrapper around the cli monitoring tool from the controller
> manufacturer to grok whether it's in a good/degraded/bad state.

If you have a MegaCLI version, I'd like to see it, if possible? That would definitely save us some reinventing of the wheel :-)

> Dell PERC controllers I think are mostly just derivatives of Adaptec/LSI
> controllers, so you might be able to get a more convenient monitoring
> tool from one of them that might work. See if you can find your PERC
> version in http://pciids.sourceforge.net/pci.ids, or if you're using
> Linux then which hw raid module is loaded for it, to get an idea of
> which place to start looking for that.

The current ones are afaik all LSI-based; at least the recent SAS controllers (5/i and 5/e) certainly are.

Best regards,
Arjen
Really appreciate all of the valuable input. The current server has the Perc4ei controller.

The impression I am taking from the responses is that we may be okay with software raid, especially if raid 1 and 10 are what we intend to use.

I think we can collect enough information from the archives of this list to help make decisions for the new machine(s), was just very interested in hearing feedback on software vs. hardware raid.

We will likely be using the 2.6.18 kernel.

Thanks for everyone's input,
Joe

-----Original Message-----
From: Scott Marlowe [mailto:smarlowe@g2switchworks.com]
Sent: Tuesday, February 27, 2007 12:56 PM
To: Joe Uhl
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Opinions on Raid

On Tue, 2007-02-27 at 07:12, Joe Uhl wrote:
> We have been running Postgres on a 2U server with 2 disks configured in
> raid 1 for the os and logs and 4 disks configured in raid 10 for the
> data. I have since been told raid 5 would have been a better option
> given our usage of Dell equipment and the way they handle raid 10.

Some controllers do not layer RAID effectively. Generally speaking, the cheaper the controller, the worse it's gonna perform. Also, some controllers are optimized more for RAID 5 than RAID 1 or 0. Which controller does your Dell have, btw?

> I
> have just a few general questions about raid with respect to Postgres:
>
> [1] What is the performance penalty of software raid over hardware raid?
> Is it truly significant? We will be working with 100s of GB to 1-2 TB
> of data eventually.

For a mostly read system, the performance is generally pretty good. Older linux kernels ran layered RAID pretty slowly, i.e. RAID 1+0 was no faster than RAID 1. The best-performing software RAID I found in older linux kernels (2.2, 2.4) was plain old RAID-1. RAID-5 was good at reading, but slow at writing.
On Sat, Mar 03, 2007 at 12:30:16PM +0100, Arjen van der Meijden wrote:
> If you have a MegaCLI-version, I'd like to see it, if possible? That
> would definitely save us some reinventing the wheel :-)

A friend of mine just wrote

  MegaCli -AdpAllInfo -a0|egrep ' (Degraded|Offline|Critical Disks|Failed Disks)' | grep -v ': 0 $'

which will output errors if there are any, and none otherwise. Or just add -q to the grep and check the return status. (Yes, simplistic, but often all you want to know is if all's OK or not...)

/* Steinar */
--
Homepage: http://www.sesse.net/
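To turn that one-liner into a Nagios-style plugin, one can wrap it so that any surviving line drives the exit code. A sketch - the raid_status function name and messages are made up, and it takes the `MegaCli -AdpAllInfo -a0` output on stdin so the parsing can be exercised without the actual hardware:

```shell
#!/bin/sh
# Sketch of a Nagios-style wrapper around the MegaCli one-liner above.
# Reads `MegaCli -AdpAllInfo -a0` output on stdin; any Degraded/Offline/
# Critical/Failed counter that is not ": 0 " becomes a CRITICAL result.
# Function name and messages are illustrative placeholders.
raid_status() {
    bad=$(egrep ' (Degraded|Offline|Critical Disks|Failed Disks)' | grep -v ': 0 $')
    if [ -n "$bad" ]; then
        echo "CRITICAL:" $bad
        return 2        # Nagios convention: 2 = CRITICAL
    else
        echo "OK: controller reports no degraded/offline/failed disks"
        return 0        # Nagios convention: 0 = OK
    fi
}

# On a real box: MegaCli -AdpAllInfo -a0 | raid_status
```

Hooked up via nrpe, that gives the same one-glance monitoring Joe was asking about for software RAID.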