Thread: Opinions on Raid

From:
"Joe Uhl"
Date:

We have been running Postgres on a 2U server with 2 disks configured in
raid 1 for the os and logs and 4 disks configured in raid 10 for the
data.  I have since been told raid 5 would have been a better option
given our usage of Dell equipment and the way they handle raid 10.  I
have just a few general questions about raid with respect to Postgres:

[1] What is the performance penalty of software raid over hardware raid?
 Is it truly significant?  We will be working with 100s of GB to 1-2 TB
of data eventually.

[2] How do people on this list monitor their hardware raid?  Thus far we
have used Dell and the only way to easily monitor disk status is to use
their openmanage application.  Do other controllers offer easier means
of monitoring individual disks in a raid configuration?  It seems one
advantage software raid has is the ease of monitoring.

I truly appreciate any assistance or input.  As an additional question,
does anyone have any strong recommendations for vendors that offer both
consulting/training and support?  We are currently speaking with Command
Prompt, EnterpriseDB, and Greenplum but I am certainly open to hearing
any other recommendations.

Thanks,

Joe

From:
Stefan Kaltenbrunner
Date:

Joe Uhl wrote:
> We have been running Postgres on a 2U server with 2 disks configured in
> raid 1 for the os and logs and 4 disks configured in raid 10 for the
> data.  I have since been told raid 5 would have been a better option
> given our usage of Dell equipment and the way they handle raid 10.  I
> have just a few general questions about raid with respect to Postgres:
>
> [1] What is the performance penalty of software raid over hardware raid?
>  Is it truly significant?  We will be working with 100s of GB to 1-2 TB
> of data eventually.

This depends a lot on the RAID controller (whether or not it has a
battery-backed write cache, for example) - for some use cases software
RAID is actually faster (especially for sequential I/O tests).

>
> [2] How do people on this list monitor their hardware raid?  Thus far we
> have used Dell and the only way to easily monitor disk status is to use
> their openmanage application.  Do other controllers offer easier means
> of monitoring individual disks in a raid configuration?  It seems one
> advantage software raid has is the ease of monitoring.

Well, the answer to that question depends on what you are using for your
network monitoring as a whole, as well as your platform of choice. If you
use, say, Nagios and Linux, it makes sense to use a Nagios plugin (we do
that here with a unified check script that covers everything from
LSI MPT-based RAID cards, through IBM's ServeRAID and HP's Smart Array,
to LSI MegaRAID cards, plus Linux/Solaris software RAID).
If you are using another monitoring solution (OpenView, IBM Director,
...) yours might look different.
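
For the software-RAID branch of such a unified check, a minimal sketch in
the usual nagios-plugin convention (exit 0 = OK, 2 = CRITICAL) might look
like the following; the hardware-RAID branches would shell out to the
vendor CLI instead. This is an illustrative sketch, not Stefan's actual
script:

```shell
#!/bin/sh
# Minimal sketch of the Linux software-RAID branch of a unified check.
# A degraded md array shows up in /proc/mdstat as e.g. [U_] instead of
# [UU]; taking the text as a parameter keeps the logic testable.

check_mdstat() {
    if echo "$1" | grep -q '\[U*_[U_]*\]'; then
        echo "CRITICAL: degraded md array"
        return 2
    fi
    echo "OK: all md arrays healthy"
    return 0
}

# On a real system:
check_mdstat "$(cat /proc/mdstat 2>/dev/null)"
```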


Stefan

From:
Ron
Date:

At 08:12 AM 2/27/2007, Joe Uhl wrote:
>We have been running Postgres on a 2U server with 2 disks configured in
>raid 1 for the os and logs and 4 disks configured in raid 10 for the
>data.  I have since been told raid 5 would have been a better option
>given our usage of Dell equipment and the way they handle raid 10.  I
>have just a few general questions about raid with respect to Postgres:
>
>[1] What is the performance penalty of software raid over hardware raid?
>  Is it truly significant?  We will be working with 100s of GB to 1-2 TB
>of data eventually.
The real CPU overhead when using SW RAID comes from any form of SW
RAID that does XOR operations as part of writes (RAID 5, 6, 50, ...,
etc).  At that point, you are essentially hammering on the CPU just
as hard as you would on a dedicated RAID controller... ...and the
dedicated RAID controller probably has custom HW helping it do this
sort of thing more efficiently.

That being said, SW RAID 5 in this sort of scenario can be reasonable
if you =dedicate= a CPU core to it.  So in such a system, your "n"
core box is essentially an "n-1" core box because you have to lock a
core to doing nothing but RAID management.

Religious wars aside, this actually can work well.  You just have to
understand and accept what needs to be done.

SW RAID 1, 10, etc. should not impose a great deal of CPU
overhead, and often can be =faster= than a dedicated RAID controller.

SW RAID 5 etc. in usage scenarios involving far more reads than writes
(i.e. only light write loads) can work quite well even if you don't
dedicate a core to RAID management, but you must be careful about
workloads that are, or that contain parts that are, examples of the first
scenario I gave.  If you have any doubts about whether you are doing
too many writes, dedicate a core to RAID stuff as in the first scenario.


>[2] How do people on this list monitor their hardware raid?  Thus far we
>have used Dell and the only way to easily monitor disk status is to use
>their openmanage application.  Do other controllers offer easier means
>of monitoring individual disks in a raid configuration?  It seems one
>advantage software raid has is the ease of monitoring.
Many RAID controller manufacturers and storage product companies
offer reasonable monitoring / management tools.

3ware AKA AMCC has a good reputation in this area for their cards.
So does Areca.
I personally do not like Adaptec's SW for this purpose, but YMMV.
LSI Logic has had both good and bad SW in this area over the years.

Dell, HP, IBM, etc.'s offerings in this area tend to be product-line
specific.  I'd insist on some sort of "try before you buy" if the
ease of use / quality of the SW matters to your overall purchase decision.

Then there are the various CSSW and OSSW packages that contain this
functionality or are dedicated to it.  Go find some reputable reviews.
(HEY LURKERS FROM Tweakers.net:  ^^^ THAT'S AN ARTICLE IDEA ;-) )

Cheers,
Ron


From:
mark@mark.mielke.cc
Date:

Hope you don't mind, Ron. This might be splitting hairs.

On Tue, Feb 27, 2007 at 11:05:39AM -0500, Ron wrote:
> The real CPU overhead when using SW RAID is when using any form of SW
> RAID that does XOR operations as part of writes (RAID 5, 6, 50, ...,
> etc).  At that point, you are essentially hammering on the CPU just
> as hard as you would on a dedicated RAID controller... ...and the
> dedicated RAID controller probably has custom HW helping it do this
> sort of thing more efficiently.
> That being said, SW RAID 5 in this sort of scenario can be reasonable
> if you =dedicate= a CPU core to it.  So in such a system, your "n"
> core box is essentially a "n-1" core box because you have to lock a
> core to doing nothing but RAID management.

I have an issue with the above explanation. XOR is cheap. It's one of
the cheapest CPU instructions available. Even with high bandwidth, the
CPU should always be able to XOR very fast.
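
On Linux you can even see the kernel's own measurement of this: the md
driver benchmarks its XOR routines at boot and logs the fastest one.
Something like the following digs those lines out (the exact log format
varies by kernel version, and dmesg may need root):

```shell
# Pull the md driver's boot-time XOR benchmark out of the kernel log;
# it typically reports throughput of hundreds of MB/s or more per core.
dmesg 2>/dev/null | grep -i 'xor' || echo "no xor lines found (try as root)"
```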

This leads me to the belief that the RAID 5 problem has to do with
getting the data ready to XOR. With RAID 5, the L1/L2 cache is never
large enough to hold multiple stripes of data under regular load, and
the system may not have the blocks in RAM. Reading from RAM to find the
missing blocks shows up as CPU load. Reading from disk to find the
missing blocks shows up as system load. Dedicating a core to RAID 5
focuses on the CPU - which I believe to be mostly idle waiting for a
memory read. Dedicating a core reduces the impact, but can't eliminate
it, and the cost of a whole core to sit mostly idle waiting for memory
reads is high. Also, any reads scheduled by this core will affect the
bandwidth/latency for other cores.

Hardware RAID 5 solves this by using its own memory modules - like a
video card using its own memory modules. The hardware RAID can read
from its own memory or disk all day and not affect system performance.
Hopefully it has plenty of memory dedicated to holding the most
frequently required blocks.

> SW RAID 5 etc in usage scenarios involving far more reads than writes
> and light write loads can work quite well even if you don't dedicate
> a core to RAID management, but you must be careful about workloads
> that are, or that contain parts that are, examples of the first
> scenario I gave.  If you have any doubts about whether you are doing
> too many writes, dedicate a core to RAID stuff as in the first scenario.

I found software RAID 5 to suck such that I only use it for backups
now. It seemed that Linux didn't care to read-ahead or hold blocks in
memory for too long, and preferred to read and then write. It was awful.
RAID 5 doesn't seem like a good option even with hardware RAID. The
controller just masks the issues behind a black box of dedicated
hardware; the issues still exist.

Most of my system is RAID 1+0 now. I have it broken up: rarely read or
written files (long-term storage) on RAID 5, the main system data on
RAID 1+0, the main system on RAID 1, and a larger build partition on RAID
0. For a crappy server in my basement, I'm very happy with my
software RAID performance now. :-)

Cheers,
mark

--
 /  /      __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


From:
Scott Marlowe
Date:

On Tue, 2007-02-27 at 07:12, Joe Uhl wrote:
> We have been running Postgres on a 2U server with 2 disks configured in
> raid 1 for the os and logs and 4 disks configured in raid 10 for the
> data.  I have since been told raid 5 would have been a better option
> given our usage of Dell equipment and the way they handle raid 10.

Some controllers do not layer RAID effectively.  Generally speaking, the
cheaper the controller, the worse it's gonna perform.

Also, some controllers are optimized more for RAID 5 than RAID 1 or 0.

Which controller does your Dell have, btw?

>   I
> have just a few general questions about raid with respect to Postgres:
>
> [1] What is the performance penalty of software raid over hardware raid?
>  Is it truly significant?  We will be working with 100s of GB to 1-2 TB
> of data eventually.

For a mostly read system, the performance is generally pretty good.
Older Linux kernels ran layered RAID pretty slowly, i.e. RAID 1+0 was
no faster than RAID 1.  The best-performing software RAID I found in
older Linux kernels (2.2, 2.4) was plain old RAID-1.  RAID-5 was good at
reading, but slow at writing.


From:
Scott Marlowe
Date:

On Tue, 2007-02-27 at 12:28, Joe Uhl wrote:
> Really appreciate all of the valuable input.  The current server has the
> Perc4ei controller.
>
> The impression I am taking from the responses is that we may be okay with
> software raid, especially if raid 1 and 10 are what we intend to use.
>
> I think we can collect enough information from the archives of this list to
> help make decisions for the new machine(s), was just very interested in
> hearing feedback on software vs. hardware raid.
>
> We will likely be using the 2.6.18 kernel.

Well, whatever you do, benchmark it with what you think will be your
typical load.  You can do some simple initial tests to see if you're in
the ballpark with bonnie++, dd, etc., then move on to real database
tests after that.
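
A rough starting point for those initial dd tests might look like this
(the directory and sizes are placeholders; for real numbers the file
must be well beyond RAM so the page cache can't flatter the disks):

```shell
#!/bin/sh
# Quick sequential-I/O sanity checks before real database benchmarks.
# DIR should point at the array under test; /tmp is only a fallback
# so the script runs anywhere.
DIR="${DIR:-/tmp}"

# Sequential write: 8 kB blocks (Postgres' page size), flushed to disk
# at the end so the figure reflects the disks rather than the cache.
dd if=/dev/zero of="$DIR/ddtest" bs=8k count=10000 conv=fdatasync 2>&1

# Sequential read back.
dd if="$DIR/ddtest" of=/dev/null bs=8k 2>&1

rm -f "$DIR/ddtest"
```

bonnie++ plays a similar role, e.g. `bonnie++ -d "$DIR" -s <size in MB>`;
again, scale the size well past physical memory.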

From:
Geoff Tolley
Date:

Joe Uhl wrote:

> [1] What is the performance penalty of software raid over hardware raid?
>  Is it truly significant?  We will be working with 100s of GB to 1-2 TB
> of data eventually.

One thing you should appreciate about hw vs sw raid is that with the former
you can battery-back it and enable controller write caching in order to
make disk write latency largely disappear. How much of a performance
difference that makes depends on what you're doing with it, of course.

See the current thread "Two hard drives --- what to do with them?" for some
discussion of the virtues of battery-backed raid.

> [2] How do people on this list monitor their hardware raid?  Thus far we
> have used Dell and the only way to easily monitor disk status is to use
> their openmanage application.  Do other controllers offer easier means
> of monitoring individual disks in a raid configuration?  It seems one
> advantage software raid has is the ease of monitoring.

Personally I use nagios with nrpe for most of the monitoring, and write a
little wrapper around the cli monitoring tool from the controller
manufacturer to grok whether it's in a good/degraded/bad state.

Dell PERC controllers I think are mostly just derivatives of Adaptec/LSI
controllers, so you might be able to get a more convenient monitoring tool
from one of them that might work. See if you can find your PERC version in
http://pciids.sourceforge.net/pci.ids, or if you're using Linux then which
hw raid module is loaded for it, to get an idea of which place to start
looking for that.
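
On a Linux box, for example, the controller chip and its driver can be
identified without any vendor software; a quick sketch (the module names
in the pattern are just common examples):

```shell
# What RAID hardware is on the PCI bus, and which kernel module claimed
# it. Both variables come back empty if no hardware RAID is present.
controller=$(lspci 2>/dev/null | grep -i 'raid' || true)
module=$(lsmod 2>/dev/null | grep -E 'megaraid|aacraid|mptsas|cciss' || true)
echo "controller: $controller"
echo "module: $module"
```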

- Geoff


From:
Arjen van der Meijden
Date:

On 28-2-2007 0:42 Geoff Tolley wrote:
>> [2] How do people on this list monitor their hardware raid?  Thus far we
>> have used Dell and the only way to easily monitor disk status is to use
>> their openmanage application.  Do other controllers offer easier means
>> of monitoring individual disks in a raid configuration?  It seems one
>> advantage software raid has is the ease of monitoring.

Recent Dell raid-controllers are based on LSI chips, although they are
not exactly the same as similar LSI-controllers (anymore). Our Dell
Perc5/e and 5/i work with the MegaCLI-tool from LSI. But that tool has
really limited documentation from LSI itself. Luckily Fujitsu-Siemens
offers a nice PDF:
http://manuals.fujitsu-siemens.com/serverbooks/content/manuals/english/mr-sas-sw-ug-en.pdf

Besides that, there are several Dell linux resources popping up,
including on their own site:
http://linux.dell.com/

> Personally I use nagios with nrpe for most of the monitoring, and write
> a little wrapper around the cli monitoring tool from the controller
> manufacturer to grok whether it's in a good/degraded/bad state.

If you have a MegaCLI version, I'd like to see it, if possible. That
would definitely save us reinventing the wheel  :-)

> Dell PERC controllers I think are mostly just derivatives of Adaptec/LSI
> controllers, so you might be able to get a more convenient monitoring
> tool from one of them that might work. See if you can find your PERC
> version in http://pciids.sourceforge.net/pci.ids, or if you're using
> Linux then which hw raid module is loaded for it, to get an idea of
> which place to start looking for that.

The current ones are AFAIK all LSI-based; at least the recent SAS
controllers (5/i and 5/e) definitely are.

Best regards,

Arjen

From:
"Joe Uhl"
Date:

Really appreciate all of the valuable input.  The current server has the
Perc4ei controller.

The impression I am taking from the responses is that we may be okay with
software raid, especially if raid 1 and 10 are what we intend to use.

I think we can collect enough information from the archives of this list to
help make decisions for the new machine(s), was just very interested in
hearing feedback on software vs. hardware raid.

We will likely be using the 2.6.18 kernel.

Thanks for everyone's input,

Joe


From:
"Steinar H. Gunderson"
Date:

On Sat, Mar 03, 2007 at 12:30:16PM +0100, Arjen van der Meijden wrote:
> If you have a MegaCLI-version, I'd like to see it, if possible? That
> would definitely save us some reinventing the wheel  :-)

A friend of mine just wrote

  MegaCli -AdpAllInfo -a0|egrep '  (Degraded|Offline|Critical Disks|Failed Disks)' | grep -v ': 0 $'

which will output errors if there are any, and none otherwise. Or just add -q
to the grep and check the return status.

(Yes, simplistic, but often all you want to know is if all's OK or not...)
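
For the kind of ready-made wrapper Arjen asked about upthread, that
one-liner could be dressed up in nagios-plugin style (exit 0 = OK,
2 = CRITICAL). A sketch, assuming adapter -a0 as in the original:

```shell
#!/bin/sh
# Sketch of wrapping the MegaCli one-liner as a nagios-style check.
# Passing the MegaCli output in as a parameter keeps the parsing
# testable without the actual controller.

check_megacli() {
    errors=$(echo "$1" \
        | grep -E '(Degraded|Offline|Critical Disks|Failed Disks)' \
        | grep -v ': 0 *$')
    if [ -n "$errors" ]; then
        echo "CRITICAL: $errors"
        return 2
    fi
    echo "OK"
    return 0
}

# On a real system:
check_megacli "$(MegaCli -AdpAllInfo -a0 2>/dev/null)"
```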

/* Steinar */
--
Homepage: http://www.sesse.net/