Thread: 3ware vs. MegaRAID
Hello,

I am waiting for an ordered machine dedicated to PostgreSQL. It was expected to have a 3ware 9650SE 16-port controller. However, the vendor wants to replace this controller with a MegaRAID SAS 84016E because, as they say, they have it in stock, while the 3ware would not be available for a few weeks.

Is this a good replacement, generally?
Will it run on FreeBSD, specifically?

Thanks
Irek.
Hi,

> I am waiting for an ordered machine dedicated to PostgreSQL. It was
> expected to have a 3ware 9650SE 16-port controller. However, the vendor
> wants to replace this controller with a MegaRAID SAS 84016E because, as
> they say, they have it in stock, while the 3ware would not be available
> for a few weeks.
>
> Is this a good replacement, generally?
> Will it run on FreeBSD, specifically?

Not sure about that specific controller, but I do have a Fujitsu-rebranded "RAID Ctrl SAS onboard 256MB iTBBU LSI" that works pretty well on my FreeBSD 6.2 box with the mfi driver. Getting the megacli tool took some effort, as it involves having Linux emulation running, but it's now working fine.

I wouldn't dare to use it for write operations, as I remember it freezing the box just after upgrading to amd64 (it was working well on i386).

Cheers
--
Matteo Beccati
Development & Consulting - http://www.beccati.com/
Ireneusz Pluta wrote:
> I am waiting for an ordered machine dedicated to PostgreSQL. It was
> expected to have a 3ware 9650SE 16-port controller. However, the vendor
> wants to replace this controller with a MegaRAID SAS 84016E because, as
> they say, they have it in stock, while the 3ware would not be available
> for a few weeks.
>
> Is this a good replacement, generally?
> Will it run on FreeBSD, specifically?

The mfi driver needed to support that MegaRAID card has been around since FreeBSD 6.1: http://oldschoolpunx.net/phpMan.php/man/mfi/4

The MegaRAID SAS 84* cards have worked extremely well for me in terms of performance and features for all the systems I've seen them installed in. I'd consider it a modest upgrade from that 3ware card, speed-wise.

The main issue with the MegaRAID cards is that you will have to write a lot of your own custom scripts to monitor for failures using their painful MegaCLI utility, and under FreeBSD that also requires using their Linux utility via emulation: http://www.freebsdsoftware.org/sysutils/linux-megacli.html

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
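[To make the monitoring requirement concrete, here is a minimal sketch of the kind of script Greg is describing, suitable for cron or periodic(8). It assumes the linux-megacli port leaves a MegaCli binary on the PATH; the binary name, the grep pattern, and the alert address are illustrative placeholders rather than anything specified by the port.]

    #!/bin/sh
    # Hedged sketch of a MegaRAID health check; adjust MEGACLI and ALERT_TO for your box.
    MEGACLI=${MEGACLI:-MegaCli}
    ALERT_TO=root

    # -LDInfo -Lall -aALL reports the state of every logical drive on every adapter.
    STATE=$("$MEGACLI" -LDInfo -Lall -aALL | grep -i '^State')

    # Anything other than "Optimal" means a degraded or rebuilding array.
    if echo "$STATE" | grep -qv 'Optimal'; then
        echo "$STATE" | mail -s "MegaRAID array not Optimal on $(hostname)" "$ALERT_TO"
    fi

[Matteo's daily_status_mfi_raid_enable approach later in the thread packages the same idea for FreeBSD's periodic(8).]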
On 30/03/2010 19:18, Greg Smith wrote:
> The MegaRAID SAS 84* cards have worked extremely well for me in terms of
> performance and features for all the systems I've seen them installed
> in. I'd consider it a modest upgrade from that 3ware card, speed-wise.
> The main issue with the MegaRAID cards is that you will have to write a
> lot of your own custom scripts to monitor for failures using their
> painful MegaCLI utility, and under FreeBSD that also requires using
> their Linux utility via emulation:
> http://www.freebsdsoftware.org/sysutils/linux-megacli.html

Getting MegaCLI to work was a slight PITA, but once it was running it's been just a matter of adding:

daily_status_mfi_raid_enable="YES"

to /etc/periodic.conf to get the following data in the daily reports:

Adapter: 0
------------------------------------------------------------------------
Physical Drive Information:
ENC  SLO  DEV  SEQ  MEC  OEC  PFC  LPF  STATE
1    0    0    2    0    0    0    0    Online
1    1    1    2    0    0    0    0    Online
1    2    2    2    0    0    0    0    Online
1    3    3    2    0    0    0    0    Online
1    4    4    2    0    0    0    0    Online
1    5    5    2    0    0    0    0    Online
1    255  248  0    0    0    0    0    Unconfigured(good)

Virtual Drive Information:
VD  DRV  RLP  RLS  RLQ  STS   SIZE      STATE    NAME
0   2    1    0    0    64kB  69472MB   Optimal
1   2    1    3    0    64kB  138944MB  Optimal

BBU Information:
TYPE   TEMP  OK  RSOC  ASOC  RC   CC   ME
iTBBU  29 C  -1  94    93    816  109  2

Controller Logs:
+++ /var/log/mfi_raid_0.today   Sun Mar 28 03:07:36 2010
@@ -37797,3 +37797,25 @@
 Event Description: Patrol Read complete
 Event Data:
        None
+
+========================================================================
+seqNum: 0x000036f6
+Time: Sat Mar 27 03:00:00 2010
+
+Code: 0x00000027
+Class: 0
+Locale: 0x20
+Event Description: Patrol Read started
+Event Data:
+       None

etc...

Cheers
--
Matteo Beccati
Development & Consulting - http://www.beccati.com/
Ireneusz Pluta writes:
> I am waiting for an ordered machine dedicated to PostgreSQL. It was
> expected to have a 3ware 9650SE 16-port controller. However, the vendor
> wants to replace this controller with a MegaRAID SAS 84016E because, as

I have had better luck getting the 3ware management tools to work on both FreeBSD and Linux than the MegaRAID cards'. I also like that the 3ware cards can be configured to send out an email in case of problems once you have the monitoring program running.
Greg Smith pisze:
> The MegaRAID SAS 84* cards have worked extremely well for me in terms
> of performance and features for all the systems I've seen them
> installed in. I'd consider it a modest upgrade from that 3ware card,
> speed-wise.

OK, sounds promising.

> The main issue with the MegaRAID cards is that you will have to write
> a lot of your own custom scripts to monitor for failures using their
> painful MegaCLI utility, and under FreeBSD that also requires using
> their Linux utility via emulation:
> http://www.freebsdsoftware.org/sysutils/linux-megacli.html

And this is what worries me, as I prefer not to play with utilities too much, but to put the hardware into production instead. So I'd like to find out more precisely whether the expected speed boost would pay for that pain.

Let me ask the following way, then, if such a question makes sense with the data I provide. I already have another box with a 3ware 9650SE-16ML, with the array configured as follows: RAID-10, 14 x 500GB Seagate ST3500320NS, stripe size 256K, 16GB RAM, Xeon X5355, write caching enabled, BBU, FreeBSD 7.2, UFS. When testing with bonnie++ on the idle machine, I got sequential block reads/writes around 320 MB/s and 290 MB/s, and around 660 random seeks per second.

Would that result be substantially better with the LSI MegaRAID?
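[For reference, a bonnie++ invocation that would produce numbers comparable to the ones quoted above might look like the sketch below; the directory, the 32g size (twice the 16GB of RAM, so reads cannot be served from the OS cache), and the user are placeholders rather than details of Ireneusz's actual run.]

    # Hedged sketch of a bonnie++ run; paths and user are placeholders.
    bonnie++ -d /data/bonnie -s 32g -n 0 -f -u pgsql
    #   -d    directory on the filesystem under test
    #   -s    total file size, at least twice physical RAM
    #   -n 0  skip the small-file creation tests
    #   -f    skip the slow per-character I/O tests
    #   -u    unprivileged user to run as when started as root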
On Apr 6, 2010, at 9:49 AM, Ireneusz Pluta wrote:
> Greg Smith pisze:
>> The MegaRAID SAS 84* cards have worked extremely well for me in terms
>> of performance and features for all the systems I've seen them
>> installed in. I'd consider it a modest upgrade from that 3ware card,
>> speed-wise.
> OK, sounds promising.
>> The main issue with the MegaRAID cards is that you will have to write
>> a lot of your own custom scripts to monitor for failures using their
>> painful MegaCLI utility, and under FreeBSD that also requires using
>> their Linux utility via emulation:
>> http://www.freebsdsoftware.org/sysutils/linux-megacli.html
> And this is what worries me, as I prefer not to play with utilities too
> much, but to put the hardware into production instead. So I'd like to
> find out more precisely whether the expected speed boost would pay for
> that pain. Let me ask the following way, then, if such a question makes
> sense with the data I provide. I already have another box with a 3ware
> 9650SE-16ML, with the array configured as follows: RAID-10, 14 x 500GB
> Seagate ST3500320NS, stripe size 256K, 16GB RAM, Xeon X5355, write
> caching enabled, BBU, FreeBSD 7.2, UFS. When testing with bonnie++ on
> the idle machine, I got sequential block reads/writes around 320 MB/s
> and 290 MB/s, and around 660 random seeks per second.
>
> Would that result be substantially better with the LSI MegaRAID?

My experiences with the 3ware 9650 on Linux are similar -- horribly slow for some reason with RAID 10 on larger arrays. Others have claimed this card performs well on FreeBSD, but the above looks just as bad as Linux.

660 iops is slow for 14 spindles of any type, although RAID 10 might limit reads to an effective 7 spindles, in which case it's OK -- but it should still top 100 iops per effective disk on 7200rpm drives unless the effective concurrency of the benchmark is low. My experience with the 9650 was that iops was OK, but sequential performance for RAID 10 was very poor.

On Linux, I was able to get better sequential read performance like this:

* Set it up as 3 RAID 10 blocks of 4 drives each (the 2 others spare, or for xlog or something). Software RAID-0 these RAID 10 chunks together in the OS.
* Change the Linux 'readahead' block device parameter to at least 4MB (8192, see blockdev --setra) -- I don't know if there is a FreeBSD equivalent.

With a better RAID card you should hit at minimum 800, if not 1000+, MB/sec, depending on whether you bottleneck on your PCIe or SATA ports or not. I switched to two Adaptec 5xx5 series cards (each with half the disks, software RAID-0 between them) to get about 1200MB/sec max throughput and 2000 iops from two sets of 10 Seagate STxxxxxxxNS 1TB drives. That is still not as good as it should be, but much better. FWIW, one set of 8 drives in RAID 10 on the Adaptec did about 750MB/sec sequential and ~950 iops read. It required XFS to do this; ext3 was 20% slower in throughput.

A PERC 6 card (LSI MegaRAID clone) performed somewhere between the two.

I don't like bonnie++ much; it's OK at single drive tests but not as good at larger arrays. If you have time, try fio and create some custom profiles.

Lastly, for these sorts of tests, partition your array into smaller chunks so that you can reliably test the front or back of the drive. Sequential speed at the front of a typical 3.5" drive is about 2x as fast as at the end of the drive.
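[A hedged sketch of the two Linux-side steps Scott describes follows; /dev/sda, /dev/sdb and /dev/sdc stand in for the three hardware RAID-10 volumes and /dev/md0 for the software stripe, none of which are his actual device names.]

    # Stripe the three hardware RAID-10 volumes together with Linux software RAID-0.
    mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/sda /dev/sdb /dev/sdc

    # Raise readahead on the resulting device to 4MB.
    # blockdev counts in 512-byte sectors, so 8192 sectors = 4MB.
    blockdev --getra /dev/md0        # show the current value
    blockdev --setra 8192 /dev/md0   # not persistent across reboots; reapply from rc.local or similar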
For a card-level RAID controller, I am a big fan of the LSI 8888, which is available in a PCIe riser form factor for blade / 1U servers and comes with 0.5GB of battery-backed cache. Full Linux support, including mainline kernel drivers and command line config tools. Was using these with SAS expanders and 48x 1TB SATA-300 spindles per card, and it was pretty (adjective) quick for a card-based system ... comparable with a small FC-AL EMC Clariion CX3 series in fact, just without the redundancy.
My only gripe is that as of 18 months ago, it did not support triples (RAID-10 with 3 drives per set instead of 2) ... I had a "little knowledge is a dangerous thing" client who was stars-in-the-eyes sold on RAID-6 and so wanted double drive failure protection for everything (and didn't get my explanation about how archive logs on other LUNs make this OK, or why RAID-5/6 sucks for a database, or really listen to anything I said :-) ... It would do RAID-10 quads however (weird...).
Also decent in the Dell OEM'ed version (don't know the Dell PERC model number) though they tend to be a bit behind on firmware.
MegaCLI isn't the slickest tool, but you can find Nagios scripts for it online ... what's the problem? The Clariion will send you (and EMC support) an email if it loses a drive, but I'm not sure that's worth the 1500% price difference ;-)
Cheers
Dave
Scott Carey wrote:
> * Change the Linux 'readahead' block device parameter to at least 4MB
> (8192, see blockdev --setra) -- I don't know if there is a FreeBSD
> equivalent.

I haven't tested them, but 3ware gives suggestions at http://www.3ware.com/kb/Article.aspx?id=14852 for tuning their cards properly under FreeBSD. You cannot get good sequential read performance from 3ware's cards without doing something about this at the OS level; the read-ahead on the card itself is minimal and certainly a bottleneck.

As for your comments about drives being faster at the front than the end, the zcav tool that comes with bonnie++ is a good way to plot that out, rather than having to split partitions up and do a bunch of manual testing.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
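[A minimal sketch of the zcav workflow Greg mentions, assuming zcav will accept the raw device as its file argument, as the bonnie++ versions I have used do; /dev/da0, the output file name, and the gnuplot step are placeholders rather than anything prescribed in the thread.]

    # Read the device end to end, reporting throughput per zone (run as root; read-only).
    zcav /dev/da0 > da0.zcav
    # zcav writes two columns (position, MB/s); '#' comment lines are ignored by gnuplot.
    gnuplot -e 'set ylabel "MB/s"; plot "da0.zcav" with lines; pause -1'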
On Apr 7, 2010, at 11:13 PM, Greg Smith wrote:
> Scott Carey wrote:
>> * Change the Linux 'readahead' block device parameter to at least 4MB
>> (8192, see blockdev --setra) -- I don't know if there is a FreeBSD
>> equivalent.
>
> I haven't tested them, but 3ware gives suggestions at
> http://www.3ware.com/kb/Article.aspx?id=14852 for tuning their cards
> properly under FreeBSD. You cannot get good sequential read performance
> from 3ware's cards without doing something about this at the OS level;
> the read-ahead on the card itself is minimal and certainly a bottleneck.
>
> As for your comments about drives being faster at the front than the
> end, the zcav tool that comes with bonnie++ is a good way to plot that
> out, rather than having to split partitions up and do a bunch of manual
> testing.

There's an fio script that does something similar.

What I'm suggesting is that if you want to test a file system (or compare it to others), and you want to get consistent results, then run those tests on a smaller slice of the drive. To tune a RAID card, there is not much point in testing anything other than the fast part of the drive: if it can keep up on the fast part, it should be able to keep up on the slow part. I would not suggest splitting the drive up into chunks and doing many manual tests.

3.5" drives deliver a bit more than 50% of their starting sequential throughput at the end; 2.5" drives a bit less than 65%. I haven't seen any significant variation from that rule in any benchmark I've run or seen online for 'standard' drives. Occasionally there is a drive that doesn't use all its space and is a bit faster at the end.

My typical practice is to use the first 70% to 80% of a large volume for the main data, and use the slowest last chunk for archives and backups.
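[As a concrete example of restricting a benchmark to the fast slice of a drive, a single fio job along these lines would do it; the device, the job name, and the 20% figure are placeholders, not Scott's actual profile.]

    # Hedged sketch: sequential 1MB reads over only the first 20% of the device.
    fio --name=front-seq-read --filename=/dev/da0 --rw=read --bs=1M \
        --size=20% --direct=1 --runtime=60 --time_based --group_reporting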
On 2010-04-08 05:44, Dave Crooke wrote:
> For a card-level RAID controller, I am a big fan of the LSI 8888, which is
> available in a PCIe riser form factor for blade / 1U servers and comes with
> 0.5GB of battery-backed cache. Full Linux support, including mainline kernel
> drivers and command line config tools. Was using these with SAS expanders
> and 48x 1TB SATA-300 spindles per card, and it was pretty (adjective) quick
> for a card-based system ... comparable with a small FC-AL EMC Clariion CX3
> series in fact, just without the redundancy.

Can someone shed "simple" light on an extremely simple question: how do you physically get 48 drives attached to an LSI card that claims to have only 2 internal and 2 external ports? (The controller claims to support up to 240 drives.)

I'm currently looking at getting a server with space for 8 x 512GB SSDs running RAID-5 (or 6), and I'm looking for a well-performing controller with BBWC for the setup. So I was looking at something like the LSI 8888ELP.

--
Jesper
Jesper Krogh wrote:
> Can someone shed "simple" light on an extremely simple question: how do
> you physically get 48 drives attached to an LSI card that claims to have
> only 2 internal and 2 external ports?
> (The controller claims to support up to 240 drives.)

There are these magic boxes that add "SAS expansion", which basically split a single port so you can connect more drives to it. An example from a vendor some of the regulars on this list like is http://www.aberdeeninc.com/abcatg/kitjbod-1003.htm

You normally can't buy these except as part of an integrated drive chassis subsystem. If you get one that has an additional pass-through port, that's how you can stack these into multiple layers and hit really large numbers of disks.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
On 2010-04-09 17:27, Greg Smith wrote:
> There are these magic boxes that add "SAS expansion", which basically
> split a single port so you can connect more drives to it. An example
> from a vendor some of the regulars on this list like is
> http://www.aberdeeninc.com/abcatg/kitjbod-1003.htm
>
> You normally can't buy these except as part of an integrated drive
> chassis subsystem. If you get one that has an additional pass-through
> port, that's how you can stack these into multiple layers and hit
> really large numbers of disks.

I've spent quite some hours googling today. Am I totally wrong in thinking that the HP MSA-20/30/70 and the Sun/Oracle J4200 (https://shop.sun.com/store/product/53a01251-2fce-11dc-9482-080020a9ed93) are the same kind of thing, just from "major" vendors?

That would enable me to reuse the existing server and move to something like Intel's X25-M 160GB disks, just more of them (25) in an MSA-70.

.. that's beginning to look like a decent plan.

--
Jesper
Jesper Krogh wrote:
> I've spent quite some hours googling today. Am I totally wrong in
> thinking that the HP MSA-20/30/70 and the Sun/Oracle J4200
> (https://shop.sun.com/store/product/53a01251-2fce-11dc-9482-080020a9ed93)
> are the same kind of thing, just from "major" vendors?

Yes, those are the same type of implementation. Every vendor has their own preferred way to handle port expansion, and most are somewhat scared about discussing the whole thing now because EMC has a ridiculous patent on the whole idea[1]. They all work the same from the user perspective, albeit sometimes with their own particular daisy chaining rules.

> That would enable me to reuse the existing server and move to something
> like Intel's X25-M 160GB disks, just more of them (25) in an MSA-70.

I guess, but note that several of us here consider Intel's SSDs unsuitable for critical database use. There are some rare but not impossible to encounter problems with their write caching implementation that leave you exposed to database corruption if there's a nasty power interruption. You can't get rid of the problem without destroying both the performance and the longevity of the drive[2][3]. If you're going to deploy something using those drives, please make sure you're using an aggressive real-time backup scheme such as log shipping in order to minimize your chance of catastrophic data loss.

[1] http://www.freepatentsonline.com/7624206.html
[2] http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/
[3] http://petereisentraut.blogspot.com/2009/07/solid-state-drive-benchmarks-and-write.html

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
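[To make the log-shipping suggestion concrete, a minimal sketch of the WAL archiving settings it relies on (postgresql.conf, 8.3-era syntax) is below; the rsync destination and the timeout value are illustrative placeholders, not recommendations from this thread.]

    # Hedged sketch of WAL archiving for log shipping; the destination is a placeholder.
    archive_mode = on
    archive_command = 'rsync -a %p standby:/var/db/wal_archive/%f'  # %p = segment path, %f = file name
    archive_timeout = 60    # force a segment switch at least once a minute so the archive stays current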
On 2010-04-09 20:22, Greg Smith wrote:
> I guess, but note that several of us here consider Intel's SSDs
> unsuitable for critical database use. There are some rare but not
> impossible to encounter problems with their write caching implementation
> that leave you exposed to database corruption if there's a nasty power
> interruption. You can't get rid of the problem without destroying both
> the performance and the longevity of the drive[2][3]. If you're going to
> deploy something using those drives, please make sure you're using an
> aggressive real-time backup scheme such as log shipping in order to
> minimize your chance of catastrophic data loss.

There are some things in my scenario that cannot be said to be general to all database situations. Having to go a week back (to a backup) is "not really a problem", so as long as I have a reliable backup and the problems don't occur except after unexpected power-offs, I think I can handle it.

Another thing is that the overall usage is heavily dominated by random reads, which is the performance I don't ruin by disabling write caching. And by adding 512/1024MB of BBWC on the controller, I bet I can regain enough write performance to easily make the system function. Currently the average writeout is way less than 10MB/s, but the reading processes all spend most of their time in iowait.

Since my application is dominated by random reads, I "think" I should still have a huge gain over regular SAS drives on that side of the equation, but most likely not on the write side. But all of this is so far only speculation, since the vendors don't seem eager to lend out gear these days, so everything is only on paper so far.

There seems to be consensus that on the write side, SAS disks can fairly easily outperform SSDs. I have not seen anything showing that SSDs don't still have huge benefits on the read side.

It would be nice if there were an easy way to test and confirm that it actually is robust against power loss..

.. just having a disk array with a built-in battery for the SSDs would solve the problem.

--
Jesper
On Fri, Apr 9, 2010 at 1:02 PM, Jesper Krogh <jesper@krogh.cc> wrote:
> It would be nice if there were an easy way to test and confirm that it
> actually is robust against power loss..

Sadly, the only real test is pulling the power plug. And it can't prove the setup is good, only that it's bad or most likely good.

> .. just having a disk array with a built-in battery for the SSDs would
> solve the problem.

Even a giant cap to initiate writeout on power-off would likely be plenty, since they only pull 150mW or so.