Thread: Hardware: HP StorageWorks MSA 1500
We're going to get one for evaluation next week (equipped with dual 2Gbit HBAs and 2x14 disks, IIRC). Anyone with experience with them, performance-wise?

Regards,
Mikael
Hmmm. We use an MSA 1000 with Fibre Channel interconnects. No real complaints, although I was a little disappointed by the RAID controller's battery-backed write cache performance: tiny random writes are only about 3 times as fast with write caching enabled as with it disabled; I had (perhaps naively) hoped for more. Sequential scans from our main DB (on a 5-pair RAID 10 set with 15k RPM drives) get roughly 80MB/sec.

Getting the redundant RAID controllers to fail over correctly on Linux was a big headache and required working the tech support phone all day until we finally got to the deep guru who knew the proper undocumented incantations.

-- Mark Lewis

On Thu, 2006-04-20 at 20:00 +0200, Mikael Carneholm wrote:
> We're going to get one for evaluation next week (equipped with dual
> 2Gbit HBAs and 2x14 disks, IIRC). Anyone with experience with them,
> performance-wise?
>
> Regards,
> Mikael
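For reference, a minimal sketch (an assumption on my part, not from Mark's setup: Python 3 on a POSIX box, arbitrary file name, sizes and counts) of the kind of small synchronous random-write test that shows whether a battery-backed write cache is helping - run it once with the controller cache enabled and once with it disabled:

import os, random, time

PATH = "testfile.bin"            # hypothetical scratch file on the array under test
FILE_SIZE = 256 * 1024 * 1024    # 256MB working set
BLOCK = 8192                     # 8k writes, roughly a Postgres page
WRITES = 2000

# Pre-create the file so the writes land inside already-allocated space.
with open(PATH, "wb") as f:
    f.truncate(FILE_SIZE)

fd = os.open(PATH, os.O_WRONLY | os.O_SYNC)   # O_SYNC: each write must reach stable storage
buf = os.urandom(BLOCK)
start = time.time()
for _ in range(WRITES):
    os.lseek(fd, random.randrange(FILE_SIZE // BLOCK) * BLOCK, os.SEEK_SET)
    os.write(fd, buf)
elapsed = time.time() - start
os.close(fd)
print("%d synchronous 8k writes in %.2fs -> %.0f writes/sec"
      % (WRITES, elapsed, WRITES / elapsed))

If the write cache is doing its job, the writes/sec figure should jump substantially with caching enabled; the roughly 3x Mark reports is the sort of ratio this exposes.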
On Thu, 20 Apr 2006, Mikael Carneholm wrote:

> We're going to get one for evaluation next week (equipped with dual
> 2Gbit HBAs and 2x14 disks, IIRC). Anyone with experience with them,
> performance-wise?

We (Seatbooker) use one. It works well enough. Here's a sample bonnie output:

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec  %CPU /sec %CPU
        16384 41464 30.6 41393 10.0 16287  3.7 92433 83.2 119608 18.3 674.0 0.8

which is hardly bad (on a four 15kRPM disk RAID 10 with 2Gbps FC). Sequential scans on a table produce about 40MB/s of IO with the 'disk' something like 60-70% busy according to FreeBSD's systat.

Here's diskinfo -cvt output on a not quite idle system:

/dev/da1
        512             # sectorsize
        59054899200     # mediasize in bytes (55G)
        115341600       # mediasize in sectors
        7179            # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.

I/O command overhead:
        time to read 10MB block      0.279395 sec  = 0.014 msec/sector
        time to read 20480 sectors  11.864934 sec  = 0.579 msec/sector
        calculated command overhead                = 0.566 msec/sector

Seek times:
        Full stroke:      250 iter in  0.836808 sec = 3.347 msec
        Half stroke:      250 iter in  0.861196 sec = 3.445 msec
        Quarter stroke:   500 iter in  1.415700 sec = 2.831 msec
        Short forward:    400 iter in  0.586330 sec = 1.466 msec
        Short backward:   400 iter in  1.365257 sec = 3.413 msec
        Seq outer:       2048 iter in  1.184569 sec = 0.578 msec
        Seq inner:       2048 iter in  1.184158 sec = 0.578 msec

Transfer rates:
        outside:        102400 kbytes in  1.367903 sec = 74859 kbytes/sec
        middle:         102400 kbytes in  1.472451 sec = 69544 kbytes/sec
        inside:         102400 kbytes in  1.521503 sec = 67302 kbytes/sec

It (or any FC SAN, for that matter) isn't an especially cheap way to get storage. You don't get much option if you have an HP blade enclosure, though.

HP's support was poor. Their Indian call-centre seems not to know much about them and spectacularly failed to tell us if and how we could connect this (with the 2/3-port FC hub option) to two of our blade servers, one of which was one of the 'half-height' ones which require an arbitrated loop. We ended up buying a FC switch.
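As a rough cross-check of figures like the 40MB/s sequential scan above, a simple sequential-read timing in Python (my sketch, not part of Alex's post; read a file much larger than RAM, or drop the OS cache first, otherwise the number will be inflated):

import sys, time

path = sys.argv[1]               # e.g. a multi-GB file on the array; a readable raw device also works
CHUNK = 1024 * 1024              # read in 1MB chunks
LIMIT = 2 * 1024 * 1024 * 1024   # stop after 2GB

total = 0
start = time.time()
with open(path, "rb", buffering=0) as f:   # unbuffered, so Python isn't doing extra copying
    while total < LIMIT:
        data = f.read(CHUNK)
        if not data:
            break
        total += len(data)
elapsed = time.time() - start
print("%.0f MB in %.2fs -> %.1f MB/s" % (total / 1e6, elapsed, total / 1e6 / elapsed))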
Your numbers seem quite OK considering the number of disks. We also get a 256MB battery-backed cache module with it, so I'm looking forward to testing the write performance (first using ext3, then xfs). If I get enough time to test it, I'll test both RAID 0+1 and RAID 5 configurations, although I trust RAID 0+1 more.

And no, it's not the cheapest way to get storage - but it's only half as expensive as the other option: an EVA4000, which we're gonna have to go for if we (they) decide to stay in bed with a proprietary database. With Postgres we don't need replication on the SAN level (we use Slony), so the MSA 1500 would be sufficient, and that's a good thing (price-wise) as we're gonna need two. OTOH, the EVA4000 will not give us mirroring, so either way we're gonna need two of whatever system we go for. Just hoping the MSA 1500 is reliable as well...

Support will hopefully not be a problem for us as we have a local company providing support; they're also the ones setting it up for us, so at least we'll know right away whether they're competent or not :)

Regards,
Mikael
Mikael Carneholm wrote:
> Your numbers seem quite OK considering the number of disks. We also get
> a 256MB battery-backed cache module with it, so I'm looking forward to
> testing the write performance (first using ext3, then xfs). If I get
> enough time to test it, I'll test both RAID 0+1 and RAID 5
> configurations, although I trust RAID 0+1 more.
>
> And no, it's not the cheapest way to get storage - but it's only half as
> expensive as the other option: an EVA4000, which we're gonna have to go
> for if we (they) decide to stay in bed with a proprietary database. With
> Postgres we don't need replication on the SAN level (we use Slony), so
> the MSA 1500 would be sufficient, and that's a good thing (price-wise)
> as we're gonna need two. OTOH, the EVA4000 will not give us mirroring,
> so either way we're gonna need two of whatever system we go for. Just
> hoping the MSA 1500 is reliable as well...
>
> Support will hopefully not be a problem for us as we have a local
> company providing support; they're also the ones setting it up for us,
> so at least we'll know right away whether they're competent or not :)
>

If I'm reading the original post correctly, the biggest issue is likely to be that the 14 disks on each 2Gbit fibre channel will be throttled to 200MB/s by the channel, when in fact you could expect (in RAID 10 arrangement) to get about 7 * 70 MB/s = 490 MB/s.

Cheers

Mark
On Mon, 24 Apr 2006, Mark Kirkwood wrote:

> If I'm reading the original post correctly, the biggest issue is likely
> to be that the 14 disks on each 2Gbit fibre channel will be throttled to
> 200MB/s by the channel, when in fact you could expect (in RAID 10
> arrangement) to get about 7 * 70 MB/s = 490 MB/s.

The two controllers and two FC switches/hubs are intended for redundancy, rather than performance, so there's only one 2Gbit channel. I don't know if it's possible to use both in parallel to get better performance.

I believe it's possible to join two or more FC ports on the switch together, but as there's only one port going to the controller internally this presumably wouldn't help.

There are two SCSI U320 buses, with seven bays on each. I don't know what the overhead of SCSI is, but you're obviously not going to get 490MB/s for each set of seven even if the FC could do it.

Of course your database may not spend all day doing sequential scans one at a time over 14 disks, so it doesn't necessarily matter...
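To spell out the arithmetic in this exchange (all rates are the assumed round numbers used above, not measurements):

fc_channel_mb_s = 200          # one 2Gbit FC link, after encoding overhead
per_disk_mb_s = 70             # streaming rate assumed for one 15k drive
disks_per_bus = 7
scsi_u320_mb_s = 320           # nominal U320 bus limit

raw = 2 * disks_per_bus * per_disk_mb_s                        # 980 MB/s of raw disk bandwidth
per_bus = min(disks_per_bus * per_disk_mb_s, scsi_u320_mb_s)   # 490 capped to 320 MB/s per bus
delivered = min(2 * per_bus, fc_channel_mb_s)                  # 200 MB/s through the single FC channel
print(raw, per_bus, delivered)

Whichever limit you hit first, the single 2Gbit channel is the narrowest point for sequential work.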
> If I'm reading the original post correctly, the biggest issue is likely
> to be that the 14 disks on each 2Gbit fibre channel will be throttled to
> 200MB/s by the channel, when in fact you could expect (in RAID 10
> arrangement) to get about 7 * 70 MB/s = 490 MB/s.

> The two controllers and two FC switches/hubs are intended for redundancy,
> rather than performance, so there's only one 2Gbit channel. I don't know
> if it's possible to use both in parallel to get better performance.

> I believe it's possible to join two or more FC ports on the switch
> together, but as there's only one port going to the controller internally
> this presumably wouldn't help.

> There are two SCSI U320 buses, with seven bays on each. I don't know what
> the overhead of SCSI is, but you're obviously not going to get 490MB/s
> for each set of seven even if the FC could do it.

Darn. I was really looking forward to ~500MB/s :(

> Of course your database may not spend all day doing sequential scans one
> at a time over 14 disks, so it doesn't necessarily matter...

That's probably true, but *knowing* that the max seq scan speed is that high gives you some confidence (true or fake) that the hardware will be sufficient for the next 2 years or so. So, if dual 2Gbit FCs still don't deliver more than 200MB/s, what does?

-Mikael
Mikael Carneholm wrote:
>
>> There are two SCSI U320 buses, with seven bays on each. I don't know what
>> the overhead of SCSI is, but you're obviously not going to get 490MB/s
>> for each set of seven even if the FC could do it.
>

You should be able to get close to 300MB/s on each SCSI bus - provided the PCI bus on the motherboard is 64-bit and runs at 133MHz or better (64-bit and 66MHz give you a 524MB/s limit).

>> Of course your database may not spend all day doing sequential scans one
>> at a time over 14 disks, so it doesn't necessarily matter...
>

Yeah, it depends on the intended workload, but at some point most databases end up IO bound... so you really want to ensure the IO system is as capable as possible, IMHO.

> That's probably true, but *knowing* that the max seq scan speed is that
> high gives you some confidence (true or fake) that the hardware will be
> sufficient for the next 2 years or so. So, if dual 2Gbit FCs still don't
> deliver more than 200MB/s, what does?
>

Most modern PCI-X or PCIe RAID cards will do better than 200MB/s (e.g. a 3Ware 9550SX will do ~800MB/s). By way of comparison, my old PIII with a Promise TX4000 plus 4 IDE drives will do 215MB/s... so being throttled to 200MB/s on modern hardware seems unwise to me.

Cheers

Mark
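For reference, the theoretical peaks behind those PCI figures (nominal clocks; a real bus delivers noticeably less, so treat these as upper bounds):

def pci_peak_mb_s(width_bits, clock_mhz):
    # bytes transferred per cycle times millions of cycles per second
    return width_bits / 8 * clock_mhz

print(pci_peak_mb_s(32, 33))     # ~132 MB/s: plain 32-bit/33MHz PCI
print(pci_peak_mb_s(64, 66))     # ~528 MB/s: 64-bit/66MHz, in the ballpark of the ~524MB/s quoted
print(pci_peak_mb_s(64, 133))    # ~1064 MB/s: 64-bit/133MHz PCI-X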
On Tue, 25 Apr 2006, Mark Kirkwood wrote:

> Mikael Carneholm wrote:
>>> There are two SCSI U320 buses, with seven bays on each. I don't know what
>>> the overhead of SCSI is, but you're obviously not going to get 490MB/s
>>> for each set of seven even if the FC could do it.
>
> You should be able to get close to 300MB/s on each SCSI bus - provided the
> PCI bus on the motherboard is 64-bit and runs at 133MHz or better (64-bit
> and 66MHz give you a 524MB/s limit).

I've no idea if the MSA1500's controllers use PCI internally. Obviously this argument applies to the PCI bus you plug your FC adapters in to, though. AIUI it's difficult to get PCI to actually give you its theoretical maximum bandwidth. Those speeds are still a lot more than 200MB/s, though.

>>> Of course your database may not spend all day doing sequential scans one
>>> at a time over 14 disks, so it doesn't necessarily matter...
>
> Yeah, it depends on the intended workload, but at some point most databases
> end up IO bound... so you really want to ensure the IO system is as capable
> as possible, IMHO.

IO bound doesn't imply IO bandwidth bound. 14 disks doing a 1ms seek followed by an 8k read over and over again is a bit over 100MB/s. Adding in write activity would make a difference, too, since it'd have to go to at least two disks. There are presumably hot spares, too.

I still wouldn't really want to be limited to 200MB/s if I expected to use a full set of 14 disks for active database data where utmost performance really matters and where there may be some sequential scans going on, though.

>> That's probably true, but *knowing* that the max seq scan speed is that
>> high gives you some confidence (true or fake) that the hardware will be
>> sufficient for the next 2 years or so. So, if dual 2Gbit FCs still don't
>> deliver more than 200MB/s, what does?
>
> Most modern PCI-X or PCIe RAID cards will do better than 200MB/s (e.g. a
> 3Ware 9550SX will do ~800MB/s). By way of comparison, my old PIII with a
> Promise TX4000 plus 4 IDE drives will do 215MB/s... so being throttled to
> 200MB/s on modern hardware seems unwise to me.

Though, of course, these won't do many of the things you can do with a SAN - like connect several computers, or split a single array in to two pieces and have two computers access them as if they were separate drives, or remotely shut down one database machine and then start up another using the same disks and data. The number of IO operations per second they can do is likely to be important, too... possibly more important.

There's 4Gbit FC, and so presumably 4Gbit SANs, but that's still not vast bandwidth. Using multiple FC ports is the other obvious way to do it with a SAN. I haven't looked, but I suspect you'll need quite a budget to get that...
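The random-read figure above, spelled out (same assumed numbers: 14 data disks, ~1ms per seek-plus-read, 8k requests):

disks = 14
service_time_s = 0.001        # ~1ms per seek + 8k read, as assumed above
request_bytes = 8 * 1024

iops = disks / service_time_s                  # 14,000 random reads/sec across the array
bandwidth_mb_s = iops * request_bytes / 1e6    # ~115 MB/s - "a bit over 100MB/s"
print("%.0f IOPS, %.0f MB/s" % (iops, bandwidth_mb_s))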
Alex Hayward wrote:
>
> IO bound doesn't imply IO bandwidth bound. 14 disks doing a 1ms seek
> followed by an 8k read over and over again is a bit over 100MB/s. Adding
> in write activity would make a difference, too, since it'd have to go to
> at least two disks. There are presumably hot spares, too.
>

Very true - if your workload is primarily random, ~100MB/s may be enough bandwidth.

> I still wouldn't really want to be limited to 200MB/s if I expected to use
> a full set of 14 disks for active database data where utmost performance
> really matters and where there may be some sequential scans going on,
> though.
>

Yeah - that's the rub. Data mining, bulk loads, batch updates, backups (restores...) often use significant bandwidth.

> Though, of course, these won't do many of the things you can do with a SAN
> - like connect several computers, or split a single array in to two pieces
> and have two computers access them as if they were separate drives, or
> remotely shut down one database machine and then start up another using
> the same disks and data. The number of IO operations per second they can
> do is likely to be important, too... possibly more important.
>

SAN flexibility is nice (when it works as advertised); the cost and performance, however, are the main detractors. On that note, I don't recall IO/s being anything special on most SAN gear I've seen (this could have changed for later products, I guess).

> There's 4Gbit FC, and so presumably 4Gbit SANs, but that's still not vast
> bandwidth. Using multiple FC ports is the other obvious way to do it with
> a SAN. I haven't looked, but I suspect you'll need quite a budget to get
> that...
>

Yes - the last place I worked were looking at doing this ('multiple attachment' was the buzzword, I think) - I recall it needed special (read: extra expensive) switches and particular cards...

Cheers

Mark
I'd be interested in those numbers once you get them, especially for ext3. We just picked up an HP MSA1500cs with the MSA50 sled, and I am curious as to how best to configure it for Postgres. My server is the HP DL585 (quad, dual-core Opteron, 16GB RAM) with 4 HD bays run by an HP SmartArray 5i controller. I have 15 10K 300GB drives and 1 15K 150GB drive (don't ask how that happened).

The database is going to be very data-warehouse-ish (bulk loads, lots of queries) and has the potential to grow very large (1+ TB). Plus, with that much data, actual backups won't be easy, so I'll be relying on RAID + watchfulness to keep me safe, at least through the prototype stages.

How would/do you guys set up your MSA1x00 with 1 drive sled? RAID10 vs RAID5 across 10+ disks? Here's what I was thinking (ext3 across everything):

Direct attached:
    2x300GB RAID10 - OS + ETL staging area
    2x300GB RAID10 - log + indexes

MSA1500:
    10x300GB RAID10 + 1x300GB hot spare - tablespace

I'm not quite sure what to do with the 15K/150GB drive, since it is a singleton. I'm also planning on giving all of the 256MB MSA1500 cache to reads, although I might change it for the batch loads to see if it speeds things up. Also, unfortunately, the MSA1500 only has a single SCSI bus, which could significantly impact throughput, but we got a discount, so hopefully we can get another bus module in the near future and pop it in.

Any comments are appreciated,

-Mike
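A rough usable-capacity comparison for the ten-disk tablespace set being weighed here (300GB drives as stated above, plus the hot spare; everything else is just arithmetic, not a measurement of this hardware):

disks, size_gb = 10, 300       # the ten 300GB drives proposed for the tablespace

raid10_usable = disks // 2 * size_gb     # 1500 GB: mirroring halves capacity
raid5_usable = (disks - 1) * size_gb     # 2700 GB: one disk's worth of parity

print("RAID10: %d GB usable" % raid10_usable)
print("RAID5:  %d GB usable" % raid5_usable)
# RAID10 tolerates one failure per mirror pair; RAID5 tolerates exactly one
# failure and runs noticeably slower while degraded or rebuilding.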
> My server is the HP DL585 (quad, dual-core Opteron, 16GB RAM) with 4 HD
> bays run by an HP SmartArray 5i controller. I have 15 10K 300GB drives
> and 1 15K 150GB drive (don't ask how that happened).

Our server will be a DL385 (dual, dual-core Opteron, 16GB RAM), and the 28 disks (10K, 146GB) in the MSA1500 will probably be set up in a SAME configuration (Stripe And Mirror Everything). Still to be decided, though. I'll post both pgbench and BenchmarkSQL (http://sourceforge.net/projects/benchmarksql) results here as soon as we have the machine set up. OS + log (not WAL) will reside on directly attached disks, and all heavy reading and writing will be taken care of by the MSA. Not sure how much of the cache module will be used for reads, but as our peak write load is quite high we'll probably use at least half of it for writes (good write performance is pretty much the key for the application in question).

> How would/do you guys set up your MSA1x00 with 1 drive sled? RAID10 vs
> RAID5 across 10+ disks?

Since it's a data warehouse type of application, you'd probably optimize for large storage capacity and read (rather than write) performance, and in that case I guess RAID5 could be considered, at least. It depends very much on reliability requirements though - RAID5 performs much worse than RAID10 in degraded mode (one disk out). Here's an interesting read regarding RAID5 vs RAID10 (NOT very pro-RAID5 :) ):

http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt

Regards,
Mikael
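For anyone wanting to script the pgbench runs mentioned above, a minimal sketch (my assumption, not Mikael's actual test plan; the database name and scale are placeholders, and only the long-standing -i/-s/-c/-t options are used):

import subprocess

DB = "benchdb"      # placeholder database name; must already exist
SCALE = 100         # pgbench scale factor; roughly 1.5GB of test data (approximate)

# Initialize the pgbench schema once.
subprocess.run(["pgbench", "-i", "-s", str(SCALE), DB], check=True)

# Run the TPC-B-like workload at a few client counts and print the results.
for clients in (1, 10, 50):
    result = subprocess.run(
        ["pgbench", "-c", str(clients), "-t", "1000", DB],
        check=True, capture_output=True, text=True)
    print("=== %d clients ===" % clients)
    print(result.stdout)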