Thread: SSD and RAID
Where I work we are starting to look at using SSDs for database server storage. Despite the higher per-unit cost, it is quite attractive to replace 6-8 SAS drives in RAID 10 with a pair of SSDs in RAID 1 that will probably perform better and use less power.

Which brings up the question: should it be a pair in RAID 1 or just a single drive? Traditionally this would have been a no-brainer: "Of course you want RAID 1 or RAID 10!" However, our experience with SSD failure modes points to firmware bugs as the primary source of trouble - and these are likely to impact both drives (nearly) simultaneously in a RAID 1 configuration. The other major issue to watch - flash write limit exhaustion - is also likely to hit at the same time for a pair of drives in RAID 1.

One option to get around simultaneous firmware failure is to get two *similar* drives from different manufacturers (e.g. OCZ Vertex 3 and Intel 520 - both SandForce, but with different firmware setups). However, using drives from different manufacturers is a pest (e.g. different SMART codes maintained, and/or different meanings for the same codes).

What are other folks who are using SSDs doing?

Cheers

Mark
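To illustrate the SMART mismatch: the attribute names below are the ones Intel documents for its drives; other vendors use different IDs and scales, so treat this as an example rather than a portable recipe.

    # Intel drives report wear as attribute 233 (Media_Wearout_Indicator,
    # counts down from 100) and host writes as 241 (Total_LBAs_Written);
    # a SandForce-based OCZ drive exposes a different attribute set.
    smartctl -A /dev/sda | egrep 'Media_Wearout|Total_LBAs_Written'
    smartctl -A /dev/sdb    # compare the raw attribute list on the other brand

Any monitoring script that parses one vendor's output will typically need per-model tweaks for the other, which is exactly the nuisance described above.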
Hi,

a few personal opinions (your mileage may vary ...)

On 5.3.2012 23:37, Mark Kirkwood wrote:
> Where I work we are starting to look at using SSDs for database server
> storage. Despite the higher per-unit cost it is quite attractive to
> replace 6-8 SAS drives in RAID 10 with a pair of SSDs in RAID 1 that will
> probably perform better and use less power.

Probably -> depends on the workload. Have you performed any tests to check that it will actually improve performance? If a large portion of your workload is sequential (e.g. seq. scans of large tables in a DSS environment etc.) then SSDs are not worth the money. For OLTP workloads they're a clear winner.

Anyway, don't get rid of the SAS drives completely - use them for WAL. WAL is written sequentially, and if you use PITR then the WAL is the most valuable piece of data (along with the base backup), so it's exactly the thing you want to place on reliable devices. And if you use a decent controller with a BBWC to absorb the fsyncs, it can give performance as good as SSDs ...

> Which brings up the question of should it be a pair in RAID 1 or just a
> single drive? Traditionally this would have been a no-brainer "Of course
> you want RAID 1 or RAID 10"! However our experience with SSD failure
> modes points to firmware bugs as the primary source of trouble - and these
> are likely to impact both drives (nearly) simultaneously in a RAID 1
> configuration. Also the other major issue to watch - flash write limit
> exhaustion - is also likely to hit at the same time for a pair of drives
> in RAID 1.

Yeah, matches my experience. Generally the same rules are valid for spinners too (use different batches / brands to build an array), but the firmware bugs are quite annoying.

Using the SAS drives for WAL may actually help you here - do a base backup regularly and keep the WAL files so that you can do a recovery if the SSDs fail. You won't lose any data, but it takes time to do the recovery. If you can't afford the downtime, you should set up a failover machine anyway. And AFAIK a standby does fewer writes than the master.

At least that's what I'd do. But those are my personal opinions - I suppose others may disagree.

kind regards
Tomas
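A minimal sketch of the WAL-on-SAS arrangement described above, assuming the SAS array is mounted at /mnt/sas (the paths are placeholders, not a recommendation):

    # postgresql.conf - archive WAL to the SAS array for PITR
    wal_level = archive
    archive_mode = on
    archive_command = 'test ! -f /mnt/sas/wal_archive/%f && cp %p /mnt/sas/wal_archive/%f'

    # keep the live WAL on the SAS drives as well, e.g. by initialising
    # the cluster with a separate WAL directory:
    initdb -D /mnt/ssd/pgdata --xlogdir=/mnt/sas/pg_xlog

A base backup plus that archive is what lets you rebuild onto fresh SSDs after a firmware-induced double failure.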
On 2012-03-05 23:37, Mark Kirkwood wrote:
> Which brings up the question of should it be a pair in RAID 1 or just
> a single drive? Traditionally this would have been a no-brainer "Of
> course you want RAID 1 or RAID 10"! However our experience with SSD
> failure modes points to firmware bugs as the primary source of trouble -
> and these are likely to impact both drives (nearly) simultaneously in
> a RAID 1 configuration. Also the other major issue to watch - flash
> write limit exhaustion - is also likely to hit at the same time for a
> pair of drives in RAID 1.
>
> What are other folks who are using SSDs doing?

This is exactly the reason why, in a set of new hardware I'm currently evaluating, we use two different manufacturers for the spindles (behind BBWC for WAL, OS, archives etc.) and for the SSDs (on motherboard SATA ports). For the SSDs we've chosen the Intel 710 and the OCZ Vertex 2 PRO; however, that last one was EOL and OCZ offered to replace it with the Deneva 2 (http://www.oczenterprise.com/downloads/solutions/ocz-deneva2-r-mlc-2.5in_Product_Brief.pdf). Still waiting for a test Deneva though.

One thing to note is that Linux software RAID with md doesn't support discard, which might shorten the drives' expected lifetime. To get some numbers I tested the RAID 1 of SSDs setup for media wear under a PostgreSQL load earlier, see http://archives.postgresql.org/pgsql-general/2011-11/msg00141.php

<Greg Smith imitation mode on>I would recommend that every SSD considered for production use be tested with diskchecker.pl on a filesystem that's mounted the same way as it will be for your data (e.g. with xfs or ext4 with nobarrier), and also be given a media-wear test like the one described in the linked pgsql-general thread above, especially if you're choosing to run on SSDs not marketed as enterprise drives.</>

regards,
Yeb

PS: we applied the same philosophy (different brands) also to motherboards, IO controllers and memory, but after testing we liked one IO controller's software so much more than the other that we chose to have only one. Also, the stream memory performance of one motherboard showed such a significant regression at the higher thread counts that we decided to go for the other brand for all servers.

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data
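For anyone who hasn't run it before, the diskchecker.pl test goes roughly like this (host name and path are placeholders; the power has to be pulled, not a clean shutdown):

    # on a second machine that stays up:
    ./diskchecker.pl -l

    # on the machine with the SSD under test, writing to the filesystem
    # mounted the way you intend to run it in production:
    ./diskchecker.pl -s monitorhost create /mnt/ssd/test_file 500
    # ... pull the power cord while this runs, power the box back on, then:
    ./diskchecker.pl -s monitorhost verify /mnt/ssd/test_file

Any lost writes reported by the verify step mean the drive acknowledged fsyncs it hadn't actually made durable.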
On 2012-03-06 09:34, Andrea Suisani wrote:
> On 03/06/2012 09:17 AM, Yeb Havinga wrote:
>>
>> PS: we applied the same philosophy (different brands) also to
>> motherboards, io controllers and memory, but after testing, we liked
>> one IO controller's software so much more than the other so we chose
>> to have only one. Also stream memory performance of one motherboard
>> showed a significant performance regression in the higher thread
>> counts that we decided to go for the other brand for all servers.
>>
>
> care to share the winning motherboard model?
>
> thanks
> Andrea

On http://i.imgur.com/vfmvu.png is a graph of three systems, made with the multi-stream scaling test (average of 10 runs, if I remember correctly).

The red and blue are 2 x 12-core Opteron 6168 systems with 64 GB DDR3 1333MHz in 8GB DIMMs.

Red is a Tyan S8230
Blue is a Supermicro H8DGI-G

We tried a lot of things to rule out the motherboards, such as swapping the memory between the two systems, ensuring the BIOS settings are similar (e.g. ECC mode), and updating to the latest BIOS where possible, but none of that removed the memory performance drop. Both systems were installed with kickstarted CentOS 6.2, so there are no kernel setting differences there either.

regards,
Yeb

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data
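Multi-stream scaling numbers like these can be reproduced with the stock STREAM benchmark, something along these lines (thread counts are examples; increase the array size in the source so it is much larger than the combined caches):

    # build John McCalpin's stream.c with OpenMP support
    gcc -O3 -fopenmp stream.c -o stream

    # run at increasing thread counts and watch the Triad bandwidth;
    # a healthy board keeps scaling (or at least holds flat) up to the
    # full core count instead of dropping off
    for t in 1 2 4 8 12 16 24; do
        OMP_NUM_THREADS=$t ./stream | grep Triad
    done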
On 06/03/12 21:17, Yeb Havinga wrote:
>
> One thing to note is that linux software raid with md doesn't support
> discard, which might shorten the drive's expected lifetime. To get
> some numbers I tested the raid 1 of ssd's setup for media wear under a
> PostgreSQL load earlier, see
> http://archives.postgresql.org/pgsql-general/2011-11/msg00141.php
>

Right, which is a bit of a pain - we are considering either partitioning the drives to less than their full capacity (leaving the rest as spare area for the controller) and using md RAID 1, or else doing the mirror in LVM to get a working discard/trim.

Regards

Mark
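A sketch of the first option (device names and the 20% figure are just examples; the extra spare area only helps if those blocks were never written, so do this on new or secure-erased drives):

    # leave the last ~20% of each SSD unpartitioned as extra spare area
    parted -s /dev/sda mklabel gpt mkpart primary 1MiB 80%
    parted -s /dev/sdb mklabel gpt mkpart primary 1MiB 80%

    # mirror the partitions with md as usual
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1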
On 2012-03-07 01:36, Mark Kirkwood wrote:
> On 06/03/12 21:17, Yeb Havinga wrote:
>>
>> One thing to note is that linux software raid with md doesn't support
>> discard, which might shorten the drive's expected lifetime. To get
>> some numbers I tested the raid 1 of ssd's setup for media wear under a
>> PostgreSQL load earlier, see
>> http://archives.postgresql.org/pgsql-general/2011-11/msg00141.php
>>
>
> Right, which is a bit of a pain - we are considering either partitioning
> the drives to less than their full capacity and using md RAID 1, or else
> doing the mirror in LVM to get a working discard/trim.

When I measured the write durability without discard on the enterprise disks, I got numbers that in normal production use would outlive the lifetime of the servers. It would be interesting to see durability numbers for the desktop SSDs too, even when only part of the disk is partitioned.

regards,
Yeb Havinga
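A crude way to get such durability numbers, assuming an Intel-style wear attribute (other vendors expose different attribute IDs, as noted earlier in the thread): log the wear indicator periodically under a representative load and extrapolate.

    # once a day, append the wear-related attributes to a log
    ( date; smartctl -A /dev/sda | egrep 'Media_Wearout|Total_LBAs_Written' ) \
        >> /var/log/ssd-wear.log

    # if Media_Wearout_Indicator drops from 100 to 99 after D days of
    # production-like load, the drive should last on the order of 100 * D
    # days of that load - compare that with the planned server lifetime.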
On 03/06/2012 09:17 AM, Yeb Havinga wrote:
> On 2012-03-05 23:37, Mark Kirkwood wrote:
>> Which brings up the question of should it be a pair in RAID 1 or just a single drive? Traditionally this would have been a no-brainer "Of course you want RAID 1 or RAID 10"! However our experience with SSD failure modes points to firmware bugs as the primary source of trouble - and these are likely to
>> impact both drives (nearly) simultaneously in a RAID 1 configuration. Also the other major issue to watch - flash write limit exhaustion - is also likely to hit at the same time for a pair of drives in RAID 1.
>>
>> What are other folks who are using SSDs doing?
>
> This is exactly the reason why, in a set of new hardware I'm currently evaluating, we use two different manufacturers for the spindles (behind BBWC for WAL, OS, archives etc.) and for the SSDs (on motherboard SATA ports). For the SSDs we've chosen the Intel 710 and the OCZ Vertex 2 PRO; however, that last one was EOL
> and OCZ offered to replace it with the Deneva 2 (http://www.oczenterprise.com/downloads/solutions/ocz-deneva2-r-mlc-2.5in_Product_Brief.pdf). Still waiting for a test Deneva though.
>
> One thing to note is that Linux software RAID with md doesn't support discard, which might shorten the drives' expected lifetime. To get some numbers I tested the RAID 1 of SSDs setup for media wear under a PostgreSQL load earlier, see http://archives.postgresql.org/pgsql-general/2011-11/msg00141.php
>
> <Greg Smith imitation mode on>I would recommend that every SSD considered for production use be tested with diskchecker.pl on a filesystem that's mounted the same way as it will be for your data (e.g. with xfs or ext4 with nobarrier), and also be given a media-wear test like the one described in the
> linked pgsql-general thread above, especially if you're choosing to run on SSDs not marketed as enterprise drives.</>
>
> regards,
> Yeb
>
> PS: we applied the same philosophy (different brands) also to motherboards, IO controllers and memory, but after testing we liked one IO controller's software so much more than the other that we chose to have only one. Also, the stream memory performance of one motherboard showed such a significant
> regression at the higher thread counts that we decided to go for the other brand for all servers.
>

care to share the winning motherboard model?

thanks
Andrea
On 03/06/2012 10:34 AM, Yeb Havinga wrote:
> On 2012-03-06 09:34, Andrea Suisani wrote:
>> On 03/06/2012 09:17 AM, Yeb Havinga wrote:
>>>
>>> PS: we applied the same philosophy (different brands) also to motherboards, IO controllers and memory, but after testing we liked one IO controller's software so much more than the other that we chose to have only one. Also, the stream memory performance of one motherboard showed such a significant
>>> regression at the higher thread counts that we decided to go for the other brand for all servers.
>>>
>>
>> care to share the winning motherboard model?
>>
>> thanks
>> Andrea
>>
>
> On http://i.imgur.com/vfmvu.png is a graph of three systems, made with the multi-stream scaling test (average of 10 runs, if I remember correctly).
>
> The red and blue are 2 x 12-core Opteron 6168 systems with 64 GB DDR3 1333MHz in 8GB DIMMs.
>
> Red is a Tyan S8230
> Blue is a Supermicro H8DGI-G
>
> We tried a lot of things to rule out the motherboards, such as swapping the memory between the two systems, ensuring the BIOS settings are similar (e.g. ECC mode), and updating to the latest BIOS where possible, but none of that removed the memory performance drop. Both systems were installed with
> kickstarted CentOS 6.2, so there are no kernel setting differences there either.
>
> regards,
> Yeb

thanks for sharing that info

Andrea