raid writethrough mode (WT), ssds and your DB. (was Performances issues with SSD volume ?) - Mailing list pgsql-admin
From | Graeme B. Bell |
---|---|
Subject | raid writethrough mode (WT), ssds and your DB. (was Performances issues with SSD volume ?) |
Date | |
Msg-id | BA419F86-4BA4-4794-A2BE-BE3546F00F6A@skogoglandskap.no |
Responses | Re: raid writethrough mode (WT), ssds and your DB. (was Performances issues with SSD volume ?) |
List | pgsql-admin |
> Not using your raid controller's write cache then? Not sure just how important that is with SSDs these days, but if you've got a BBU set it to "WriteBack". Also change "Cache if Bad BBU" to "No Write Cache if Bad BBU" if you do that.

I did quite a few tests with WB and WT last year.

- WT should be OK with e.g. Intel SSDs. From memory I saw write performance gains of about 20-30% with Crucial M500/M550 drives on a Dell H710 RAID controller. BUT that controller didn't have WT fastpath, which is absolutely essential to see substantial gains with WT. I expect with WT and a fastpath-enabled RAID controller you'd see much higher numbers, e.g. 100%+ higher IOPS.

(So, if you don't have fastpath on your controller, you might as well plan to leave WB on and just buy cheaper SSD drives rather than expensive ones - the raid controller will be your choke point for performance on WT and it's a source of risk.)

- WT with most SSDs will likely corrupt your postgres database the first time you lose power (true of all the drives I've tested).

- WB is the only safe option unless you have done lots of plug pull tests on a drive that is guaranteed to protect data "in flight" during power loss (Intel disks + maybe the new Samsung PCIe).

A relevant anecdatum...

A certain company makes SSD drives; for the sake of argument let's call two of their models the XY00 and the XY50. These were popular SSD drives that were advertised everywhere as having power loss protection throughout 2013-2014. We bought lots of them here because of that 'power loss protection' aspect + a good price + performance + good reliability record + the good name of the company.

When I tested with the famous 'diskchecker.pl' tool (* link at end), I found that they don't actually provide full power loss protection. Some data in flight (even fsyncs!) was lost.

I tested using several computers, several copies of each disk model, with "XY00" and "XY50" models, and with and without RAID controllers.
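
For readers who have not run this kind of test, here is a rough Python sketch of the idea behind it. It is not diskchecker.pl and not a substitute for it: the function names and file layout are made up for illustration. The writer only counts a record as durable once fsync has returned, and the checker reports any such record that is missing afterwards. In a real plug pull test the list of acknowledged records has to be kept on a second machine, since the machine under test loses power; that is how diskchecker.pl is used, with a listener on another host.

```python
import os
import tempfile


def write_acknowledged(path, count):
    """Append `count` numbered records, calling fsync after each one.

    Returns the sequence numbers for which fsync returned, i.e. the
    records the OS and drive have claimed are on stable storage.
    """
    acked = []
    with open(path, "ab") as f:
        for seq in range(count):
            f.write(f"{seq}\n".encode("ascii"))
            f.flush()              # push Python's buffer to the OS
            os.fsync(f.fileno())   # ask the OS to push it to the device
            acked.append(seq)      # only now treat the record as durable
    return acked


def missing_after_restart(path, acked):
    """Return acknowledged sequence numbers that are absent from the file."""
    found = set()
    if os.path.exists(path):
        with open(path, "rb") as f:
            for line in f:
                line = line.strip()
                if line.isdigit():   # ignore blank or non-numeric fragments
                    found.add(int(line))
    return sorted(set(acked) - found)


if __name__ == "__main__":
    # Local demo only: without pulling the plug, nothing should be lost.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "fsync_test.log")
        acked = write_acknowledged(path, 100)
        print("lost records:", missing_after_restart(path, acked))
```

A drive that honors fsync should never produce a non-empty "lost records" list for acknowledged writes, however the power is cut. Run locally without a power cut, this demo only shows the bookkeeping.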

The only way I could keep the data safe for fsyncs and DB use with these drives during power failure was either a) use a RAID controller with WB or b) disable the SSD cache, which is horrifyingly bad for performance.

So I wrote to the company's engineering in early August 2014 about this (because we had spent quite a lot of money on these disks) and corresponded with a QA engineer to show them my results and show them how to reproduce the data loss problem, asking if maybe they could produce a firmware patch or some other fix.

At first they were extremely interested to know more. Then once they had the information to fully reproduce the bug, they went silent and wouldn't reply to any emails.

About 1-2 months later, articles started appearing on enthusiast tech sites. Not new firmware, just company product reps explaining that "power loss protection" doesn't really mean all your data is protected from power loss, and that it's unreasonable to expect the drive to do what it says on the box. :-(

Lessons to take away:

--- WT + many SSDs + power loss = likely DB corruption.

--- No raid card + many SSDs + power loss = likely DB corruption.

--- WB + many SSDs + power loss = should be fine but you must test it a few times.

--- Never use WT mode on any production system until you've run a ton of tests on the drive's ability to honor fsyncs.

--- Never trust any vendor to provide correctly working equipment regardless of how often they make promises in advertising. Buy the smallest amount possible and test it first yourself in the most realistic environment possible. That goes for RAID controllers advertised as having fastpath, which actually didn't, and SSDs heavily advertised as having power loss protection, which actually didn't protect all data from power loss.

--- oh and NEVER do a power loss test by holding the power button. On every machine I've tested with SSDs, a power button shutdown (e.g. hold power for 5 seconds till it turns off) did not create lost data, whereas a plug pull test (yank the power out of the power supply) always produced lost data. The plug pull test reproduces a real life power failure more accurately. The power button test will only give you an illusion of safety.

Graeme Bell

p.s. https://gist.github.com/bradfitz/3172656 - diskchecker.pl