raid writethrough mode (WT), ssds and your DB. (was Performances issues with SSD volume ?) - Mailing list pgsql-admin

From Graeme B. Bell
Subject raid writethrough mode (WT), ssds and your DB. (was Performances issues with SSD volume ?)
Date
Msg-id BA419F86-4BA4-4794-A2BE-BE3546F00F6A@skogoglandskap.no
Responses Re: raid writethrough mode (WT), ssds and your DB. (was Performances issues with SSD volume ?)
List pgsql-admin
> Not using your raid controller's write cache then?  Not sure just how important that is with SSDs these days, but if
> you've got a BBU set it to "WriteBack". Also change "Cache if Bad BBU" to "No Write Cache if Bad BBU" if you do that.

I did quite a few tests with WB and WT last year.

- WT should be OK with e.g. Intel SSDs. From memory I saw write performance gains of about 20-30% with Crucial
M500/M550 drives on a Dell H710 RAID controller. BUT that controller didn't have WT fastpath, which is absolutely
essential to see substantial gains with WT. I expect with WT and a fastpath-enabled RAID controller you'd see much
higher numbers, e.g. 100%+ higher IOPS.

(So, if you don't have fastpath on your controller, you might as well plan to leave WB on and just buy cheaper SSD
drives rather than expensive ones - the raid controller will be your choke point for performance on WT and it's a source
of risk).

- WT with most SSDs will likely corrupt your postgres database the first time you lose power. (on all the drives I've
tested)

- WB is the only safe option unless you have done lots of plug pull tests on a drive that is guaranteed to protect data
"in flight" during power loss (Intel disks + maybe the new Samsung PCIe).


A relevant anecdatum...


A certain company makes SSD drives; for the sake of argument let's call two of their models the XY00 and the XY50.
These were popular SSD drives that were advertised everywhere as having power loss protection throughout 2013-2014.
We bought lots of them here because of that 'power loss protection' aspect + a good price + performance + good
reliability record + the good name of the company.

When I tested with the famous 'diskchecker.pl' tool (* link at end), I found that they don't actually provide full
power loss protection. Some data in flight (even data that had already been fsynced!) was lost.
I tested using several computers, several copies of each disk model, with "XY00" and "XY50" models, and with and
without RAID controllers.
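
For readers who haven't seen this kind of test, here is a minimal Python sketch of the general idea: write numbered records, fsync each one, note which ones were acknowledged, then check after the power loss that every acknowledged record survived. This is not diskchecker.pl and does not reproduce its behaviour; the names `write_records` and `verify` and the record layout are made up for illustration. In a real test the acknowledgements have to be recorded on a different machine, because the machine under test is the one losing power.

```python
import os
import struct
import tempfile

# Hypothetical record layout: one 8-byte little-endian sequence number.
RECORD = struct.Struct("<Q")


def write_records(path, count, on_ack):
    """Append sequence numbers 0..count-1, calling fsync after each one.

    on_ack(seq) is called only after fsync has returned, i.e. after the
    storage stack has claimed the record is durable.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for seq in range(count):
            os.write(fd, RECORD.pack(seq))
            os.fsync(fd)  # ask the OS to push the data to stable storage
            on_ack(seq)
    finally:
        os.close(fd)


def verify(path, last_acked):
    """Return True if records 0..last_acked are all present and intact."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        # Nothing on disk is only acceptable if nothing was acknowledged.
        return last_acked < 0
    complete = len(data) // RECORD.size
    if complete <= last_acked:
        return False  # an acknowledged record is missing
    return all(
        RECORD.unpack_from(data, i * RECORD.size)[0] == i
        for i in range(last_acked + 1)
    )


if __name__ == "__main__":
    acked = []  # stand-in for a log kept on another machine
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "fsync_test.dat")
        write_records(path, 100, acked.append)
        print("all acknowledged records present:", verify(path, acked[-1]))
```

Run like this, with no power loss, the check trivially passes. The interesting case is pulling the plug while `write_records` is running and calling `verify` after reboot with the last sequence number the other machine saw acknowledged.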

The only way I could keep the data safe for fsyncs and DB use with these drives during power failure was either a) use
a RAID controller with WB or b) disable the SSD's cache, which is horrifyingly bad for performance.

So I wrote to the company's engineering in early August 2014 about this (because we had spent quite a lot of money on
these disks) and corresponded with a QA engineer to show them my results and show them how to reproduce the data loss
problem, asking if maybe they could produce a firmware patch or some other fix.

At first they were extremely interested to know more. Then once they had the information to fully reproduce the bug,
they went silent and wouldn't reply to any emails.

About 1-2 months later, articles started appearing on enthusiast tech sites. Not new firmware, just company product
reps explaining that "power loss protection" doesn't really mean all your data is protected from power loss, and that
it's unreasonable to expect the drive to do what it says on the box.

:-(


Lessons to take away:

--- WT + many SSDs + power loss = likely DB corruption.

--- No raid card + many SSDs + power loss = likely DB corruption.

--- WB + many SSDs + power loss = should be fine but you must test it a few times.

--- Never use WT mode on any production system until you've run a ton of tests on the drive's ability to honor fsyncs.

--- Never trust any vendor to provide correctly working equipment regardless of how often they make promises in
advertising. Buy the smallest amount possible and test it first yourself in the most realistic environment possible.
That goes for RAID controllers advertised as having fastpath, which actually didn't, and SSDs heavily advertised as
having power loss protection, which actually didn't protect all data from power loss.

--- Oh, and NEVER do a power loss test by holding the power button. On every machine I've tested with SSDs, a power
button shutdown (e.g. hold power for 5 seconds till it turns off) did not create lost data, whereas a plug pull test
(yank the power out of the power supply) always produced lost data. The plug pull test reproduces a real life power
failure more accurately. The power button test will only give you an illusion of safety.

Graeme Bell



p.s. https://gist.github.com/bradfitz/3172656 - diskchecker.pl




