Re: New server: SSD/RAID recommendations? - Mailing list pgsql-performance

From Graeme B. Bell
Subject Re: New server: SSD/RAID recommendations?
Date
Msg-id 52A08B87-2A8A-43F2-92CE-2514F5AD141B@skogoglandskap.no
In response to Re: New server: SSD/RAID recommendations?  (Steve Crawford <scrawford@pinpointresearch.com>)
Responses Re: New server: SSD/RAID recommendations?  ("Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>)
List pgsql-performance
Completely agree with Steve.

1. Intel NVMe looks like the best bet if you have modern enough hardware for NVMe. Otherwise, consider e.g. the S3700
mentioned elsewhere in this thread.

2. RAID controllers.

We have roughly 10-12 of these here and roughly 25-30 SSDs, spread across various machines.
This might give people an idea of where the risk lies in the path from disk to CPU.

We've had 2 RAID card failures in the last 12 months that nuked the array with days of downtime, and 2 problems with
batteries suddenly becoming useless or suddenly reporting wildly varying temperatures/overheating. There may have been
other RAID problems I don't know about.

Our IT dept were replacing Seagate HDDs last year at a rate of 2-3 per week (I guess they have 100-200 disks?). We also
have about 25-30 Hitachi/HGST HDDs.

So by my estimates:
30% annual problem rate with RAID controllers
30-50% failure rate with Seagate HDDs (Backblaze saw similar results)
0% failure rate with HGST HDDs.
0% failure in our SSDs.   (To be fair, our one Samsung SSD apparently has a bug in TRIM under Linux, which I'll need to
investigate to see whether we have been affected.)
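
As a rough cross-check of the RAID controller figure, here is a minimal Python sketch of the arithmetic. It assumes the 2 card failures and the 2 battery problems all fell within the same 12 months and that the fleet is the 10-12 controllers mentioned above; the function name and parameters are made up for illustration.

```python
def annual_problem_rate(events, units, months=12.0):
    """Problems per unit per year, given `events` observed over `months`."""
    if units <= 0 or months <= 0:
        raise ValueError("units and months must be positive")
    return events / units * (12.0 / months)


# Assumption: 2 card failures + 2 battery problems, all within 12 months.
raid_events = 2 + 2

# The fleet size is only known as "10-12", so compute both ends of the range.
low = annual_problem_rate(raid_events, units=12)
high = annual_problem_rate(raid_events, units=10)

print(f"RAID controllers: {low:.0%} to {high:.0%} problems per unit per year")
```

With those inputs the range comes out at about a third or slightly more, the same ballpark as the 30% estimate. The sample is tiny, so treat this as an order-of-magnitude figure and nothing more.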

Also, RAID controllers aren't free - not just the money but also the management of them (ever tried writing a complex
install script that interacts with MegaCLI? It can be done but it's not much fun). Just take a look at the MegaCLI
manual and ask yourself... is this even worth it (if you have a good MTBF on an enterprise SSD)?

RAID was meant to be about ensuring availability of data. I have trouble believing that these days....

Graeme Bell


On 06 Jul 2015, at 18:56, Steve Crawford <scrawford@pinpointresearch.com> wrote:

> 2. We don't typically have redundant electronic components in our servers. Sure, we have dual power supplies and dual
> NICs (though generally to handle external failures) and ECC-RAM but no hot-backup CPU or redundant RAM banks and...no
> backup RAID card. Intel Enterprise SSD already have power-fail protection so I don't need a RAID card to give me BBU.
> Given the MTBF of good enterprise SSD I'm left to wonder if placing a RAID card in front merely adds a new point of
> failure and scheduled-downtime-inducing hands-on maintenance (I'm looking at you, RAID backup battery).


