On Thu, 5 Apr 2007, Scott Marlowe wrote:
> On Thu, 2007-04-05 at 14:30, James Mansion wrote:
>> Can you cite any statistical evidence for this?
> Logic?
OK, everyone who hasn't already needs to read the Google and CMU papers.
I'll even provide links for you:
http://www.cs.cmu.edu/~bianca/fast07.pdf
http://labs.google.com/papers/disk_failures.pdf
Several things their data suggests are completely at odds with the lore
that traditional logic-based thinking in this area has produced.
Section 3.4 of Google's paper basically disproves that "mechanical devices
have decreasing MTBF when run in hotter environments" applies to hard
drives in the normal range they're operated in. Your comments about
server hard drives being rated to higher temperatures are helpful, but I
don't trust conclusions drawn from just thinking about something when
they conflict with statistics to the contrary.
I don't want to believe everything they suggest, but enough of it matches
my experience that I find it difficult to dismiss the rest. For example,
I scan all my drives for reallocated sectors, and the minute there's a
single one I get e-mailed about it and get all the data off that drive
pronto. On multiple occasions this has saved me from a complete failure
that happened within the next day.
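For anyone who wants to set up a similar check, here is a minimal sketch
in Python. It is not my actual script. It assumes smartmontools is
installed and that `smartctl -A` prints its usual ten-column attribute
table with attribute 5 named Reallocated_Sector_Ct, which not every drive
does. The function names are made up for illustration, and the mailing
step is left out; hook the result into whatever alerting you already have.

```python
import re
import subprocess


def reallocated_sectors(smartctl_output):
    """Return the raw Reallocated_Sector_Ct value, or None if not reported.

    Assumes the ten-column attribute table printed by `smartctl -A`:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if (len(fields) >= 10 and fields[0] == "5"
                and fields[1] == "Reallocated_Sector_Ct"):
            # The raw value is normally a plain integer; take leading digits.
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def drives_with_reallocations(devices):
    """Return {device: count} for drives reporting one or more reallocations.

    `devices` is a list of device paths such as "/dev/sda" (hypothetical).
    """
    flagged = {}
    for device in devices:
        # smartctl's exit status is a bitmask, so a non-zero status is not
        # treated as an error here; only the printed table is inspected.
        result = subprocess.run(["smartctl", "-A", device],
                                capture_output=True, text=True)
        count = reallocated_sectors(result.stdout)
        if count:
            flagged[device] = count
    return flagged
```

Run something like drives_with_reallocations(["/dev/sda"]) from cron as
root and send mail whenever the result is not empty. A drive that doesn't
report the attribute at all comes back as None and is silently skipped,
so you may want to flag that case separately.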
The main thing I wish they'd published is a breakdown of some of the
statistics by drive manufacturer. For example, they suggest a significant
number of drive failures were not predicted by SMART. I've seen plenty of
drives where the SMART reporting was spotty at best (yes, I'm talking
about you, Maxtor) and wouldn't be surprised if they were quiet right up
to their bitter (and frequent) end. I'm not sure how that factor may have
skewed this particular bit of data.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD