Re: SCSI vs SATA - Mailing list pgsql-performance

From Scott Marlowe
Subject Re: SCSI vs SATA
Date
Msg-id 1175872847.9839.146.camel@state.g2switchworks.com
In response to Re: SCSI vs SATA  (Greg Smith <gsmith@gregsmith.com>)
List pgsql-performance
On Thu, 2007-04-05 at 23:37, Greg Smith wrote:
> On Thu, 5 Apr 2007, Scott Marlowe wrote:
>
> > On Thu, 2007-04-05 at 14:30, James Mansion wrote:
> >> Can you cite any statistical evidence for this?
> > Logic?
>
> OK, everyone who hasn't already needs to read the Google and CMU papers.
> I'll even provide links for you:
>
> http://www.cs.cmu.edu/~bianca/fast07.pdf
> http://labs.google.com/papers/disk_failures.pdf
>
> There are several things their data suggests that are completely at odds
> with the lore suggested by traditional logic-based thinking in this area.
> Section 3.4 of Google's paper basically disproves that "mechanical devices
> have decreasing MTBF when run in hotter environments" applies to hard
> drives in the normal range they're operated in.

On the Google study:

The Google study ONLY looked at consumer grade drives.  It did not
compare them to server class drives.

Also, Google's finding that temperature doesn't predict failure only
holds when the temperature is fairly low.  Note that the drive
temperatures in the Google study are <=55C.  If the drive temp is below
55C, then the environment, by extension, must be cooler than that by
some fair bit, likely 10-15C, since the drive is a heat source, and the
environment the heat sink.  So, the ambient temperature there was
likely in the 35C range.

Most server drives are rated for operation in 55-60C environmental
temperatures, which means the drives themselves would run even hotter
than that.
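
To make that back-of-envelope arithmetic explicit, here's a quick
Python sketch.  The 10-15C drive-over-ambient rise is my assumption,
not a measured figure:

    # Estimate ambient temperature from a reported drive temperature,
    # assuming the drive runs 10-15C hotter than its surroundings
    # (that rise is an assumption, not a measured figure).
    def estimated_ambient_c(drive_temp_c, rise_low=10, rise_high=15):
        return drive_temp_c - rise_high, drive_temp_c - rise_low

    for drive_temp in (45, 50, 55):
        lo, hi = estimated_ambient_c(drive_temp)
        print("drive at %dC -> ambient roughly %d-%dC"
              % (drive_temp, lo, hi))

So drives reporting 45-50C put the room somewhere around 30-40C, and
only the very hottest drives in the study imply a 40-45C room.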

As for the CMU study:

It didn't expressly compare server grade to consumer grade hard drives.
Remember, there are server class SATA drives, and there were (once upon
a time) consumer class SCSI drives.  If they had separated out the
drives by server / consumer grade, I think the study would have been
more interesting.  But as it stands, we just can't tell from that
study.

Personal Experience:

In my last job we had three very large storage arrays (big black
refrigerator-looking boxes, you know the kind).  Each one held
somewhere around 150 drives.  The first two we purchased were based on
9 gig server class SCSI drives.  The third, newer one was based on
commodity IDE drives.  I'm not sure of the size, but I believe they
were somewhere around 20 gigs.  So, this was 5 or so years ago, not
recently.

We had a cooling failure in our hosting center, and the internal
temperature of the data center rose to about 110F to 120F (43C to
48C).  We ran at that temperature for about 12 hours before we got a
refrigeration unit on a flatbed brought in to cool things down (btw, I
highly recommend Aggreko if you need large scale portable air
conditioners or generators).

In the months that followed, the drives in the IDE based storage array
failed by the dozens.  We eventually replaced ALL the drives in that
array because of the failure rate.  The SCSI based arrays had a few
more drives fail than usual, but nothing too shocking.

Now, maybe Seagate et al. are making their consumer grade drives from
yesterday's server grade technology these days, but 5 or 6 years ago
that was not the case, from what I saw.

> Your comments about
> server hard drives being rated to higher temperatures is helpful, but
> conclusions drawn from just thinking about something I don't trust when
> they conflict with statistics to the contrary.

Actually, as I looked up some more data on this, I found it interesting
that 5 to 10 years ago consumer grade drives were rated for 35C
environments, while today they seem to be rated to 55C or 60C, the same
as server drives were 5 to 10 years ago.  I do think that server grade
drive tech has been migrating into the consumer realm over time.  I can
imagine that today's high performance gaming / home systems, with their
heat generating video cards and tendency towards RAID1 / RAID0 drive
setups, are pushing the drive manufacturers to improve the reliability
of consumer disk drives.

> The main thing I wish they'd published is breaking some of the statistics
> down by drive manufacturer.  For example, they suggest a significant
> number of drive failures were not predicted by SMART.  I've seen plenty of
> drives where the SMART reporting was spotty at best (yes, I'm talking
> about you, Maxtor) and wouldn't be surprised that they were quiet right up
> to their bitter (and frequent) end.  I'm not sure how that factor may have
> skewed this particular bit of data.

I too have pretty much given up on Maxtor drives when it comes to
things like SMART reporting, sleep mode, or just plain working
properly.
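
If you want to spot-check what SMART is (or isn't) telling you about a
drive, here's a minimal Python sketch around smartmontools' smartctl;
the device path is just an example, and as noted above a PASSED result
from a flaky drive is no guarantee of anything:

    # Minimal SMART health check using smartmontools' smartctl.
    # Assumes smartctl is installed and the device is readable;
    # /dev/sda is an example path.  ATA drives report a line like
    # "SMART overall-health self-assessment test result: PASSED"
    # (SCSI drives say "SMART Health Status: OK" instead).  As
    # discussed above, PASSED does not mean the drive won't die
    # without warning.
    import subprocess

    def smart_health(device="/dev/sda"):
        result = subprocess.run(["smartctl", "-H", device],
                                capture_output=True, text=True)
        for line in result.stdout.splitlines():
            if "overall-health" in line or "Health Status" in line:
                return line.split(":")[-1].strip()
        return "UNKNOWN"

    print(smart_health())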

In recent months, we had an AC unit fail here at work.  We have two
drive manufacturers for our servers, manufacturer F and manufacturer S.
The drives from F failed at a much higher rate and developed lots and
lots of bad sectors; the drives from manufacturer S, OTOH, have not
shown an increased failure rate.  While both manufacturers claim that
their drives can survive in a 55/60C environment, I'm pretty sure one
of them was lying.  We are quietly replacing the failed drives with
drives from manufacturer S.

Based on experience, I think that on average server drives are more
reliable than consumer grade drives and can take more punishment.  But
the variables of manufacturer, model, and batch often make even more
difference than grade does.
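
If you want to put rough numbers on a comparison like that, annualized
failure rate (the metric the Google paper reports) is easy to compute.
The counts below are invented purely for illustration:

    # Annualized failure rate (AFR): failures per drive-year, as a
    # percentage.  The drive and failure counts here are made up.
    def afr_percent(failures, drive_count, years):
        return 100.0 * failures / (drive_count * years)

    # Hypothetical: 24 of 150 drives from manufacturer F fail within
    # six months, versus 3 of 150 from manufacturer S.
    print("F: %.1f%% AFR" % afr_percent(24, 150, 0.5))  # F: 32.0% AFR
    print("S: %.1f%% AFR" % afr_percent(3, 150, 0.5))   # S: 4.0% AFR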
