Re: SCSI vs SATA - Mailing list pgsql-performance

From Ron
Subject Re: SCSI vs SATA
Date
Msg-id E1HaLYH-0003c6-1Y@elasmtp-masked.atl.sa.earthlink.net
In response to Re: SCSI vs SATA  (david@lang.hm)
List pgsql-performance
At 05:42 PM 4/7/2007, david@lang.hm wrote:
>On Sat, 7 Apr 2007, Ron wrote:
>
>>The reality is that all modern HDs are so good that it's actually
>>quite rare for someone to suffer a data loss event.  The
>>consequences of such are so severe that the event stands out more
>>than just the statistics would imply. For those using small numbers
>>of HDs, HDs just work.
>>
>>OTOH, for those of us doing work that involves DBMSs and relatively
>>large numbers of HDs per system, both the math and the RW
>>conditions of service require us to pay more attention to quality details.
>>Like many things, one can decide on one of multiple ways to "pay the piper".
>>
>>a= The choice made by many, for instance in the studies mentioned,
>>is to minimize initial acquisition cost and operating overhead and
>>simply accept having to replace HDs more often.
>>
>>b= For those in fields where this is not a reasonable option
>>(financial services, health care, etc.), or for those literally
>>using 100's of HDs per system (where statistical failure rates are
>>so likely that TLC is required), policies and procedures like those
>>mentioned in this thread (paying close attention to environment and
>>use factors, sector remap detecting, rotating HDs into and out of
>>roles based on age, etc.) are necessary.
>>
>>Anyone who does some close variation of "b" directly above =will=
>>see the benefits of using better HDs.
>>
>>At least in my supposedly unqualified anecdotal 25 years of
>>professional experience.
>
>Ron, why is it that you assume that anyone who disagrees with you
>doesn't work in an environment where they care about the datacenter
>environment, and aren't in fields like financial services? and why
>do you think that we are just trying to save a few pennies? (the
>costs do factor in, but it's not a matter of pennies, it's a matter
>of tens of thousands of dollars)
I don't assume that.  I didn't make any assumptions.  I (rightfully
IMHO) criticized everyone who jumped on the "See, cheap =is= good!"
bandwagon that the Google and CMU studies seem to have ignited,
without thinking critically about those studies.

I've never mentioned or discussed specific financial amounts, so
you're making an (erroneous) assumption when you think my concern is
over people "trying to save a few pennies".

In fact, "saving pennies" is at the =bottom= of my priority list for
the class of applications I've been discussing.  I'm all for
economical, but to paraphrase Einstein "Things should be as cheap as
possible; but no cheaper."

My biggest concern is that something I've seen over and over again in
my career will happen again:
people tend to jump at the _slightest_ excuse to believe a story that
will save them short-term money, and resist even _strong_ reasons to
pay up front for quality, even when paying more up front would lower
their lifetime TCO.

The Google and CMU studies are =not= based on data drawn from
businesses where the lesser consequences of an outage are losing
$10K's or $100K's per minute... ...and where the greater consequences
include the chance of loss of human life.
Nor are they based on businesses that must rely exclusively on highly
skilled, and therefore expensive, labor.

In the case of the CMU study, people are extrapolating an economic
conclusion that the original author did not make or intend!
Is it any wonder I'm expressing concern regarding inappropriate
extrapolation of those studies?


>I actually work in the financial services field, I do have a good
>datacenter environment that's well cared for.
>
>while I don't personally maintain machines with hundreds of drives
>each, I do maintain hundreds of machines with a small number of
>drives in each, and a handful of machines with a few dozens of
>drives. (the database machines are maintained by others, I do see
>their failed drives however)
>
>it's also true that my experience is only over the last 10 years,
>so I've only been working with a few generations of drives, but my
>experience is different from yours.
>
>my experience is that until the drives get to be 5+ years old the
>failure rate seems to be about the same for the 'cheap' drives as
>for the 'good' drives. I won't say that they are exactly the same,
>but they are close enough that I don't believe that there is a
>significant difference.
>
>in other words, these studies do seem to match my experience.
Fine.  Let's pretend =You= get to build Citibank's or Humana's next
mission-critical production DBMS using exclusively HDs with 1-year warranties.
(That would never be allowed ITRW.)

Even if you RAID 6 them, I'll bet you anything that a system with 32+
HDs on it will spend a high enough percentage of its time operating in
degraded mode that you will end up looking for a job as a consequence
of such a decision.
...and if you actually suffer data loss or, worse, data corruption,
that's a Career Killing Move.
(And it should be, given the likely consequences to the public of such a F* up.)
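To put a rough number on that, here is a minimal Python sketch; the
per-drive failure rate and rebuild window below are assumed figures for
illustration, not data from either study:

# Back-of-envelope estimate of how much of the year a 32-drive array
# spends rebuilding (i.e. running degraded).  All inputs are assumptions.
n_drives      = 32
afr           = 0.05      # assumed annual failure rate per drive (5%)
rebuild_hours = 24.0      # assumed hours to detect, swap, and rebuild one drive

failures_per_year = n_drives * afr
degraded_hours    = failures_per_year * rebuild_hours
fraction          = degraded_hours / (365 * 24)
print(f"~{failures_per_year:.1f} failures/yr, ~{degraded_hours:.0f} h degraded "
      f"(~{fraction:.2%} of the year)")

Even with those (arguably generous) numbers the array is rebuilding for
dozens of hours a year; with cheaper drives or slower rebuilds the
exposure only grows.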


>this is why, when I recently had to create some large capacity
>arrays, I'm only ending up with machines with a few dozen drives in
>them instead of hundreds. I've got two machines with 6TB of disk,
>one with 8TB, one with 10TB, and one with 20TB. I'm building these
>systems for ~$1K/TB for the disk arrays. other departments who choose
>$bigname 'enterprise' disk arrays are routinely paying 50x that price
>
>I am very sure that they are not getting 50x the reliability, I'm
>sure that they aren't getting 2x the reliability.
...and I'm very sure they are being gouged mercilessly by vendors who
are padding their profit margins exorbitantly at the customer's expense.
HDs or memory from the likes of EMC, HP, IBM, or Sun has been
overpriced for decades.
Unfortunately, for every one of me who shops around for good vendors,
there are 20+ corporate buyers who keep letting themselves get gouged.
Gouging is not going to stop until the gouge prices are unacceptable to
enough buyers.

Now, if the price difference in question is based on =I/O interface=
(SAS vs SATA vs FC vs SCSI), that's a different, and orthogonal, issue.
The simple fact is that optical interconnects are far more expensive
than anything else, and SCSI electronics cost significantly more
than anything except FC.
There's gouging here as well, but far more of the pricing is justified.



>I believe that the biggest cause for data loss from people using
>the 'cheap' drives is due to the fact that one 'cheap' drive holds
>the capacity of 5 or so 'expensive' drives, and since people don't
>realize this they don't realize that the time to rebuild the failed
>drive onto a hot-spare is correspondingly longer.
Commodity HDs get 1-year warranties for the same reason enterprise
HDs get 5+ year warranties: the vendor's confidence that they are not
going to lose money honoring the warranty in question.

AFAIK, there is no correlation between the capacity of HDs and their
failure rates or warranties.
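For reference, the rebuild-window effect described above is easy to
quantify.  A quick Python sketch, where the capacities and rebuild
throughput are illustrative assumptions only:

# Rough rebuild-window comparison: a large commodity drive vs. a smaller
# enterprise drive.  Capacities and throughput are assumed for illustration.
def rebuild_hours(capacity_gb, rebuild_mb_per_s=50):
    return capacity_gb * 1024 / rebuild_mb_per_s / 3600

for capacity_gb in (146, 750):
    print(f"{capacity_gb} GB -> ~{rebuild_hours(capacity_gb):.1f} h to rebuild onto a hot spare")

A drive with ~5x the capacity sits in that vulnerable rebuild window
roughly 5x as long, whatever its raw failure rate happens to be.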


Your point regarding using 2 cheaper systems in parallel instead of 1
gold-plated system is in fact an expression of a basic axiom of
Systems Theory with regard to Single Points of Failure.  Once
components become cheap enough, it is almost always better to have
redundancy rather than all one's eggs in 1 heavily protected basket.
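A trivial illustration of that axiom (the availability figures below
are assumptions, not vendor numbers):

# Availability of one "gold plated" system vs. two cheaper systems in parallel.
# Per-system availability values are assumed for illustration only.
a_premium = 0.9999                 # assumed availability of one premium system
a_cheap   = 0.999                  # assumed availability of one commodity system

a_pair = 1 - (1 - a_cheap) ** 2    # the pair is down only if both systems are down
print(f"one premium system   : {a_premium:.6f}")
print(f"two commodity systems: {a_pair:.6f}")

Once the second cheap box costs less than the premium one, the
redundant pair wins on both availability and price, assuming the two
systems fail independently.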


Frankly, the only thing that made me feel combative was when someone
claimed there's no difference between anecdotal evidence and a
professional opinion or advice.
That's just so utterly unrealistic as to defy belief.
No one would ever get anything done if every business decision had to
wait on properly designed and executed lab studies.

It's also insulting to everyone who puts in the time and effort to be
a professional within a field rather than a lay person.

Whether there's a name for it or not, there's definitely an important
distinction among anecdote, professional opinion, and study results.


Cheers,
Ron Peacetree





