Thread: SCSI vs SATA

From:
"jason@ohloh.net"
Date:

We need to upgrade a postgres server. I'm not tied to these specific
alternatives, but I'm curious to get feedback on their general
qualities.

SCSI
   dual xeon 5120, 8GB ECC
   8*73GB SCSI 15k drives (PERC 5/i)
   (dell poweredge 2900)

SATA
   dual opteron 275, 8GB ECC
   24*320GB SATA II 7.2k drives (2*12way 3ware cards)
   (generic vendor)

Both boxes are about $8k running Ubuntu. We're planning to set up with
RAID 10. Our main requirement is highest TPS (focused on a lot of
INSERTs).

Question: will 8*15k SCSI drives outperform 24*7K SATA II drives?

-jay

From:
Ron
Date:

For random IO, the 3ware cards are better than PERC.

 > Question: will 8*15k 73GB SCSI drives outperform 24*7K 320GB SATA II drives?

Nope.  Not even if the 15K 73GB HDs were the brand new Savvio 15K screamers.

Example assuming 3.5" HDs and RAID 10 => 4 15K 73GB vs 12 7.2K 320GB
The 15K's are 2x faster rpm, but they are only ~23% the density =>
advantage per HD to SATAs.
Then there's the fact that there are 1.5x as many 7.2K spindles as
15K spindles...

Unless your transactions are very small and unbuffered / unscheduled
(in which case you are in a =lot= of trouble), the SATA set-up rates
to be ~2x - ~3x faster ITRW (in the real world) than the SCSI set-up.

Cheers,
Ron Peacetree




From:
Ron
Date:

At 07:07 PM 4/3/2007, Ron wrote:
>For random IO, the 3ware cards are better than PERC
>
> > Question: will 8*15k 73GB SCSI drives outperform 24*7K 320GB SATA
> II drives?
>
>Nope.  Not even if the 15K 73GB HDs were the brand new Savvio 15K screamers.
>
>Example assuming 3.5" HDs and RAID 10 => 4 15K 73GB vs 12 7.2K 320GB
>The 15K's are 2x faster rpm, but they are only ~23% the density =>
>advantage per HD to SATAs.
>Then there's the fact that there are 1.5x as many 7.2K spindles as
>15K spindles...
Oops make that =3x= as many 7.2K spindles as 15K spindles...


>Unless your transactions are very small and unbuffered / unscheduled
>(in which case you are in a =lot= of trouble), The SATA set-up rates
>to be ~2x - ~3x faster ITRW than the SCSI set-up.
...which makes this imply that the SATA set-up given will be ~4x -
~6x faster ITRW than the SCSI set-up given.
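
The spindle arithmetic in the correction above can be checked with a short sketch. It assumes RAID 10 is built from two-way mirrors, so the number of mirror pairs and the usable capacity both come from half the raw drive count. The function name is illustrative and not from the thread.

```python
def raid10_summary(drive_count, drive_gb):
    """Return (mirror_pairs, usable_gb) for a RAID 10 built from two-way mirrors."""
    if drive_count % 2:
        raise ValueError("two-way-mirror RAID 10 needs an even drive count")
    pairs = drive_count // 2
    return pairs, pairs * drive_gb


scsi_pairs, scsi_gb = raid10_summary(8, 73)     # 8 x 73GB 15K SCSI
sata_pairs, sata_gb = raid10_summary(24, 320)   # 24 x 320GB 7.2K SATA II

print(f"SCSI: {scsi_pairs} mirror pairs, {scsi_gb} GB usable")
print(f"SATA: {sata_pairs} mirror pairs, {sata_gb} GB usable")
print(f"spindle ratio: {sata_pairs / scsi_pairs:.1f}x")
```

This only confirms the 3x spindle ratio; it says nothing about per-drive speed, which is a separate factor.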



From:
Geoff Tolley
Date:

Ron wrote:
> At 07:07 PM 4/3/2007, Ron wrote:
>> For random IO, the 3ware cards are better than PERC
>>
>> > Question: will 8*15k 73GB SCSI drives outperform 24*7K 320GB SATA II
>> drives?
>>
>> Nope.  Not even if the 15K 73GB HDs were the brand new Savvio 15K
>> screamers.
>>
>> Example assuming 3.5" HDs and RAID 10 => 4 15K 73GB vs 12 7.2K 320GB
>> The 15K's are 2x faster rpm, but they are only ~23% the density =>
>> advantage per HD to SATAs.
>> Then there's the fact that there are 1.5x as many 7.2K spindles as 15K
>> spindles...
> Oops make that =3x= as many 7.2K spindles as 15K spindles...

I don't think the density difference will be quite as high as you seem to
think: most 320GB SATA drives are going to be 3-4 platters, the most that a
73GB SCSI is going to have is 2, and more likely 1, which would make the
SCSIs more like 50% the density of the SATAs. Note that this only really
makes a difference to theoretical sequential speeds; if the seeks are
random the SCSI drives could easily get there 50% faster (lower rotational
latency and they certainly will have better actuators for the heads).
Individual 15K SCSIs will trounce 7.2K SATAs in terms of i/os per second.
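
As a rough illustration of the per-drive point, a random I/O can be modelled as one average seek plus half a rotation. The rotational latency follows directly from the rpm; the seek times below are assumed round figures, not measurements of any drive mentioned in this thread, and caching, command queuing and the controller are ignored.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def random_iops(rpm, avg_seek_ms):
    """Rough random IOPS for one drive: one seek plus half a rotation per I/O."""
    return 1000.0 / (avg_seek_ms + avg_rotational_latency_ms(rpm))


# Assumed average seek times (hypothetical, for illustration only).
scsi_15k = random_iops(15_000, avg_seek_ms=3.5)
sata_7k2 = random_iops(7_200, avg_seek_ms=8.5)

print(f"15K rotational latency:  {avg_rotational_latency_ms(15_000):.2f} ms")
print(f"7.2K rotational latency: {avg_rotational_latency_ms(7_200):.2f} ms")
print(f"per-drive random IOPS: 15K ~{scsi_15k:.0f}, 7.2K ~{sata_7k2:.0f}")
```

Plugging in seek times from a spec sheet or a review gives a first-order per-drive comparison; how that scales across an array depends on the workload and the RAID layout.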

What I always do when examining hard drive options is to see if they've
been tested (or a similar model has) at http://www.storagereview.com/ -
they have a great database there with lots of low-level information
(although it seems to be down at the time of writing).

But what's likely to make the largest difference in the OP's case (many
inserts) is write caching, and a battery-backed cache would be needed for
this. This will help mask write latency differences between the two
options, and so benefit SATA more. Some 3ware cards offer it, some don't,
so check the model.

How the drives are arranged is going to be important too - one big RAID 10
is going to be rather worse than having arrays dedicated to each of
pg_xlog, indices and tables, and on that front the SATA option is going to
grant more flexibility.

If you care about how often you'll have to replace a failed drive, then it's
the SCSI option, no question, although check the cases for hot-swappability.

HTH,
Geoff

From:
"Brian A. Seklecki"
Date:

You might also ask on:



People are pretty candid there.

~BAS

On Tue, 2007-04-03 at 15:13 -0700,  wrote:
> Question: will 8*15k SCSI drives outperform 24*7K SATA II drives?
--
Brian A. Seklecki
Collaborative Fusion, Inc.


From:
david@lang.hm
Date:

On Tue, 3 Apr 2007, Geoff Tolley wrote:

> I don't think the density difference will be quite as high as you seem to
> think: most 320GB SATA drives are going to be 3-4 platters, the most that a
> 73GB SCSI is going to have is 2, and more likely 1, which would make the
> SCSIs more like 50% the density of the SATAs. Note that this only really
> makes a difference to theoretical sequential speeds; if the seeks are random
> the SCSI drives could easily get there 50% faster (lower rotational latency
> and they certainly will have better actuators for the heads). Individual 15K
> SCSIs will trounce 7.2K SATAs in terms of i/os per second.

true, but with 3x as many drives (and 4x the capacity per drive) the SATA
system will have to do far less seeking

for that matter, with 20ish 320G drives, how large would a partition be
that only used the outer physical track of each drive? (almost certainly
multiple logical tracks) if you took the time to set this up you could
eliminate seeking entirely (at the cost of not using your capacity, but
since you are considering a 12x range in capacity, it's obviously not your
primary concern)
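
A quick sketch of how little of each SATA drive would be needed if the goal were only to match the usable capacity of the SCSI option. It assumes two-way-mirror RAID 10 on both sides and all 24 SATA drives in one array; the function name is made up for illustration, and how much such a partition actually shortens seeks is not modelled.

```python
def short_stroke_fraction(target_usable_gb, drive_count, drive_gb):
    """Fraction of each drive needed to hold target_usable_gb in RAID 10."""
    pairs = drive_count // 2                  # two-way mirrors
    per_drive_gb = target_usable_gb / pairs   # data each mirror pair must hold
    return per_drive_gb / drive_gb


scsi_usable_gb = (8 // 2) * 73   # usable capacity of the 8 x 73GB SCSI option
frac = short_stroke_fraction(scsi_usable_gb, drive_count=24, drive_gb=320)
print(f"{frac:.1%} of each SATA drive")
```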

> If you care about how often you'll have to replace a failed drive, then the
> SCSI option no question, although check the cases for hot-swapability.

note that the CMU and Google studies both commented on being surprised at
the lack of difference between the reliability of SCSI and SATA drives.

David Lang

From:
Ron Mayer
Date:

 wrote:
>   8*73GB SCSI 15k ...(dell poweredge 2900)...
>   24*320GB SATA II 7.2k ...(generic vendor)...
>
> raid10. Our main requirement is highest TPS (focused on a lot of INSERTS).
> Question: will 8*15k SCSI drives outperform 24*7K SATA II drives?

It's worth asking the vendors in question if you can test the configurations
before you buy.   Of course with 'generic vendor' it's easiest if that
vendor has local offices.

If Dell hesitates, mention that their competitors offer such programs;
some by loaning you the servers[1], others by having performance testing
centers where you can (for a fee?) come in and benchmark your applications[2].


[1] http://www.sun.com/tryandbuy/
[2] http://www.hp.com/products1/solutioncenters/services/index.html#solution

From:
"Peter Kovacs"
Date:

This may be a silly question, but won't 3 times as many disk drives
mean 3 times higher probability of disk failure? Also, rumor has it
that SATA drives are more prone to fail than SCSI drives. More
failures will result, in turn, in higher administration costs.

Thanks
Peter


From:
Andreas Kostyrka
Date:

* Peter Kovacs <> [070404 14:40]:
> This may be a silly question but: will not 3 times as many disk drives
> mean 3 times higher probability for disk failure? Also rumor has it
> that SATA drives are more prone to fail than SCSI drivers. More
> failures will result, in turn, in more administration costs.
Actually, the newest research papers show that all discs (be they
desktop or high-end SCSI) have basically the same failure statistics.

But yes, having 3 times the discs will increase the fault probability.

Andreas

From:
Alvaro Herrera
Date:

Andreas Kostyrka escribió:
> * Peter Kovacs <> [070404 14:40]:
> > This may be a silly question but: will not 3 times as many disk drives
> > mean 3 times higher probability for disk failure? Also rumor has it
> > that SATA drives are more prone to fail than SCSI drivers. More
> > failures will result, in turn, in more administration costs.
> Actually, the newest research papers show that all discs (be it
> desktops, or highend SCSI) have basically the same failure statistics.
>
> But yes, having 3 times the discs will increase the fault probability.

... of individual disks, which is quite different from failure of a disk
array (in case there is one).
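
To illustrate the difference between a disk failing and the array failing: in a RAID 10 of two-way mirrors, data is lost only when both halves of the same mirror fail. The sketch below enumerates every way two drives can be down at once and counts the fatal combinations. It assumes failures are independent and equally likely on every drive, and that the second failure happens before the first drive is rebuilt; the drive numbering is arbitrary.

```python
from fractions import Fraction
from itertools import combinations


def p_loss_given_two_failures(drive_count):
    """P(data loss | exactly two random drives are down) for two-way-mirror RAID 10."""
    # Drives 2k and 2k+1 are assumed to mirror each other.
    mirror_pairs = {(i, i + 1) for i in range(0, drive_count, 2)}
    double_failures = list(combinations(range(drive_count), 2))
    fatal = sum(1 for pair in double_failures if pair in mirror_pairs)
    return Fraction(fatal, len(double_failures))


for n in (8, 24):
    print(f"{n} drives: P(loss | two failures) = {p_loss_given_two_failures(n)}")
```

Under these assumptions the result is 1/(n-1): a larger array sees more individual failures, but any given pair of failures is less likely to land on the same mirror.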

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:
"Peter Kovacs"
Date:

But if an individual disk fails in a disk array, sooner rather than later
you would want to purchase a new fitting disk, walk/drive to the location
of the disk array, replace the broken disk in the array and activate
the new disk. Is this correct?

Thanks
Peter


From:
Alvaro Herrera
Date:

Peter Kovacs escribió:
> But if an individual disk fails in a disk array, sooner than later you
> would want to purchase a new fitting disk, walk/drive to the location
> of the disk array, replace the broken disk in the array and activate
> the new disk. Is this correct?

Ideally you would have a spare disk so the array controller can replace
the broken one as soon as it breaks, but yeah, that would be more or
less the procedure.  The spare is a way to defer the walk/drive until a
more convenient opportunity presents itself.


From:
Rod Taylor
Date:

On 4-Apr-07, at 8:46 AM, Andreas Kostyrka wrote:

> * Peter Kovacs <> [070404 14:40]:
>> This may be a silly question but: will not 3 times as many disk drives
>> mean 3 times higher probability for disk failure? Also rumor has it
>> that SATA drives are more prone to fail than SCSI drives. More
>> failures will result, in turn, in more administration costs.
> Actually, the newest research papers show that all discs (be it
> desktops, or highend SCSI) have basically the same failure statistics.
>
> But yes, having 3 times the discs will increase the fault probability.

I highly recommend RAID6 to anyone with more than 6 standard SATA
drives in a single array. It's actually fairly probable that you will
lose 2 drives in a 72 hour window (say over a long weekend) at some
point.
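
For anyone who wants to put their own numbers on the 72-hour window, here is a minimal sketch. It assumes independent failures with a constant hazard rate derived from an annual failure rate (AFR); the 3% AFR is a placeholder, not a figure from this thread. Drives from one batch in one chassis often fail in a correlated way, so the independence assumption is optimistic.

```python
import math


def p_second_failure(remaining_drives, afr, window_hours):
    """P(at least one of the remaining drives fails within the window)."""
    hourly_rate = -math.log(1.0 - afr) / 8760.0   # constant-hazard model
    return 1.0 - math.exp(-remaining_drives * hourly_rate * window_hours)


# After one drive in a 24-drive array has failed, 23 remain exposed.
p = p_second_failure(remaining_drives=23, afr=0.03, window_hours=72)
print(f"P(second failure within 72h) = {p:.3%}")
```

This is the chance per incident; over the life of a large array there are several such incidents, so the cumulative risk is higher.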



From:
Andreas Kostyrka
Date:

* Alvaro Herrera <> [070404 15:42]:
> Peter Kovacs escribió:
> > But if an individual disk fails in a disk array, sooner than later you
> > would want to purchase a new fitting disk, walk/drive to the location
> > of the disk array, replace the broken disk in the array and activate
> > the new disk. Is this correct?
>
> Ideally you would have a spare disk to let the array controller replace
> the broken one as soon as it breaks, but yeah, that would be more or
Well, no matter what, you need to test this procedure. I'd expect that in
many cases the disc IO during the rebuild of the array will be so much
slower that the database server won't be able to cope with the load.

Andreas

From:
"jason@ohloh.net"
Date:

On Apr 3, 2007, at 6:54 PM, Geoff Tolley wrote:

> I don't think the density difference will be quite as high as you
> seem to think: most 320GB SATA drives are going to be 3-4 platters,
> the most that a 73GB SCSI is going to have is 2, and more likely 1,
> which would make the SCSIs more like 50% the density of the SATAs.
> Note that this only really makes a difference to theoretical
> sequential speeds; if the seeks are random the SCSI drives could
> easily get there 50% faster (lower rotational latency and they
> certainly will have better actuators for the heads). Individual 15K
> SCSIs will trounce 7.2K SATAs in terms of i/os per second.

Good point. On another note, I am wondering why nobody's brought up
the command-queuing perf benefits (yet). Is this because SATA and SCSI
are on par here? I'm finding conflicting information on this -- some
calling SATA's NCQ mostly crap, others stating the real-world results
are negligible. I'm inclined to believe SCSI's pretty far ahead here
but am having trouble finding recent articles on this.

> What I always do when examining hard drive options is to see if
> they've been tested (or a similar model has) at
> http://www.storagereview.com/ - they have a great database there with
> lots of low-level information (although it seems to be down at the time
> of writing).

Still down! They might want to get better drives... j/k.

> But what's likely to make the largest difference in the OP's case
> (many inserts) is write caching, and a battery-backed cache would
> be needed for this. This will help mask write latency differences
> between the two options, and so benefit SATA more. Some 3ware cards
> offer it, some don't, so check the model.

The servers are hooked up to a reliable UPS. The battery-backed cache
won't hurt but might be overkill (?).

> How the drives are arranged is going to be important too - one big
> RAID 10 is going to be rather worse than having arrays dedicated to
> each of pg_xlog, indices and tables, and on that front the SATA
> option is going to grant more flexibility.

I've read some recent contrary advice, specifically advising sharing
all files (pg_xlog, indices, etc.) on one huge RAID array
and letting the drives load balance by brute force. I know the
postgresql documentation claims up to 13% more perf for moving the
pg_xlog to its own device(s) -- but by sharing everything on a huge
array you lose a small amount of perf (when compared to the
theoretically optimal solution) - vs being significantly off optimal
perf if you partition your tables/files wrongly. I'm willing to do
reasonable benchmarking but time is money -- and reconfiguring huge
arrays in multiple configurations to possibly get incremental
perf might not be as cost efficient as just spending more on hardware.

Thanks for all the tips.

From:
Ron
Date:

At 07:16 AM 4/4/2007, Peter Kovacs wrote:
>This may be a silly question but: will not 3 times as many disk drives
>mean 3 times higher probability for disk failure?

Yes, all other factors being equal 3x more HDs (24 vs 8) means ~3x
the chance that some HD in the set fails.

OTOH, either of these numbers is probably smaller than you think.
Assuming a  HD with a 1M hour MTBF (which means that at 1M hours of
operation you have a ~1/2 chance of that specific HD failing), the
instantaneous reliability of any given HD is

  x^(1M) = 1/2
  (1M) lg(x) = lg(1/2)
  lg(x) = lg(1/2)/(1M) = ~ -1/(1M)
  x = ~.999999307

To evaluate the instantaneous reliability of a set of "n" HDs, we
raise x to the power of that number of HDs.
Whether we evaluate x^8= .999994456 or x^24= .999983368, the result
is still darn close to 1.
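
The figures above can be reproduced with a few lines. The sketch follows the post's own simplification that a drive has a 1/2 chance of having failed after 1M hours (under a strict exponential model, survival at the MTBF would be e^-1, about 0.37), and treats x as the per-hour survival probability of one drive. The function name is illustrative.

```python
def hourly_survival(half_life_hours):
    """Per-hour survival probability x such that x**half_life_hours == 0.5."""
    return 0.5 ** (1.0 / half_life_hours)


x = hourly_survival(1_000_000)
print(f"one drive: x = {x:.9f}")
for n in (8, 24):
    # All n drives survive a given hour, assuming independent failures.
    print(f"{n} drives: x^{n} = {x ** n:.9f}")
```

The last digit can differ slightly from the post because the post rounds x before raising it to a power.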

Multiple studies have shown that ITRW modern components considered to
be critical like HDs, CPUs, RAM, etc fail far less often than say
fans and PSUs.

In addition, those same studies show HDs
a= are usually set up to be redundant (RAID),
b= are usually hot swap-able, and
c= usually do not catastrophically fail with no warning (unlike fans and PSUs)

Finally, catastrophic failures of HDs are infinitesimally rare
compared to things like fans.

If your system is in appropriate conditions and suffers a system
stopping HW failure, the odds are it will not be a HD that failed.
Buy HDs with 5+ year warranties + keep them in appropriate
environments and the odds are very good that you will never have to
complain about your HD subsystem.


>Also rumor has it that SATA drives are more prone to fail than SCSI
>drives. More failures will result, in turn, in more administration costs.
Hard data trumps rumors.  The hard data is that you should only buy
HDs with 5+ year warranties and then make sure to use them only in
appropriate conditions and under appropriate loads.

Respect those constraints and the numbers say the difference in
reliability between SCSI, SATA, and SAS HDs is negligible.

Cheers,
Ron Peacetree


From:
"Joshua D. Drake"
Date:

>
> Good point. On another note, I am wondering why nobody's brought up the
> command-queuing perf benefits (yet). Is this because sata vs scsi are at

SATAII has similar features.

> par here? I'm finding conflicting information on this -- some calling
> sata's ncq mostly crap, others stating the real-world results are
> negligible. I'm inclined to believe SCSI's pretty far ahead here but am
> having trouble finding recent articles on this.

What I find is, a bunch of geeks sit in a room and squabble about a few
percentages one way or the other. One side feels very l33t because their
white paper looks like the latest swimsuit edition.

Real world specs and real world performance show that SATAII performs
very, very well. It is kind of like X86. No chip engineer that I know
has ever said X86 is elegant, but guess which chip design is conquering
all others in the general and enterprise marketplace?

SATAII brute forces itself through some of its performance, for example
16MB write cache on each drive.

Sincerely,

Joshua D. Drake
--

       === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From:
Stefan Kaltenbrunner
Date:

Joshua D. Drake wrote:
> SATAII brute forces itself through some of its performance, for example
> 16MB write cache on each drive.

Sure, but for any serious usage one either wants to disable that
cache (and rely on tagged command queuing, or whatever that is called in
the SATAII world) or rely on the OS/RAID controller implementing some sort
of FUA/write barrier feature (which Linux, for example, only does in pretty
recent kernels).


Stefan

From:
"Joshua D. Drake"
Date:

>> SATAII brute forces itself through some of its performance, for
>> example 16MB write cache on each drive.
>
> sure but for any serious usage one either wants to disable that
> cache(and rely on tagged command queuing or how that is called in SATAII

Why? Assuming we have a BBU, why would you turn off the cache?

> world) or rely on the OS/raidcontroller implementing some sort of
> FUA/write barrier feature(which linux for example only does in pretty
> recent kernels)

Sincerely,

Joshua D. Drake



From:
Andreas Kostyrka
Date:

* Joshua D. Drake <> [070404 17:40]:
>
> >Good point. On another note, I am wondering why nobody's brought up the
> >command-queuing perf benefits (yet). Is this because sata vs scsi are at
>
> SATAII has similar features.
>
> >par here? I'm finding conflicting information on this -- some calling
> >sata's ncq mostly crap, others stating the real-world results are
> >negligible. I'm inclined to believe SCSI's pretty far ahead here but am
> >having trouble finding recent articles on this.
>
> What I find is, a bunch of geeks sit in a room and squabble about a few
> percentages one way or the other. One side feels very l33t because their
> white paper looks like the latest swimsuit edition.
>
> Real world specs and real world performance shows that SATAII performs,
> very, very well. It is kind of like X86. No chip engineer that I know
> has ever said, X86 is elegant but guess which chip design is conquering
> all others in the general and enterprise marketplace?

Actually, to second that, we did have very similar servers with
SCSI/SATA drives, and I did not notice any relevant measurable
difference. OTOH, the SCSI discs were way less reliable than the SATA
discs, that might have been bad luck.

Andreas

From:
"Joshua D. Drake"
Date:

> difference. OTOH, the SCSI discs were way less reliable than the SATA
> discs, that might have been bad luck.

Probably bad luck. I find that SCSI is very reliable, but I don't find
it any more reliable than SATA. That is assuming correct ventilation etc...

Sincerely,

Joshua D. Drake


>
> Andreas
>




From:
david@lang.hm
Date:

On Wed, 4 Apr 2007, Peter Kovacs wrote:

> But if an individual disk fails in a disk array, sooner than later you
> would want to purchase a new fitting disk, walk/drive to the location
> of the disk array, replace the broken disk in the array and activate
> the new disk. Is this correct?


correct, but more drives also give you the chance to do multiple parity
arrays so that you can lose more drives before you lose data. see the
thread titled 'Sunfire X4500 recommendations' for some stats on how likely
you are to lose your data in the face of multiple drive failures.

you can actually get much better reliability than RAID 10
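
To put a rough number on the RAID 10 side of that comparison, here is a
minimal sketch (not from the thread; the function name and the failure model
are assumptions made for illustration). It assumes two-way mirrors, that one
drive has already died, and that the next failure hits a uniformly random
surviving drive. A double-parity array survives any two failures by
construction, so the figure printed here is the exposure that such a layout
removes.

```python
from fractions import Fraction


def raid10_second_failure_loss(n_drives: int) -> Fraction:
    """Chance that a second failure kills a RAID 10 of two-way mirrors.

    Assumes one drive has already failed and the next failure hits one of
    the n_drives - 1 survivors uniformly at random. Data is lost only if
    that drive is the mirror partner of the first failed drive.
    """
    if n_drives < 2 or n_drives % 2:
        raise ValueError("RAID 10 of two-way mirrors needs an even drive count >= 2")
    return Fraction(1, n_drives - 1)


# The two drive counts discussed in this thread.
for n in (8, 24):
    p = raid10_second_failure_loss(n)
    print(f"{n} drives: {p} (~{float(p):.1%}) chance the second failure loses data")
```

Real drives in one chassis do not fail independently (shared heat, power and
batch), so treat this as a lower bound on the risk rather than a prediction.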

David Lang

From:
Stefan Kaltenbrunner
Date:

Joshua D. Drake wrote:
>
>>> SATAII brute forces itself through some of its performance, for
>>> example 16MB write cache on each drive.
>>
>> sure but for any serious usage one either wants to disable that
>> cache(and rely on tagged command queuing or how that is called in SATAII
>
> Why? Assuming we have a BBU, why would you turn off the cache?

the BBU is usually only protecting the memory of the (hardware) raid
controller not the one in the drive ...


Stefan

From:
"Craig A. James"
Date:

I had a 'scratch' database for testing, which I deleted, and then the disk
went out.  No problem, no precious data.  But now I can't drop the
tablespace, or the user who had that as the default tablespace.

I thought about removing the tablespace from pg_tablespace, but it seems
wrong to be monkeying with the system tables. I still can't drop the user,
and can't drop the tablespace.  What's the right way to clear out Postgres
when a disk fails and there's no reason to repair the disk?

Thanks,
Craig

From:
Tom Lane
Date:

"Craig A. James" <> writes:
> I had a 'scratch' database for testing, which I deleted, and then the disk
> went out.  No problem, no precious data.  But now I can't drop the
> tablespace, or the user who had that as the default tablespace.
> I thought about removing the tablespace from pg_tablespace, but it seems
> wrong to be monkeying with the system tables. I still can't drop the user,
> and can't drop the tablespace.  What's the right way to clear out Postgres
> when a disk fails and there's no reason to repair the disk?

Probably best to make a dummy postgres-owned directory somewhere and
repoint the symlink at it, then DROP TABLESPACE.

CVS HEAD has recently been tweaked to be more forgiving of such cases...
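
A minimal sketch of the symlink step, for anyone who wants it spelled out.
This is an illustration, not a tested recovery procedure: the function name,
the example paths and the OID below are hypothetical. It relies only on the
fact that each tablespace is a symlink named after its OID under
$PGDATA/pg_tblspc (the OID can be read with
"SELECT oid, spcname FROM pg_tablespace;"). Run it as the postgres OS user,
then issue DROP TABLESPACE from psql. Whether the server is happy with a
completely empty directory may depend on the version.

```python
import os
from pathlib import Path


def repoint_tablespace_symlink(pgdata: str, tablespace_oid: int, dummy_dir: str) -> Path:
    """Point $PGDATA/pg_tblspc/<oid> at an empty dummy directory."""
    link = Path(pgdata) / "pg_tblspc" / str(tablespace_oid)
    # is_symlink() uses lstat, so it is still True when the target disk is gone.
    if not link.is_symlink():
        raise FileNotFoundError(f"{link} is not a tablespace symlink")

    # Use an absolute target so the link does not depend on the current directory.
    dummy = Path(dummy_dir).resolve()
    dummy.mkdir(mode=0o700, parents=True, exist_ok=True)

    # Build the new link under a temporary name, then rename it over the old
    # one, so the pg_tblspc entry is never missing in between.
    tmp_link = link.with_name(link.name + ".new")
    os.symlink(dummy, tmp_link)
    os.replace(tmp_link, link)
    return link


# Hypothetical usage (paths and OID are made up for illustration):
# repoint_tablespace_symlink("/var/lib/postgresql/data", 16385,
#                            "/var/lib/postgresql/dummy_ts")
```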

            regards, tom lane

From:
mark@mark.mielke.cc
Date:

On Wed, Apr 04, 2007 at 08:50:44AM -0700, Joshua D. Drake wrote:
> >difference. OTOH, the SCSI discs were way less reliable than the SATA
> >discs, that might have been bad luck.
> Probably bad luck. I find that SCSI is very reliable, but I don't find
> it any more reliable than SATA. That is assuming correct ventilation etc...

Perhaps a basic question - but why does the interface matter? :-)

I find the subject interesting to read about - but I am having trouble
understanding why SATAII is technically superior or inferior to SCSI as
an interface, in any place that counts.

Is the opinion being expressed that manufacturers who have decided to
move to SATAII are not designing for the enterprise market yet? I find
myself doubting this...

Cheers,
mark

--
 /  /      __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


From:
Geoff Tolley
Date:

 wrote:

> Good point. On another note, I am wondering why nobody's brought up
> the command-queuing perf benefits (yet). Is this because sata vs scsi
> are at par here? I'm finding conflicting information on this -- some
> calling sata's ncq mostly crap, others stating the real-world results
> are negligible. I'm inclined to believe SCSI's pretty far ahead here
> but am having trouble finding recent articles on this.

My personal thoughts are that the SATA NCQ opinion you've found is simply
because the workloads SATAs tend to be given (single-user) don't really
benefit that much from it.

> The servers are hooked up to a reliable UPS. The battery-backed cache
> won't hurt but might be overkill (?).

The difference is that a BBU isn't going to be affected by OS/hardware
hangs. There are even some SCSI RAID cards I've seen which can save your
data in case the card itself fails (the BBU in these cases is part of the
same module as the write cache memory, so you can remove them together and
put them into a new card, after which the data can be written).

I haven't checked into this recently, but IDE drives are notorious for
lying about having their internal write cache disabled. Which means that in
theory a BBU controller can have a write acknowledged as having happened,
consequently purge the data from the write cache, then when the power fails
the data still isn't on any kind of permanent storage. It depends how
paranoid you are as to whether you care about this edge case (and it'd make
rather less difference if the pg_xlog is on a non-lying drive).

HTH,
Geoff

From:
Geoff Tolley
Date:

 wrote:

> for that matter, with 20ish 320G drives, how large would a partition be
> that only used the outer physical track of each drive? (almost certainly
> multiple logical tracks) if you took the time to set this up you could
> eliminate seeking entirely (at the cost of not using your capacity, but
> since you are considering a 12x range in capacity, it's obviously not
> your primary concern)

Good point: if 8x73GB in a RAID10 is an option, the database can't be
larger than 292GB, or roughly 1/13 the available space on the 320GB SATA
version.
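
For reference, the capacity arithmetic can be sketched as follows. The helper
name is mine, and it assumes plain two-way mirroring with no hot spares and
decimal "marketing" gigabytes, which is how the drive sizes are quoted in
this thread.

```python
def raid10_usable_gb(n_drives: int, drive_gb: int) -> int:
    """Usable capacity of a RAID 10 of two-way mirrors: half the raw space."""
    return (n_drives // 2) * drive_gb


sas = raid10_usable_gb(8, 73)     # the 8 x 73GB 15k option
sata = raid10_usable_gb(24, 320)  # the 24 x 320GB 7.2k option
print(sas, sata, round(sata / sas, 1))  # 292 3840 13.2
```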

> note that the CMU and Google studies both commented on being surprised
> at the lack of difference between the reliability of SCSI and SATA drives.

I'd read about the Google study's conclusions on the failure rate over time
of drives; I hadn't gotten wind before of it comparing SCSI to SATA drives.
I do wonder what their access patterns are like, and how that pertains to
failure rates. I'd like to think that with smaller seeks (like in the
many-big-SATAs-option) the life of the drives would be longer.

Oh, one big advantage of SATA over SCSI: simple cabling and no need for
termination. Although SAS levels that particular playing field.

Cheers,
Geoff

From:
Geoff Tolley
Date:

 wrote:

> Perhaps a basic question - but why does the interface matter? :-)

The interface itself matters not so much these days as the drives that
happen to use it. Most manufacturers make both SATA and SCSI lines, are
keen to keep the market segmented, and don't want to cannibalize their SCSI
business by coming out with any SATA drives that are too good. One notable
exception is Western Digital, which is why they remain the only makers of
10K SATAs more than three years after first coming out with them.

Cheers,
Geoff

From:
"James Mansion"
Date:

>sure but for any serious usage one either wants to disable that
>cache(and rely on tagged command queuing or how that is called in SATAII
>world) or rely on the OS/raidcontroller implementing some sort of
>FUA/write barrier feature(which linux for example only does in pretty
>recent kernels)

Does anyone know which other hosts have write barrier implementations?
Solaris?  FreeBSD? Windows?

The buffers should help greatly in such a case, right?  Particularly if
you have quite a wide stripe.



From:
Arjen van der Meijden
Date:

On 4-4-2007 0:13  wrote:
> We need to upgrade a postgres server. I'm not tied to these specific
> alternatives, but I'm curious to get feedback on their general qualities.
>
> SCSI
>   dual xeon 5120, 8GB ECC
>   8*73GB SCSI 15k drives (PERC 5/i)
>   (dell poweredge 2900)

This is a SAS set-up, not SCSI. So the cabling, if an issue at all, is
in SAS' favour rather than SATA's. Normally you don't have to worry
about that in a hot-swap chassis anyway.

> SATA
>   dual opteron 275, 8GB ECC
>   24*320GB SATA II 7.2k drives (2*12way 3ware cards)
>   (generic vendor)
>
> Both boxes are about $8k running ubuntu. We're planning to setup with
> raid10. Our main requirement is highest TPS (focused on a lot of INSERTS).
>
> Question: will 8*15k SCSI drives outperform 24*7K SATA II drives?

I'm not sure this is an entirely fair question given the fact that the
systems aren't easily comparable. They are likely not the same build
quality or have the same kind of support, they occupy different amounts
of space (2U vs probably at least 4U or 5U) and there will probably be a
difference in energy consumption in favour of the first solution.
If you don't care about such things, it may actually be possible to
build a similar set-up as your SATA-system with 12 or 16 15k rpm SAS
disks or 10k WD Raptor disks. For the sata-solution you can also
consider a 24-port Areca card.


Best regards,

Arjen

From:
"jason@ohloh.net"
Date:

On Apr 4, 2007, at 12:09 PM, Arjen van der Meijden wrote:

> If you don't care about such things, it may actually be possible to
> build a similar set-up as your SATA-system with 12 or 16 15k rpm
> SAS disks or 10k WD Raptor disks. For the sata-solution you can
> also consider a 24-port Areca card.

fwiw, I've had horrible experiences with areca drivers on linux. I've
found them to be unreliable when used with dual AMD64 processors and 4+
GB of ram. I've tried kernels 2.16 up to 2.19... intermittent yet
inevitable ext3 corruptions. 3ware cards, on the other hand, have
been rock solid.

-jay



From:
Arjen van der Meijden
Date:

On 4-4-2007 21:17  wrote:
> fwiw, I've had horrible experiences with areca drivers on linux. I've
> found them to be unreliable when used with dual AMD64 processors and 4+ GB
> of ram. I've tried kernels 2.16 up to 2.19... intermittent yet
> inevitable ext3 corruptions. 3ware cards, on the other hand, have been
> rock solid.

That's the first time I've heard such a thing. We have two systems (both are
previous generation 64bit Xeon systems with 6 and 8GB memory) which run
perfectly stable, going by their uptimes, with an ARC-1130 and 8 WD-raptor
disks.

Best regards,

Arjen

From:
Bruce Momjian
Date:

 wrote:
> On Wed, Apr 04, 2007 at 08:50:44AM -0700, Joshua D. Drake wrote:
> > >difference. OTOH, the SCSI discs were way less reliable than the SATA
> > >discs, that might have been bad luck.
> > Probably bad luck. I find that SCSI is very reliable, but I don't find
> > it any more reliable than SATA. That is assuming correct ventilation etc...
>
> Perhaps a basic question - but why does the interface matter? :-)
>
> I find the subject interesting to read about - but I am having trouble
> understanding why SATAII is technically superior or inferior to SCSI as
> an interface, in any place that counts.

You should probably read this to learn the difference between desktop
and enterprise-level drives:

  http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf

--
  Bruce Momjian  <>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

From:
"Joshua D. Drake"
Date:

Bruce Momjian wrote:
>  wrote:
>> On Wed, Apr 04, 2007 at 08:50:44AM -0700, Joshua D. Drake wrote:
>>>> difference. OTOH, the SCSI discs were way less reliable than the SATA
>>>> discs, that might have been bad luck.
>>> Probably bad luck. I find that SCSI is very reliable, but I don't find
>>> it any more reliable than SATA. That is assuming correct ventilation etc...
>> Perhaps a basic question - but why does the interface matter? :-)
>>
>> I find the subject interesting to read about - but I am having trouble
>> understanding why SATAII is technically superior or inferior to SCSI as
>> an interface, in any place that counts.
>
> You should probably read this to learn the difference between desktop
> and enterprise-level drives:
>
>   http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf

Problem is :), you can purchase SATA Enterprise Drives.

Joshua D. Drake






From:
Bruce Momjian
Date:

Joshua D. Drake wrote:
> Bruce Momjian wrote:
> >  wrote:
> >> On Wed, Apr 04, 2007 at 08:50:44AM -0700, Joshua D. Drake wrote:
> >>>> difference. OTOH, the SCSI discs were way less reliable than the SATA
> >>>> discs, that might have been bad luck.
> >>> Probably bad luck. I find that SCSI is very reliable, but I don't find
> >>> it any more reliable than SATA. That is assuming correct ventilation etc...
> >> Perhaps a basic question - but why does the interface matter? :-)
> >>
> >> I find the subject interesting to read about - but I am having trouble
> >> understanding why SATAII is technically superior or inferior to SCSI as
> >> an interface, in any place that counts.
> >
> > You should probably read this to learn the difference between desktop
> > and enterprise-level drives:
> >
> >   http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf
>
> Problem is :), you can purchase SATA Enterprise Drives.

Right --- the point is not the interface, but whether the drive is built
for reliability or to hit a low price point.

--
  Bruce Momjian  <>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

From:
Carlos Moreno
Date:

> Problem is :), you can purchase SATA Enterprise Drives.

Problem????  I would have thought that was a good thing!!!   ;-)

Carlos
--


From:
"jason@ohloh.net"
Date:

In a perhaps fitting compromise, I have decided to go with a hybrid
solution:

8*73GB 15k SAS drives hooked up to Adaptec 4800SAS
PLUS
6*150GB SATA II drives hooked up to mobo (for now)

All wrapped in a 16bay 3U server. My reasoning is that the extra SATA
drives are practically free compared to the rest of the system (since
the mobo has 6 onboard connectors). I plan on putting the pg_xlog &
operating system on the sata drives and the tables/indices on the SAS
drives, although I might not use the sata drives for the xlog if
they don't pan out perf-wise. I plan on getting the battery backed
module for the adaptec (72 hours of charge time).

Thanks to everyone for the valuable input. I hope I can do you all
proud with the setup and postgresql.conf optimizations.

-jay


On Apr 4, 2007, at 1:48 PM, Carlos Moreno wrote:

>
>> Problem is :), you can purchase SATA Enterprise Drives.
>
> Problem????  I would have thought that was a good thing!!!   ;-)
>
> Carlos
> --
>
>


From:
"James Mansion"
Date:

>Right --- the point is not the interface, but whether the drive is built
>for reliability or to hit a low price point.

Personally I take the marketing mumblings about the enterprise drives
with a pinch of salt.  The low-price drives HAVE TO be reliable too,
because a non-negligible failure rate will result in returns processing
costs that destroy a very thin margin.

Granted, there was a move to very short warranties a while back,
but the trend has been for more realistic warranties again recently.
You can bet they don't do this unless the drives are generally pretty
good.

James



From:
Tom Lane
Date:

"James Mansion" <> writes:
>> Right --- the point is not the interface, but whether the drive is built
>> for reliability or to hit a low price point.

> Personally I take the marketing mumblings about the enterprise drives
> with a pinch of salt.  The low-price drives HAVE TO be reliable too,
> because a non-negligible failure rate will result in returns processing
> costs that destroy a very thin margin.

Reliability is relative.  Server-grade drives are built to be beat upon
24x7x365 for the length of their warranty period.  Consumer-grade drives
are built to be beat upon a few hours a day, a few days a week, for the
length of their warranty period.  Even if the warranties mention the
same number of years, there is a huge difference here.

            regards, tom lane

From:
Arjen van der Meijden
Date:

If the 3U case has a SAS-expander in its backplane (which it probably
has?) you should be able to connect all drives to the Adaptec
controller, depending on the casing's exact architecture etc. That's
another two advantages of SAS, you don't need a controller port for each
harddisk (we have a Dell MD1000 with 15 drives connected to a 4-port
external sas connection) and you can mix SAS and SATA drives on a
SAS-controller.

Best regards,

Arjen

On 5-4-2007 1:42  wrote:
> In a perhaps fitting compromise, I have decided to go with a hybrid
> solution:
>
> 8*73GB 15k SAS drives hooked up to Adaptec 4800SAS
> PLUS
> 6*150GB SATA II drives hooked up to mobo (for now)
>
> All wrapped in a 16bay 3U server. My reasoning is that the extra SATA
> drives are practically free compared to the rest of the system (since
> the mobo has 6 onboard connectors). I plan on putting the pg_xlog &
> operating system on the sata drives and the tables/indices on the SAS
> drives, although  I might not use the sata drives for the xlog if they
> dont pan out perf-wise. I plan on getting the battery backed module for
> the adaptec (72 hours of charge time).
>
> Thanks to everyone for the valuable input. I hope i can do you all proud
> with the setup and postgres.conf optimizations.
>
> -jay
>
>
> On Apr 4, 2007, at 1:48 PM, Carlos Moreno wrote:
>
>>
>>> Problem is :), you can purchase SATA Enterprise Drives.
>>
>> Problem????  I would have thought that was a good thing!!!   ;-)
>>
>> Carlos
>> --
>>
>>

From:
Heikki Linnakangas
Date:

 wrote:
> In a perhaps fitting compromise, I have decided to go with a hybrid
> solution:
>
> 8*73GB 15k SAS drives hooked up to Adaptec 4800SAS
> PLUS
> 6*150GB SATA II drives hooked up to mobo (for now)
>
> All wrapped in a 16bay 3U server. My reasoning is that the extra SATA
> drives are practically free compared to the rest of the system (since
> the mobo has 6 onboard connectors). I plan on putting the pg_xlog &
> operating system on the sata drives and the tables/indices on the SAS
> drives, although  I might not use the sata drives for the xlog if they
> dont pan out perf-wise. I plan on getting the battery backed module for
> the adaptec (72 hours of charge time).

If you have an OLTP kind of workload, you'll want to have the xlog on
the drives with the battery backup module. The xlog needs to be fsync'd
every time you commit, and the battery backup will effectively eliminate
the delay that causes.
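
If you want to see that delay for yourself before deciding where the xlog
goes, a rough probe like the one below can help. It is only a sketch under
stated assumptions: the function name, the 8 kB block size and the two second
duration are my choices, and it measures raw append-plus-fsync latency on a
directory rather than anything PostgreSQL-specific.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory: str, duration_s: float = 2.0, block: int = 8192) -> float:
    """Append small blocks to a scratch file, fsync after each, return the rate."""
    payload = b"\0" * block
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        count = 0
        start = time.perf_counter()
        while time.perf_counter() - start < duration_s:
            os.write(fd, payload)
            os.fsync(fd)  # block until the storage stack reports the write as durable
            count += 1
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed


if __name__ == "__main__":
    # Point this at a directory on the volume you are considering for pg_xlog.
    print(f"{fsyncs_per_second('.'):.0f} fsyncs/sec")
```

A result in the thousands on a bare 7.2k drive usually means some cache is
acknowledging writes early; with a battery-backed controller cache that is
the intended behaviour, without one it is a warning sign.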

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

From:
Ron
Date:

BE VERY WARY OF USING AN ADAPTEC RAID CONTROLLER!

IME, they are usually the worst of the commodity RAID controllers available.
I've often seen SW RAID outperform them.

If you are going to use this config, Tyan's n3600M (AKA S2932) MB has
a variant that comes with 8 SAS + 6 SATA II connectors.
The SKU is S2932WG2NR.
http://www.tyan.us/product_board_detail.aspx?pid=453

Be very careful to get this =exact= SKU if you order this board and
want the one with SAS support.
The non-SAS variant's SKU is S2932G2NR.  Note that the only
difference is the "W" in the middle.

Anyway, the onboard RAID is based on a LSI PCI-E controller.

I'm using this beast (Dual Socket F, AMD Barcelona ready, 16 DIMMS
supporting up to 64GB of ECC RAM, 2 PCI-Ex16 slots w/ PCI-Ex8 signalling, etc:
~$450 US w/o SAS, ~$500 US w/ SAS) for my most recent pg 8.2.3 build
on top of XFS.

If the on board RAID is or becomes inadequate to your needs, I'd
strongly suggest either 3ware or Areca RAID controllers.

Side Note:
What kind of HDs are the 8*73GB ones?  If they are the new 2.5"
Savvio 15Ks, be =VERY= careful about having proper power and cooling for them.
14 HD's in one case are going to have a serious transient load on
system start up and (especially with those SAS HDs) can generate a
great deal of heat.

What 16bay 3U server are you using?

Cheers,
Ron Peacetree

PS to all:  Tom's point about the difference between enterprise and
non-enterprise HDs is dead on accurate.
Enterprise class HD's have case "clam shells" that are specifically
designed for 5 years of 24x7 operation in gangs of RAID under typical
conditions found in a reasonable NOC.
Consumer HDs are designed to be used in consumer boxes in "one's and
two's", and for far less time per day, and under far less punishment
during the time they are on.
There is a =big= difference between consumer class HDs and enterprise
class HDs even if they both have 5 year warranties.
Buy the right thing for your typical use case or risk losing company data.

Getting the wrong thing when it is your responsibility to get the
right thing is a firing offense if Something Bad happens to the data
because of it where I come from.


At 07:42 PM 4/4/2007,  wrote:
>In a perhaps fitting compromise, I have decided to go with a hybrid
>solution:
>
>8*73GB 15k SAS drives hooked up to Adaptec 4800SAS
>PLUS
>6*150GB SATA II drives hooked up to mobo (for now)
>
>All wrapped in a 16bay 3U server. My reasoning is that the extra SATA
>drives are practically free compared to the rest of the system (since
>the mobo has 6 onboard connectors). I plan on putting the pg_xlog &
>operating system on the sata drives and the tables/indices on the SAS
>drives, although  I might not use the sata drives for the xlog if
>they dont pan out perf-wise. I plan on getting the battery backed
>module for the adaptec (72 hours of charge time).
>
>Thanks to everyone for the valuable input. I hope i can do you all
>proud with the setup and postgres.conf optimizations.
>
>-jay
>
>
>On Apr 4, 2007, at 1:48 PM, Carlos Moreno wrote:
>
>>
>>>Problem is :), you can purchase SATA Enterprise Drives.
>>
>>Problem????  I would have thought that was a good thing!!!   ;-)
>>
>>Carlos
>>--
>>
>>


From:
Scott Marlowe
Date:

On Wed, 2007-04-04 at 09:12,  wrote:
> On Apr 3, 2007, at 6:54 PM, Geoff Tolley wrote:
>

> > But what's likely to make the largest difference in the OP's case
> > (many inserts) is write caching, and a battery-backed cache would
> > be needed for this. This will help mask write latency differences
> > between the two options, and so benefit SATA more. Some 3ware cards
> > offer it, some don't, so check the model.
>
> The servers are hooked up to a reliable UPS. The battery-backed cache
> won't hurt but might be overkill (?).

Just had to mention that the point of battery backed cache on the RAID
controller isn't the same as for a UPS on a system.

With drives that properly report fsync(), your system is limited to the
rpm of the drive (subsystem) that the pg_xlog sits upon.  With battery
backed cache, the controller immediately acknowledges an fsync() call
and then commits it at its leisure.  Should the power be lost, either
due to mains / UPS failure or internal power supply failure, the
controller hangs onto those data for several days, and upon restart
flushes them out to the drives they were heading for originally.
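
As a back-of-the-envelope illustration of the rpm limit (my own sketch, not a
measurement): assume a single session committing one transaction at a time,
each fsync() waiting for one full platter rotation, and no write cache or
group commit to hide the wait. The function name is made up for this example.

```python
def max_serial_commits_per_sec(rpm: int) -> float:
    """Upper bound on back-to-back commits when each fsync waits one rotation."""
    return rpm / 60.0


for rpm in (7200, 10000, 15000):
    rate = max_serial_commits_per_sec(rpm)
    print(f"{rpm} rpm: at most ~{rate:.0f} commits/sec per session")
```

Under those assumptions a 7.2k drive tops out near 120 serial commits per
second and a 15k drive near 250, which is why a controller that acknowledges
the fsync() from battery-backed memory changes the picture so much.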

Battery backed cache is the best way to get both good performance and
reliability from a system without breaking the bank.  I've seen 2 disk
RAID-1 setups with BBU beat some pretty big arrays that didn't have a
BBU on OLTP work.


> > How the drives are arranged is going to be important too - one big
> > RAID 10 is going to be rather worse than having arrays dedicated to
> > each of pg_xlog, indices and tables, and on that front the SATA
> > option is going to grant more flexibility.
>
> I've read some recent contrary advice. Specifically advising the
> sharing of all files (pg_xlogs, indices, etc..) on a huge raid array
> and letting the drives load balance by brute force.

The other, at first almost counter-intuitive result was that putting
pg_xlog on a different partition on the same array (i.e. one big
physical partition broken up into multiple logical ones) was faster,
because the OS overhead of writing all the data to one file system caused
performance issues.  Can't remember who reported the performance increase
off the top of my head.

Note that a lot of the advantages to running on multiple arrays etc...
are somewhat negated by having a good RAID controller with a BBU.

From:
Scott Marlowe
Date:

On Thu, 2007-04-05 at 00:32, Tom Lane wrote:
> "James Mansion" <> writes:
> >> Right --- the point is not the interface, but whether the drive is built
> >> for reliability or to hit a low price point.
>
> > Personally I take the marketing mumblings about the enterprise drives
> > with a pinch of salt.  The low-price drives HAVE TO be reliable too,
> > because a non-negligible failure rate will result in returns processing
> > costs that destroy a very thin margin.
>
> Reliability is relative.  Server-grade drives are built to be beat upon
> 24x7x365 for the length of their warranty period.  Consumer-grade drives
> are built to be beat upon a few hours a day, a few days a week, for the
> length of their warranty period.  Even if the warranties mention the
> same number of years, there is a huge difference here.

Just a couple of points...

Server drives are generally more tolerant of higher temperatures.  I.e.
the failure rate for consumer and server class HDs may be about the same
at 40 degrees C, but by the time the internal case temps get up to 60-70
degrees C, the consumer grade drives will likely be failing at a much
higher rate, whether they're working hard or not.

Which brings up my next point:

I'd rather have 36 consumer grade drives in a case that moves a LOT of
air and keeps the drive bays cool than 12 server class drives in a case
that has mediocre / poor air flow in it.  I would, however, allocate 3
or 4 drives as spares in the 36 drive array just to be sure.

Last point:

As has been mentioned in this thread already, not all server drives are
created equal.  Anyone who lived through the HP Surestore 2000 debacle
or one like it can attest to that.  Until the drives have been burnt in
and proven reliable, just assume that they could all fail at any time
and act accordingly.

From:
Jeff Frost
Date:

On Thu, 5 Apr 2007, Scott Marlowe wrote:

>> I've read some recent contrary advice. Specifically advising the
>> sharing of all files (pg_xlogs, indices, etc..) on a huge raid array
>> and letting the drives load balance by brute force.
>
> The other, at first almost counter-intuitive result was that putting
> pg_xlog on a different partition on the same array (i.e. one big
> physical partition broken up into multiple logical ones) was faster,
> because the OS overhead of writing all the data to one file system caused
> performance issues.  Can't remember who reported the performance increase
> off the top of my head.

I noticed this behavior on the last Areca based 8 disk Raptor system I built.
Putting pg_xlog on a separate partition on the same logical volume was faster
than putting it on the large volume.  It was also faster to have 8xRAID10 for
OS+data+pg_xlog vs 6xRAID10 for data and 2xRAID1 for pg_xlog+OS.  Your
workload may vary, but it's definitely worth testing.  The system in question
had 1GB BBU.

--
Jeff Frost, Owner     <>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

From:
"jason@ohloh.net"
Date:

On Apr 5, 2007, at 8:21 AM, Jeff Frost wrote:

> I noticed this behavior on the last Areca based 8 disk Raptor
> system I built. Putting pg_xlog on a separate partition on the same
> logical volume was faster than putting it on the large volume.  It
> was also faster to have 8xRAID10 for OS+data+pg_xlog vs 6xRAID10
> for data and 2xRAID1 for pg_xlog+OS.  Your workload may vary, but
> it's definitely worth testing.  The system in question had 1GB BBU.

Thanks for sharing your findings - I'll definitely try that config out.

-jay



From:
"jason@ohloh.net"
Date:

On Apr 5, 2007, at 4:09 AM, Ron wrote:

> BE VERY WARY OF USING AN ADAPTEC RAID CONTROLLER!

Thanks - I received similar private emails with the same advice. I
will change the controller to a LSI MegaRAID SAS 8408E -- any
feedback on this one?

>
> IME, they are usually the worst of the commodity RAID controllers
> available.
> I've often seen SW RAID outperform them.
>
> If you are going to use this config, Tyan's n3600M (AKA S2932) MB
> has a variant that comes with 8 SAS + 6 SATA II connectors.
> The SKU is S2932WG2NR.
> http://www.tyan.us/product_board_detail.aspx?pid=453

I plan on leveraging the battery backed module so onboard sas isn't a
priority for me.

> I'm using this beast (Dual Socket F, AMD Barcelona ready, 16 DIMMS
> supporting up to 64GB of ECC RAM, 2 PCI-Ex16 slots w/ PCI-Ex8
> signalling, etc:
> ~$450 US w/o SAS, ~$500 US w/ SAS) for my most recent pg 8.2.3
> build on top of XFS.

I'm curious to know why you're on xfs (i've been too chicken to stray
from ext3).

> If the on board RAID is or becomes inadequate to your needs, I'd
> strongly suggest either 3ware or Areca RAID controllers.

I don't know why, but my last attempt at using an areca 1120 w/ linux
on amd64 (and > 4gb ram) was disastrous - i will never use them
again. 3ware's been rock solid for us.

>
> Side Note:
> What kind of HDs are the 8*73GB ones?  If they are the new 2.5"
> Savvio 15Ks, be =VERY= careful about having proper power and
> cooling for them.
> 14 HD's in one case are going to have a serious transient load on
> system start up and (especially with those SAS HDs) can generate a
> great deal of heat.

I went w/ Fujitsu. Fortunately these servers are hosted in a very
well ventilated area so i am not that concerned with heat issues.

>
> What 16bay 3U server are you using?

supermicro sc836tq-r800
http://www.supermicro.com/products/chassis/3U/836/SC836TQ-R800V.cfm

Thanks for all the help!


From:
"Joshua D. Drake"
Date:

 wrote:
>
> On Apr 5, 2007, at 4:09 AM, Ron wrote:
>
>> BE VERY WARY OF USING AN ADAPTEC RAID CONTROLLER!
>
> Thanks - I received similar private emails with the same advice. I will
> change the controller to a LSI MegaRAID SAS 8408E -- any feedback on
> this one?

LSI makes a good controller and the driver for linux is very stable.

Joshua D. Drake

--

       === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From:
"Alex Deucher"
Date:

On 4/5/07,  <> wrote:
>
> On Apr 5, 2007, at 4:09 AM, Ron wrote:
>
> > BE VERY WARY OF USING AN ADAPTEC RAID CONTROLLER!
>
> Thanks - I received similar private emails with the same advice. I
> will change the controller to a LSI MegaRAID SAS 8408E -- any
> feedback on this one?

We use the LSI SAS1064 SAS chips and they've been great.

>
> >
> > IME, they are usually the worst of the commodity RAID controllers
> > available.
> > I've often seen SW RAID outperform them.
> >
> > If you are going to use this config, Tyan's n3600M (AKA S2932) MB
> > has a variant that comes with 8 SAS + 6 SATA II connectors.
> > The SKU is S2932WG2NR.
> > http://www.tyan.us/product_board_detail.aspx?pid=453
>
> I plan on leveraging the battery backed module so onboard sas isn't a
> priority for me.
>
> > I'm using this beast (Dual Socket F, AMD Barcelona ready, 16 DIMMS
> > supporting up to 64GB of ECC RAM, 2 PCI-Ex16 slots w/ PCI-Ex8
> > signalling, etc:
> > ~$450 US w/o SAS, ~$500 US w/ SAS) for my most recent pg 8.2.3
> > build on top of XFS.
>
> I'm curious to know why you're on xfs (i've been too chicken to stray
> from ext3).

I've had great performance with jfs, however there are some issues
with it on certain bigendian platforms.

>
> > If the on board RAID is or becomes inadequate to your needs, I'd
> > strongly suggest either 3ware or Areca RAID controllers.
>
> I don't know why, but my last attempt at using an areca 1120 w/ linux
> on amd64 (and > 4gb ram) was disastrous - i will never use them
> again. 3ware's been rock solid for us.
>
> >
> > Side Note:
> > What kind of HDs are the 8*73GB ones?  If they are the new 2.5"
> > Savvio 15Ks, be =VERY= careful about having proper power and
> > cooling for them.
> > 14 HD's in one case are going to have a serious transient load on
> > system start up and (especially with those SAS HDs) can generate a
> > great deal of heat.
>
> I went w/ Fujitsu. Fortunately these servers are hosted in a very
> well ventilated area so i am not that concerned with heat issues.
>

We have the 2.5" drives (seagates and fujitsus) and they have been
reliable and performed well.

Alex

From:
Ron
Date:

At 11:19 AM 4/5/2007, Scott Marlowe wrote:
>On Thu, 2007-04-05 at 00:32, Tom Lane wrote:
> > "James Mansion" <> writes:
> > >> Right --- the point is not the interface, but whether the drive is built
> > >> for reliability or to hit a low price point.
> >
> > > Personally I take the marketing mublings about the enterprise drives
> > > with a pinch of salt.  The low-price drives HAVE TO be reliable too,
> > > because a non-negligible failure rate will result in returns processing
> > > costs that destroy a very thin margin.
> >
> > Reliability is relative.  Server-grade drives are built to be beat upon
> > 24x7x365 for the length of their warranty period.  Consumer-grade drives
> > are built to be beat upon a few hours a day, a few days a week, for the
> > length of their warranty period.  Even if the warranties mention the
> > same number of years, there is a huge difference here.
>
>Just a couple of points...
>
>Server drives are generally more tolerant of higher temperatures.  I.e.
>the failure rate for consumer and server class HDs may be about the same
>at 40 degrees C, but by the time the internal case temps get up to 60-70
>degrees C, the consumer grade drives will likely be failing at a much
>higher rate, whether they're working hard or not.

Exactly correct.


>Which brings up my next point:
>
>I'd rather have 36 consumer grade drives in a case that moves a LOT of
>air and keeps the drive bays cool than 12 server class drives in a case
>that has mediocre / poor air flow in it.

Also exactly correct.  High temperatures or unclean power issues age
HDs faster than any other factors.

This is why I dislike 1U's for the vast majority of applications.


>I would, however, allocate 3 or 4 drives as spares in the 36 drive
>array just to be sure.
10% sparing is reasonable.
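
As a quick arithmetic sketch of that rule of thumb (the function name and the round-up choice are assumptions for illustration, not something stated in the thread):

```python
def hot_spares(total_drives: int, spare_percent: int = 10) -> int:
    """Number of drives to hold back as spares, rounded up to a whole drive."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (total_drives * spare_percent + 99) // 100


# A 36-drive array at 10% sparing rounds up to 4 spares,
# the upper end of the "3 or 4" suggested above.
print(hot_spares(36))
```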


>Last point:
>
>As has been mentioned in this thread already, not all server drives
>are created equal.  Anyone who lived through the HP Surestore 2000
>debacle or one like it can attest to that.

Yeah, that was very much !no! fun.


>  Until the drives have been burnt in and proven reliable, just
> assume that they could all fail at any time and act accordingly.
Yep.  Folks should google "bath tub curve of statistical failure" or
similar.   Basically, always burn in your drives for at least 1/2 a
day before using them in a production or mission critical role.


Cheers,
Ron Peacetree


From:
Arjen van der Meijden
Date:

On 5-4-2007 17:58  wrote:
>
> On Apr 5, 2007, at 4:09 AM, Ron wrote:
>
>> BE VERY WARY OF USING AN ADAPTEC RAID CONTROLLER!
>
> Thanks - I received similar private emails with the same advice. I will
> change the controller to a LSI MegaRAID SAS 8408E -- any feedback on
> this one?

We have the dell-equivalent (PERC 5/e and PERC 5/i) in production and
have had no issues with it; it also performs very well (compared to an
ICP Vortex controller). The LSI has been benchmarked by my colleague and
he was pleased with the controller.

> I went w/ Fujitsu. Fortunately these servers are hosted in a very well
> ventilated area so i am not that concerned with heat issues.

We have 15 of the 36GB drives and they are doing great. According to
that same colleague, the Fujitsu drives are currently the best
performing drives. Although he hasn't had his hands on the new Savvio
15k rpm drives yet.

>> What 16bay 3U server are you using?
>
> supermicro sc836tq-r800
> http://www.supermicro.com/products/chassis/3U/836/SC836TQ-R800V.cfm

You could also look at this version of that chassis:
http://www.supermicro.com/products/chassis/3U/836/SC836E1-R800V.cfm

Afaik it sports a 28-port expander, which should (please confirm with
your vendor) allow you to connect all 16 drives to the 8 ports of your
controller. That in turn allows both of your sets of disks to be used
with your BBU-backed controller.

Best regards,

Arjen

From:
"James Mansion"
Date:

>Server drives are generally more tolerant of higher temperatures.  I.e.
>the failure rate for consumer and server class HDs may be about the same
>at 40 degrees C, but by the time the internal case temps get up to 60-70
>degrees C, the consumer grade drives will likely be failing at a much
>higher rate, whether they're working hard or not.

Can you cite any statistical evidence for this?

James



From:
david@lang.hm
Date:

On Thu, 5 Apr 2007,  wrote:

>
> I'm curious to know why you're on xfs (i've been too chicken to stray from
> ext3).

better support for large files (although postgres does tend to try and
keep the file size down by going with multiple files) and also for more
files

the multiple levels of indirection that ext3 uses for accessing large
files (or large directories) can really slow things down, just from the
overhead of looking up the metadata (including finding where the actual
data blocks are on disk)

ext4 is planning to address this and will probably be a _very_ good
improvement, but ext3 has very definite limits that it inherited from
ext2.

David Lang


From:
david@lang.hm
Date:

On Thu, 5 Apr 2007, Ron wrote:

> At 11:19 AM 4/5/2007, Scott Marlowe wrote:
>> On Thu, 2007-04-05 at 00:32, Tom Lane wrote:
>> >  "James Mansion" <> writes:
>> > > >  Right --- the point is not the interface, but whether the drive is
>> > > >  built
>> > > >  for reliability or to hit a low price point.
>> >
>> > >  Personally I take the marketing mublings about the enterprise drives
>> > >  with a pinch of salt.  The low-price drives HAVE TO be reliable too,
>> > >  because a non-negligible failure rate will result in returns
>> > >  processing
>> > >  costs that destroy a very thin margin.
>> >
>> >  Reliability is relative.  Server-grade drives are built to be beat upon
>> >  24x7x365 for the length of their warranty period.  Consumer-grade drives
>> >  are built to be beat upon a few hours a day, a few days a week, for the
>> >  length of their warranty period.  Even if the warranties mention the
>> >  same number of years, there is a huge difference here.
>>
>> Just a couple of points...
>>
>> Server drives are generally more tolerant of higher temperatures.  I.e.
>> the failure rate for consumer and server class HDs may be about the same
>> at 40 degrees C, but by the time the internal case temps get up to 60-70
>> degrees C, the consumer grade drives will likely be failing at a much
>> higher rate, whether they're working hard or not.
>
> Exactly correct.
>
>
>> Which brings up my next point:
>>
>> I'd rather have 36 consumer grade drives in a case that moves a LOT of
>> air and keeps the drive bays cool than 12 server class drives in a case
>> that has mediocre / poor air flow in it.
>
> Also exactly correct.  High temperatures or unclean power issues age HDs
> faster than any other factors.
>

this I agree with, however I believe that this is _so_ much of a factor
that it swamps any difference that there may be between 'enterprise' and
'consumer' drives.

>
>>  Until the drives have been burnt in and proven reliable, just assume that
>>  they could all fail at any time and act accordingly.
> Yep.  Folks should google "bath tub curve of statistical failure" or similar.
> Basically, always burn in your drives for at least 1/2 a day before using
> them in a production or mission critical role.

for this and your first point, please go and look at the google and cmu
studies. unless the vendors did the burn-in before delivering the drives
to the sites that installed them, there was no 'infant mortality' spike on
the drives (both studies commented on this, they expected to find one)

David Lang

From:
Scott Marlowe
Date:

On Thu, 2007-04-05 at 14:30, James Mansion wrote:
> >Server drives are generally more tolerant of higher temperatures.  I.e.
> >the failure rate for consumer and server class HDs may be about the same
> >at 40 degrees C, but by the time the internal case temps get up to 60-70
> >degrees C, the consumer grade drives will likely be failing at a much
> >higher rate, whether they're working hard or not.
>
> Can you cite any statistical evidence for this?

Logic?

Mechanical devices have decreasing MTBF when run in hotter environments,
often at non-linear rates.

Server class drives are designed with a longer lifespan in mind.

Server class hard drives are rated at higher temperatures than desktop
drives.

Google can supply any numbers to fill those facts in, but I found a
dozen or so data sheets for various enterprise versus desktop drives in
a matter of minutes.

From:
david@lang.hm
Date:

On Thu, 5 Apr 2007, Scott Marlowe wrote:

> On Thu, 2007-04-05 at 14:30, James Mansion wrote:
>>> Server drives are generally more tolerant of higher temperatures.  I.e.
>>> the failure rate for consumer and server class HDs may be about the same
>>> at 40 degrees C, but by the time the internal case temps get up to 60-70
>>> degrees C, the consumer grade drives will likely be failing at a much
>>> higher rate, whether they're working hard or not.
>>
>> Can you cite any statistical evidence for this?
>
> Logic?
>
> Mechanical devices have decreasing MTBF when run in hotter environments,
> often at non-linear rates.

this I will agree with.

> Server class drives are designed with a longer lifespan in mind.
>
> Server class hard drives are rated at higher temperatures than desktop
> drives.

these two I question.

David Lang

From:
Ron
Date:

At 10:07 PM 4/5/2007,  wrote:
>On Thu, 5 Apr 2007, Scott Marlowe wrote:
>
>>Server class drives are designed with a longer lifespan in mind.
>>
>>Server class hard drives are rated at higher temperatures than desktop
>>drives.
>
>these two I question.
>
>David Lang
Both statements are the literal truth.  Not that I would suggest
abusing your server class HDs just because they are designed to live
longer and in more demanding environments.

Overheating, nasty electrical phenomena, and abusive physical shocks
will trash a server class HD almost as fast as they will a consumer grade one.

The big difference between the two is that a server class HD can sit
in a rack with literally 100's of its brothers around it, cranking
away on server class workloads 24x7 in a constant vibration
environment (fans, other HDs, NOC cooling systems) and be quite happy
while a consumer HD will suffer greatly shortened life and die a
horrible death in such an environment and under such use.


Ron


From:
david@lang.hm
Date:

On Thu, 5 Apr 2007, Ron wrote:

> At 10:07 PM 4/5/2007,  wrote:
>> On Thu, 5 Apr 2007, Scott Marlowe wrote:
>>
>> > Server class drives are designed with a longer lifespan in mind.
>> >
>> > Server class hard drives are rated at higher temperatures than desktop
>> > drives.
>>
>> these two I question.
>>
>> David Lang
> Both statements are the literal truth.  Not that I would suggest abusing your
> server class HDs just because they are designed to live longer and in more
> demanding environments.
>
> Overheating, nasty electrical phenomenon, and abusive physical shocks will
> trash a server class HD almost as fast as it will a consumer grade one.
>
> The big difference between the two is that a server class HD can sit in a
> rack with literally 100's of its brothers around it, cranking away on server
> class workloads 24x7 in a constant vibration environment (fans, other HDs,
> NOC cooling systems) and be quite happy while a consumer HD will suffer
> greatly shortened life and die a horrible death in such a environment and
> under such use.

Ron,
   I know that the drive manufacturers have been claiming this, but I'll
say that my experience doesn't show a difference and neither do the google
and CMU studies (and they were all in large datacenters, some HPC labs,
some commercial companies).

again the studies showed _no_ noticeable difference between the
'enterprise' SCSI drives and the 'consumer' SATA drives.

David Lang

From:
Richard Troy
Date:

On Thu, 5 Apr 2007  wrote:
> On Thu, 5 Apr 2007, Ron wrote:
> > At 10:07 PM 4/5/2007,  wrote:
> >> On Thu, 5 Apr 2007, Scott Marlowe wrote:
> >>
> >> > Server class drives are designed with a longer lifespan in mind.
> >> >
> >> > Server class hard drives are rated at higher temperatures than desktop
> >> > drives.
> >>
> >> these two I question.
> >>
> >> David Lang
> > Both statements are the literal truth.  Not that I would suggest abusing your
> > server class HDs just because they are designed to live longer and in more
> > demanding environments.
> >
> > Overheating, nasty electrical phenomenon, and abusive physical shocks will
> > trash a server class HD almost as fast as it will a consumer grade one.
> >
> > The big difference between the two is that a server class HD can sit in a
> > rack with literally 100's of its brothers around it, cranking away on server
> > class workloads 24x7 in a constant vibration environment (fans, other HDs,
> > NOC cooling systems) and be quite happy while a consumer HD will suffer
> > greatly shortened life and die a horrible death in such a environment and
> > under such use.
>
> Ron,
>    I know that the drive manufacturers have been claiming this, but I'll
> say that my experiance doesn't show a difference and neither do the google
> and CMU studies (and they were all in large datacenters, some HPC labs,
> some commercial companies).
>
> again the studies showed _no_ noticable difference between the
> 'enterprise' SCSI drives and the 'consumer' SATA drives.
>
> David Lang

Hi David, Ron,

I was just about to chime in to Ron's post when you did already, David. My
experience supports David's view point. I'm a scientist and with that hat
on my head I must acknowledge that it wasn't my goal to do a study on the
subject so my data is more of the character of anecdote. However, I work
with some pretty large shops, such as UC's SDSC, NOAA's NCDC (probably the
world's largest non-classified data center), Langley, among many others,
so my perceptions include insights from a lot of pretty sharp folks.

...When you provide your disk drives with clean power, cool, dry air, and
avoid serious shocks, it seems to be everyone's perception that all modern
drives - say, of the last ten years or a bit more - are exceptionally
reliable, and it's not at all rare to get 7 years and more out of a drive.
What seems _most_ detrimental is power-cycles, without regard to which
type of drive you might have. This isn't to say the two types, "server
class" and "PC", are equal. PC drives are by comparison rather slow, and
that's their biggest downside, but they are also typically rather large.

Again, anecdotal evidence says that PC disks are typically cycled far more
often and so they also fail more often. Put them in the same environ as a
server-class disk and they'll also live a long time. Science Tools set up
our data center ten years ago this May, something more than a terabyte -
large at the time (and it's several times that now), and we also adopted a
good handful of older equipment at that time, some twelve and fifteen
years old by now. We didn't have a single disk failure in our first seven
years, but then, we also never turn anything off unless it's being
serviced. Our disk drives are decidedly mixed - SCSI, all forms of ATA,
and some SATA in the last couple of years, and plenty of both server and
PC class. Yes, the older ones are dying now - we lost one on a server
just now (so recently we haven't yet replaced it), but the death rate is
still remarkably low.

I should point out that we've had far more controller failures than drive
failures, and these have come all through these ten years at seemingly
random times. Unfortunately, I can't really comment on which brands are
better or worse, but I do remember once when we had a 50% failure rate of
some new SATA cards a few years back. Perhaps it's also worth a keystroke
or two to comment that we rotate new drives in on an annual basis, and the
older ones get moved to less critical, less stressful duties. Generally,
our oldest drives are now serving our gateway / firewall systems (of which
we have several), while our newest are providing primary daily workhorse
service, and middle-aged are serving hot-backup duty. Perhaps you could
argue that this putting out to pasture isn't comparable to heavy 24/7/365
demands, but then, that wouldn't be appropriate for a fifteen year old
drive, now would it? -smile-

Good luck with your drives,
Richard

--
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
, http://ScienceTools.com/


From:
Greg Smith
Date:

On Thu, 5 Apr 2007, Scott Marlowe wrote:

> On Thu, 2007-04-05 at 14:30, James Mansion wrote:
>> Can you cite any statistical evidence for this?
> Logic?

OK, everyone who hasn't already needs to read the Google and CMU papers.
I'll even provide links for you:

http://www.cs.cmu.edu/~bianca/fast07.pdf
http://labs.google.com/papers/disk_failures.pdf

There are several things their data suggests that are completely at odds
with the lore suggested by traditional logic-based thinking in this area.
Section 3.4 of Google's paper basically disproves that "mechanical devices
have decreasing MTBF when run in hotter environments" applies to hard
drives in the normal range they're operated in.  Your comments about
server hard drives being rated to higher temperatures are helpful, but
I don't trust conclusions drawn from just thinking about something when
they conflict with statistics to the contrary.

I don't want to believe everything they suggest, but enough of it matches
my experience that I find it difficult to dismiss the rest.  For example,
I scan all my drives for reallocated sectors, and the minute there's a
single one I get e-mailed about it and get all the data off that drive
pronto.  This has saved me from a complete failure that happened within
the next day on multiple occasions.
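
A minimal sketch of that kind of check, assuming smartmontools is installed and that the drive reports SMART attribute 5 (Reallocated_Sector_Ct) in the usual `smartctl -A` table layout. The function names and the sample text are hypothetical illustrations, not the actual script described above, and the e-mail step is left out.

```python
import subprocess
from typing import Optional


def reallocated_sectors(smartctl_output: str) -> Optional[int]:
    """Return the raw Reallocated_Sector_Ct value, or None if not reported."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns; the raw value is the last one.
        if (len(fields) >= 10 and fields[0] == "5"
                and fields[1] == "Reallocated_Sector_Ct"):
            return int(fields[9])
    return None


def needs_evacuation(smartctl_output: str) -> bool:
    """True as soon as a single reallocated sector shows up."""
    count = reallocated_sectors(smartctl_output)
    return count is not None and count > 0


def read_attributes(device: str) -> str:
    """Run `smartctl -A` on a device (usually needs root privileges)."""
    # smartctl uses a bitmask exit status, so a non-zero code is not
    # treated as a failure here.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True, check=False)
    return result.stdout


# Illustrative sample text only, not output from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       3
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4321
"""

if needs_evacuation(SAMPLE):
    print("reallocated sectors found: copy the data off this drive")
```

In practice something like `needs_evacuation(read_attributes("/dev/sda"))` could be run from cron for each drive, with the alert hooked up to whatever notification mechanism is already in place.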

The main thing I wish they'd published is a breakdown of some of the
statistics by drive manufacturer.  For example, they suggest a significant
number of drive failures were not predicted by SMART.  I've seen plenty of
drives where the SMART reporting was spotty at best (yes, I'm talking
about you, Maxtor) and wouldn't be surprised that they were quiet right up
to their bitter (and frequent) end.  I'm not sure how that factor may have
skewed this particular bit of data.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

From:
Ron
Date:

At 11:40 PM 4/5/2007,  wrote:
>On Thu, 5 Apr 2007, Ron wrote:
>
>>At 10:07 PM 4/5/2007,  wrote:
>>>On Thu, 5 Apr 2007, Scott Marlowe wrote:
>>> > Server class drives are designed with a longer lifespan in mind.
>>> > > Server class hard drives are rated at higher temperatures than desktop
>>> > drives.
>>>these two I question.
>>>David Lang
>>Both statements are the literal truth.  Not that I would suggest
>>abusing your server class HDs just because they are designed to
>>live longer and in more demanding environments.
>>
>>Overheating, nasty electrical phenomenon, and abusive physical
>>shocks will trash a server class HD almost as fast as it will a
>>consumer grade one.
>>
>>The big difference between the two is that a server class HD can
>>sit in a rack with literally 100's of its brothers around it,
>>cranking away on server class workloads 24x7 in a constant
>>vibration environment (fans, other HDs, NOC cooling systems) and be
>>quite happy while a consumer HD will suffer greatly shortened life
>>and die a horrible death in such a environment and under such use.
>
>Ron,
>   I know that the drive manufacturers have been claiming this, but
> I'll say that my experiance doesn't show a difference and neither
> do the google and CMU studies (and they were all in large
> datacenters, some HPC labs, some commercial companies).
>
>again the studies showed _no_ noticable difference between the
>'enterprise' SCSI drives and the 'consumer' SATA drives.
>
>David Lang
Bear in mind that Google was and is notorious for pushing their
environmental factors to the limit while using the cheapest "PoS" HW
they can get their hands on.
Let's just say I'm fairly sure every piece of HW they were using for
those studies was operating outside of manufacturer's suggested specifications.

Under such conditions the environmental factors are so deleterious
that they swamp any other effect.

OTOH, I've spent my career being as careful as possible to as much as
possible run HW within manufacturer's suggested specifications.
I've been chided for it over the years... ...usually by folks who
"save" money by buying commodity HDs for big RAID farms in NOCs or
push their environmental envelope or push their usage envelope or ...
...and then act surprised when they have so much more down time and
HW replacements than I do.

All I can tell you is that I've gotten to eat my holiday dinner far
more often than my counterparts who push it in that fashion.

OTOH, there are crises like the Power Outage of 2003 in the NE USA
where some places had such Bad Things happen that it simply doesn't
matter what you bought
(power dies, generator cuts in, power comes on, but AC units crash,
temperatures shoot up so fast that by the time everything is
re-shutdown it's in the 100F range in the NOC.  Lot's 'O Stuff dies
on the spot + spend next 6 months having HW failures at
+considerably+ higher rates than historical norms.  Ick..)

  IME, it really does make a difference =if you pay attention to the
difference in the first place=.
If you treat everything equally poorly, then you should not be
surprised when everything acts equally poorly.

But hey, YMMV.

Cheers,
Ron Peacetree


From:
david@lang.hm
Date:

On Fri, 6 Apr 2007, Ron wrote:

> Bear in mind that Google was and is notorious for pushing their environmental
> factors to the limit while using the cheapest "PoS" HW they can get their
> hands on.
> Let's just say I'm fairly sure every piece of HW they were using for those
> studies was operating outside of manufacturer's suggested specifications.

Ron, please go read both the studies. unless you want to say that every
organization that CMU picked to study also abused their hardware....

> Under such conditions the environmental factors are so deleterious that they
> swamp any other effect.
>
> OTOH, I've spent my career being as careful as possible to as much as
> possible run HW within manufacturer's suggested specifications.
> I've been chided for it over the years... ...usually by folks who "save"
> money by buying commodity HDs for big RAID farms in NOCs or push their
> environmental envelope or push their usage envelope or ... ...and then act
> surprised when they have so much more down time and HW replacements than I
> do.
>
> All I can tell you is that I've gotten to eat my holiday dinner far more
> often than than my counterparts who push it in that fashion.
>
> OTOH, there are crises like the Power Outage of 2003 in the NE USA where some
> places had such Bad Things happen that it simply doesn't matter what you
> bought
> (power dies, generator cuts in, power comes on, but AC units crash,
> temperatures shoot up so fast that by the time everything is re-shutdown it's
> in the 100F range in the NOC.  Lot's 'O Stuff dies on the spot + spend next 6
> months having HW failures at +considerably+ higher rates than historical
> norms.  Ick..)
>
> IME, it really does make a difference =if you pay attention to the
> difference in the first place=.
> If you treat everything equally poorly, then you should not be surprised when
> everything acts equally poorly.
>
> But hey, YMMV.
>
> Cheers,
> Ron Peacetree
>
>

From:
Tom Lane
Date:

 writes:
> On Thu, 5 Apr 2007, Ron wrote:
>> Yep.  Folks should google "bath tub curve of statistical failure" or similar.
>> Basically, always burn in your drives for at least 1/2 a day before using
>> them in a production or mission critical role.

> for this and your first point, please go and look at the google and cmu
> studies. unless the vendors did the burn-in before delivering the drives
> to the sites that installed them, there was no 'infant mortality' spike on
> the drives (both studies commented on this, they expected to find one)

It seems hard to believe that the vendors themselves wouldn't burn in
the drives for half a day, if that's all it takes to eliminate a large
fraction of infant mortality.  The savings in return processing and
customer goodwill would surely justify the electricity they'd use.

            regards, tom lane

From:
Greg Smith
Date:

On Fri, 6 Apr 2007, Tom Lane wrote:

> It seems hard to believe that the vendors themselves wouldn't burn in
> the drives for half a day, if that's all it takes to eliminate a large
> fraction of infant mortality.

I've read that much of the damage that causes hard drive infant mortality
is related to shipping.  The drive is fine when it leaves the factory,
gets shaken up and otherwise brutalized by environmental changes in
transit (it's a long trip from Singapore to here), and therefore is a bit
whacked by the time it is installed.  A quick post-installation burn-in
helps ferret out when this happens.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

From:
Tom Lane
Date:

Greg Smith <> writes:
> On Fri, 6 Apr 2007, Tom Lane wrote:
>> It seems hard to believe that the vendors themselves wouldn't burn in
>> the drives for half a day, if that's all it takes to eliminate a large
>> fraction of infant mortality.

> I've read that much of the damage that causes hard drive infant mortality
> is related to shipping.

Doh, of course.  Maybe I'd better go to bed now...

            regards, tom lane

From:
Michael Stone
Date:

On Thu, Apr 05, 2007 at 11:19:04PM -0400, Ron wrote:
>Both statements are the literal truth.

Repeating something over and over again doesn't make it truth. The OP
asked for statistical evidence (presumably real-world field evidence) to
support that assertion. Thus far, all the publicly available evidence
does not show a significant difference between SATA and SCSI reliability
in the field.

Mike Stone

From:
Michael Stone
Date:

On Fri, Apr 06, 2007 at 02:00:15AM -0400, Tom Lane wrote:
>It seems hard to believe that the vendors themselves wouldn't burn in
>the drives for half a day, if that's all it takes to eliminate a large
>fraction of infant mortality.  The savings in return processing and
>customer goodwill would surely justify the electricity they'd use.

Wouldn't help if the reason for the infant mortality is bad handling
between the factory and the rack. One thing that I did question in the
CMU study was the lack of infant mortality--I've definitely observed it,
but it might just be that my UPS guy is clumsier than theirs.

Mike Stone

From:
Ron
Date:

I read them as soon as they were available.  Then I shrugged and
noted YMMV to myself.


1= Those studies are valid for =those= users under =those= users'
circumstances in =those= users' environments.
  How well do those circumstances and environments mimic anyone else's?
I don't know since the studies did not document said in enough detail
(and it would be nigh unto impossible to do so) for me to compare
mine to theirs.  I =do= know that neither Google's nor a university's
nor an ISP's nor a HPC supercomputing facility's NOC are particularly
similar to say a financial institution's or a health care organization's NOC.
...and they better not be.  Ditto the personnel's behavior working them.

You yourself have said the environmental factors make a big
difference.  I agree.  I submit that therefore differences in the
environmental factors are just as significant.


2= I'll bet all the money in your pockets vs all the money in my
pockets that people are going to leap at the chance to use these
studies as yet another excuse to pinch IT spending further.  In the
process they are consciously or unconsciously going to imitate some
or all of the environments that were used in those studies.
Which IMHO is exactly wrong for most mission critical functions in
most non-university organizations.

While we can't all pamper our HDs to the extent that Richard Troy's
organization can, frankly that is much closer to the way things
should be done for most organizations.  Ditto Greg Smith's =very= good habit:
"I scan all my drives for reallocated sectors, and the minute there's
a single one I get e-mailed about it and get all the data off that
drive pronto.  This has saved me from a complete failure that
happened within the next day on multiple occasions."
Amen.
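
A minimal sketch of the reallocated-sector check described in the quote above. It assumes the attribute-table layout printed by smartmontools' `smartctl -A`; the sample text is made up for illustration, and the function names are hypothetical. The e-mail alert itself is left out.

```python
def reallocated_sectors(smart_attr_text):
    """Return the raw Reallocated_Sector_Ct value, or None if it is absent."""
    for line in smart_attr_text.splitlines():
        fields = line.split()
        # Attribute rows have ten columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == "Reallocated_Sector_Ct":
            return int(fields[9])
    return None


def needs_evacuation(smart_attr_text):
    """Apply the 'single reallocated sector' rule from the quote."""
    count = reallocated_sectors(smart_attr_text)
    return count is not None and count > 0


# Made-up sample in the assumed layout, not output from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       3
194 Temperature_Celsius     0x0022   038   045   000    Old_age   Always       -       38
"""

print(reallocated_sectors(SAMPLE))
print(needs_evacuation(SAMPLE))
```

The parser takes text rather than running the tool itself, so it can be fed from whatever scheduled job collects the attribute table for each drive.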

I'll make the additional bet that no matter what they say neither
Google nor the CMU places had to deal with setting up and running
environments where the consequences of data loss or data corruption
are as serious as they are for most mission critical business
applications.  =Especially= DBMSs in such organizations.
If anyone tried to convince me to run a mission critical or
production DBMS in a business the way Google runs their HW, I'd be
applying the clue-by-four liberally in "boot to the head" fashion
until either they got just how wrong they were or they convinced me
they were too stupid to learn.
At which point they are never touching my machines.


3= From the CMU paper:
  "We also find evidence, based on records of disk replacements in
the field, that failure rate is not constant with age, and that,
rather than a significant infant mortality effect, we see a
significant early onset of wear-out degradation. That is, replacement
rates in our data grew constantly with age, an effect often assumed
not to set in until after a nominal lifetime of 5 years."
"In our data sets, the replacement rates of SATA disks are not worse
than the replacement rates of SCSI or FC disks.
=This may indicate that disk independent factors, such as operating
conditions, usage and environmental factors, affect replacement=."
(emphasis mine)

If you look at the organizations in these two studies, you will note
that one thing they all have in common is that they are organizations
that tend to push the environmental and usage envelopes.  Especially
with regards to anything involving spending money.  (Google is an
extreme even in that group).
What these studies say clearly to me is that it is possible to be
penny-wise and pound-foolish with regards to IT spending...  ...and
that these organizations have a tendency to be so.
Not a surprise to anyone who's worked in those environments I'm sure.
The last thing the IT industry needs is for everyone to copy these
organizations' IT behavior!


4= Tom Lane is of course correct that vendors burn in their HDs
enough before selling them to get past most infant mortality.  Then
any time any HD is shipped between organizations, it is usually
burned in again to detect and possibly deal with issues caused by
shipping.  That's enough to see to it that the end operating
environment is not going to see a bath tub curve failure rate.
Then environmental, usage, and maintenance factors further distort
both the shape and size of the statistical failure curve.


5= The major conclusion of the CMU paper is !NOT! that we should buy
the cheapest HDs we can because HD quality doesn't make a difference.
The important conclusion is that a very large segment of the industry
operates their equipment significantly enough outside manufacturer's
specifications that we need a new error rate model for end use.  I agree.
Regardless of what Seagate et al can do in their QA labs, we need
reliability numbers that are actually valid ITRW of HD usage.

The other take-away is that organizational policy and procedure with
regards to HD maintenance and use in most organizations could use improving.
I strongly agree with that as well.


Cheers,
Ron Peacetree



At 01:53 AM 4/6/2007,  wrote:
>On Fri, 6 Apr 2007, Ron wrote:
>
>>Bear in mind that Google was and is notorious for pushing their
>>environmental factors to the limit while using the cheapest "PoS"
>>HW they can get their hands on.
>>Let's just say I'm fairly sure every piece of HW they were using
>>for those studies was operating outside of manufacturer's suggested
>>specifications.
>
>Ron, please go read both the studies. unless you want to say that
>every organization the CMU picked to study also abused their
>hardware as well....
>
>>Under such conditions the environmental factors are so deleterious
>>that they swamp any other effect.
>>
>>OTOH, I've spent my career being as careful as possible to as much
>>as possible run HW within manufacturer's suggested specifications.
>>I've been chided for it over the years... ...usually by folks who
>>"save" money by buying commodity HDs for big RAID farms in NOCs or
>>push their environmental envelope or push their usage envelope or
>>... ...and then act surprised when they have so much more down time
>>and HW replacements than I do.
>>
>>All I can tell you is that I've gotten to eat my holiday dinner far
>>more often than my counterparts who push it in that fashion.
>>
>>OTOH, there are crises like the Power Outage of 2003 in the NE USA
>>where some places had such Bad Things happen that it simply doesn't
>>matter what you bought
>>(power dies, generator cuts in, power comes on, but AC units crash,
>>temperatures shoot up so fast that by the time everything is
>>re-shutdown it's in the 100F range in the NOC.  Lot's 'O Stuff dies
>>on the spot + spend next 6 months having HW failures at
>>+considerably+ higher rates than historical norms.  Ick..)
>>
>>IME, it really does make a difference =if you pay attention to the
>>difference in the first place=.
>>If you treat everything equally poorly, then you should not be
>>surprised when everything acts equally poorly.
>>
>>But hey, YMMV.
>>
>>Cheers,
>>Ron Peacetree


From:
Geoffrey
Date:

Tom Lane wrote:
> Greg Smith writes:
>> On Fri, 6 Apr 2007, Tom Lane wrote:
>>> It seems hard to believe that the vendors themselves wouldn't burn in
>>> the drives for half a day, if that's all it takes to eliminate a large
>>> fraction of infant mortality.
>
>> I've read that much of the damage that causes hard drive infant mortality
>> is related to shipping.
>
> Doh, of course.  Maybe I'd better go to bed now...
>
>             regards, tom lane

You actually sleep?

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
  - Benjamin Franklin

From:
Ron
Date:

At 07:38 AM 4/6/2007, Michael Stone wrote:
>On Thu, Apr 05, 2007 at 11:19:04PM -0400, Ron wrote:
>>Both statements are the literal truth.
>
>Repeating something over and over again doesn't make it truth. The
>OP asked for statistical evidence (presumably real-world field
>evidence) to support that assertion. Thus far, all the publicly
>available evidence does not show a significant difference between
>SATA and SCSI reliability in the field.
Not quite.  Each of our professional experiences is +also+
statistical evidence.  Even if it is a personally skewed sample.

For instance, Your experience suggests that infant mortality is more
real than the studies stated.  Does that invalidate your
experience?  Of course not.
Does that invalidate the studies?  Equally clearly not.

My experience supports the hypothesis that spending slightly more for
quality and treating HDs better is worth it.
Does that mean one of us is right and the other wrong?  Nope.  Just
that =in my experience= it does make a difference.

The OP asked for real world evidence.   We're providing it; and
across a wider range of use cases than the studies used.

Cheers,
Ron


From:
Geoffrey
Date:

Michael Stone wrote:
> On Fri, Apr 06, 2007 at 02:00:15AM -0400, Tom Lane wrote:
>> It seems hard to believe that the vendors themselves wouldn't burn in
>> the drives for half a day, if that's all it takes to eliminate a large
>> fraction of infant mortality.  The savings in return processing and
>> customer goodwill would surely justify the electricity they'd use.
>
> Wouldn't help if the reason for the infant mortality is bad handling
> between the factory and the rack. One thing that I did question in the
> CMU study was the lack of infant mortality--I've definitely observed it,
> but it might just be that my UPS guy is clumsier than theirs.

Good point.  Folks must realize that carriers handle computer hardware
the same way they handle a box of marshmallows or ball bearings.  A box
is a box is a box.

--
Until later, Geoffrey


From:
Michael Stone
Date:

On Fri, Apr 06, 2007 at 08:49:08AM -0400, Ron wrote:
>Not quite.  Each of our professional experiences is +also+
>statistical evidence.  Even if it is a personally skewed sample.

I'm not sure that word means what you think it means. I think the one
you're looking for is "anecdotal".

>My experience supports the hypothesis that spending slightly more for
>quality and treating HDs better is worth it.
>Does that mean one of us is right and the other wrong?  Nope.  Just
>that =in my experience= it does make a difference.

Well, without real numbers to back it up, it doesn't mean much in the
face of studies that include real numbers. Humans are, in general,
exceptionally lousy at assessing probabilities. There's a very real
tendency to exaggerate evidence that supports our preconceptions and
discount evidence that contradicts them. Maybe you're immune to that.
Personally, I tend to simply assume that anecdotal evidence isn't very
useful. This is why having some large scale independent studies is
valuable. The manufacturer's studies are obviously biased, and it's good
to have some basis for decision making other than "logic" (the classic
"proof by 'it stands to reason'"), marketing, or "I paid more for it" ("so
it's better whether it's better or not").

Mike Stone

From:
Scott Marlowe
Date:

On Thu, 2007-04-05 at 23:37, Greg Smith wrote:
> On Thu, 5 Apr 2007, Scott Marlowe wrote:
>
> > On Thu, 2007-04-05 at 14:30, James Mansion wrote:
> >> Can you cite any statistical evidence for this?
> > Logic?
>
> OK, everyone who hasn't already needs to read the Google and CMU papers.
> I'll even provide links for you:
>
> http://www.cs.cmu.edu/~bianca/fast07.pdf
> http://labs.google.com/papers/disk_failures.pdf
>
> There are several things their data suggests that are completely at odds
> with the lore suggested by traditional logic-based thinking in this area.
> Section 3.4 of Google's paper basically disproves that "mechanical devices
> have decreasing MTBF when run in hotter environments" applies to hard
> drives in the normal range they're operated in.

On the google:

The google study ONLY looked at consumer grade drives.  It did not
compare them to server class drives.

This is only true when the temperature is fairly low.  Note that the
drive temperatures in the google study are <=55C.  If the drive temp is
below 55C, then the environment, by extension, must be lower than that
by some fair bit, likely 10-15C, since the drive is a heat source, and
the environment the heat sink.  So, the environment here is likely in
the 35C range.

Most server drives are rated for 55-60C environmental temperature
operation, which means the drive would be even hotter.
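
The temperature argument above is plain subtraction, sketched here as a worked example. The 10-15C offset between drive and room is the estimate given in the post, not a measured value, and the function names are hypothetical.

```python
def ambient_range_c(drive_temp_c, offset_low_c=10, offset_high_c=15):
    """Room temperature range implied by a drive temperature and an assumed offset."""
    return (drive_temp_c - offset_high_c, drive_temp_c - offset_low_c)


def drive_range_c(ambient_low_c, ambient_high_c, offset_low_c=10, offset_high_c=15):
    """Drive temperature range implied by a room temperature range and the same offset."""
    return (ambient_low_c + offset_low_c, ambient_high_c + offset_high_c)


print(ambient_range_c(55))    # room implied by a 55C drive
print(drive_range_c(55, 60))  # drive implied by a 55-60C rated environment
```

With these assumed offsets, the 55C upper bound corresponds to a room at 40-45C, a 35C room corresponds to drives running at about 45-50C, and a drive in a 55-60C environment would sit at roughly 65-75C.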

As for the CMU study:

It didn't expressly compare server to consumer grade hard drives.
Remember, there are server class SATA drives, and there were (once upon
a time) consumer class SCSI drives.  If they had separated out the
drives by server / consumer grade I think the study would have been more
interesting.  But we just don't know from that study.

Personal Experience:

In my last job we had three very large storage arrays (big black
refrigerator looking boxes, you know the kind.)  Each one had somewhere
in the range of 150 or so drives in it.  The first two we purchased were
based on 9Gig server class SCSI drives.  The third, and newer one, was
based on commodity IDE drives.  I'm not sure of the size, but I believe
they were somewhere around 20Gigs or so.  So, this was 5 or so years
ago, not recently.

We had a cooling failure in our hosting center, and the internal
temperature of the data center rose to about 110F to 120F (43C to 49C).
We ran at that temperature for about 12 hours, before we got a
refrigerator on a flatbed brought in (btw, I highly recommend Aggreko if
you need large scale portable air conditioners or generators) to cool
things down.

In the months that followed the drives in the IDE based storage array
failed by the dozens.  We eventually replaced ALL the drives in that
storage array because of the failure rate.  The SCSI based arrays had a
few extra drives fail than usual, but nothing too shocking.

Now, maybe now Seagate et al. are making their consumer grade drives
from yesterday's server grade technology, but 5 or 6 years ago that was
not the case from what I saw.

> Your comments about
> server hard drives being rated to higher temperatures is helpful, but
> conclusions drawn from just thinking about something I don't trust when
> they conflict with statistics to the contrary.

Actually, as I looked up some more data on this, I found it interesting
that 5 to 10 years ago, consumer grade drives were rated for 35C
environments, while today consumer grade drives seem to be rated to 55C
or 60C.  Same as server drives were 5 to 10 years ago.  I do think that
server grade drive tech has been migrating into the consumer realm over
time.  I can imagine that today's high performance game / home systems
with their heat generating video cards and tendency towards RAID1 /
RAID0 drive setups are pushing the drive manufacturers to improve
reliability of consumer disk drives.

> The main thing I wish they'd published is breaking some of the statistics
> down by drive manufacturer.  For example, they suggest a significant
> number of drive failures were not predicted by SMART.  I've seen plenty of
> drives where the SMART reporting was spotty at best (yes, I'm talking
> about you, Maxtor) and wouldn't be surprised that they were quiet right up
> to their bitter (and frequent) end.  I'm not sure how that factor may have
> skewed this particular bit of data.

I too have pretty much given up on Maxtor drives and things like SMART
or sleep mode, or just plain working properly.

In recent months, we had an AC unit fail here at work, and we have two
drive manufacturers for our servers.  Manufacturer F and S.  The drives
from F failed at a much higher rate, and developed lots and lots of bad
sectors, the drives from manufacturer S, OTOH, have not had an increased
failure rate.  While both manufacturers claim that their drives can
survive in an environment of 55/60C, I'm pretty sure one of them was
lying.  We are silently replacing the failed drives with drives from
manufacturer S.

Based on experience I think that on average server drives are more
reliable than consumer grade drives, and can take more punishment.  But,
the variables of manufacturer, model, and the batch often make even more
difference than grade.

From:
Ron
Date:

At 09:23 AM 4/6/2007, Michael Stone wrote:
>On Fri, Apr 06, 2007 at 08:49:08AM -0400, Ron wrote:
>>Not quite.  Each of our professional
>>experiences is +also+ statistical
>>evidence.  Even if it is a personally skewed sample.
>
>I'm not sure that word means what you think it
>means. I think the one you're looking for is "anecdotal".
OK, let's kill this one as well.  Personal
experience as related by non professionals is
often based on casual observation and of questionable quality or veracity.
It therefore is deservedly called "anecdotal".

Professionals giving evidence in their
professional capacity within their field of
expertise are under an obligation to tell the
truth, the whole truth, and nothing but the truth
to the best of their knowledge and
ability.  Whether you are in court and sworn in or not.
Even if it's "just" to a mailing list ;-)

 From dictionary.com
an·ec·dot·al:
1.pertaining to, resembling, or containing
anecdotes: an anecdotal history of jazz.
2.(of the treatment of subject matter in
representational art) pertaining to the
relationship of figures or to the arrangement of
elements in a scene so as to emphasize the story
content of a subject. Compare narrative (def. 6).
3.based on personal observation, case study
reports, or random investigations rather than
systematic scientific evaluation: anecdotal evidence.

+also an·ec·dot·ic or an·ec·dot·i·cal
Of, characterized by, or full of anecdotes.
+Based on casual observations or indications
rather than rigorous or scientific analysis:
"There are anecdotal reports of children poisoned
by hot dogs roasted over a fire of the [oleander] stems" (C. Claiborne Ray).

While evidence given by professionals can't be as
rigorous as that of a double blind and controlled
study,  there darn well better be nothing casual
or ill-considered about it.  And it had better
!not! be anything "distorted or emphasized" just
for the sake of making the story better.
(Good Journalists deal with this one all the time.)

In short, professional advice and opinions are
supposed to be considerably more rigorous and
analytical than anything "anecdotal".  The alternative is "malpractice".


>>My experience supports the hypothesis that
>>spending slightly more for quality and treating HDs better is worth it.
>>Does that mean one of us is right and the other
>>wrong?  Nope.  Just that =in my experience= it does make a difference.
>
>Well, without real numbers to back it up, it
>doesn't mean much in the face of studies that
>include real numbers. Humans are, in general,
>exceptionally lousy at assessing probabilities.
>There's a very real tendency to exaggerate
>evidence that supports our preconceptions and
>discount evidence that contradicts them. Maybe you're immune to that.

Half agree.   Half disagree.

Part of the definition of "professional" vs
"amateur" is an obligation to think and act
outside our personal "stuff" when acting in our professional capacity.
Whether numbers are explicitly involved or not.

I'm certainly not immune to personal bias.   No
one is.  But I have a professional obligation of
the highest order to do everything I can to make
sure I never think or act based on personal bias
when operating in my professional capacity.  All professionals do.

Maybe you've found it harder to avoid personal
bias without sticking strictly to controlled
studies.  I respect that.  Unfortunately the RW
is too fast moving and too messy to wait for a
laboratory style study to be completed before we
are called on to make professional decisions on
most issues we face within our work.
IME I have to serve my customers in a timely
fashion that for the most part prohibits me from
waiting for the perfect experiment's outcome.


>Personally, I tend to simply assume that
>anecdotal evidence isn't very useful.

Agreed.  OTOH, there's not supposed to be
anything casual, ill-considered, or low quality
about professionals giving professional opinions within their
fields of expertise.  Whether numbers are explicitly involved or not.


>This is why having some large scale independent
>studies is valuable. The manufacturer's studies
>are obviously biased, and it's good to have some
>basis for decision making other than "logic"
>(the classic "proof by 'it stands to reason'"),
>marketing, or "I paid more for it" ("so it's
>better whether it's better or not").
No argument here.  However, note that there is
often other bias present even in studies that strive to be objective.
I described the bias in the sample set of the CMU study in a previous post.


Cheers,
Ron Peacetree


From:
Michael Stone
Date:

On Fri, Apr 06, 2007 at 12:41:25PM -0400, Ron wrote:
>3.based on personal observation, case study
>reports, or random investigations rather than
>systematic scientific evaluation: anecdotal evidence.

Here you even quote the appropriate definition before ignoring it.

>In short, professional advice and opinions are
>supposed to be considerably more rigorous and
>analytical than anything "anecdotal".  The alternative is "malpractice".

In any profession where malpractice is applicable, the professional
opinion had better be backed up by research rather than anecdote. I'm
not aware of any profession held to a "malpractice" standard which is
based on personal observation and random investigation rather than
formal methods.

>studies.  I respect that.  Unfortunately the RW
>is too fast moving and too messy to wait for a
>laboratory style study to be completed before we
>are called on to make professional decisions on
>most issues we face within our work
>IME I have to serve my customers in a timely
>fashion that for the most part prohibits me from
>waiting for the perfect experiment's outcome.

Which is what distinguishes your field from a field such as engineering
or medicine, and which is why waving the term "malpractice" around is
just plain silly. And claiming to have to wait for perfection is a red
herring. Did you record the numbers of disks involved (failed &
nonfailed), the models, the environmental conditions, the poweron hours,
etc.? That's what would distinguish anecdote from systematic study.
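
A minimal sketch of the kind of record listed above (disk counts, model, environment, power-on hours) and of the annualized failure rate it allows one to compute. The field names and the sample numbers are made up for illustration; they are not data from anyone in this thread.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class DriveGroup:
    model: str              # drive model identifier
    ambient_temp_c: float   # typical room temperature for this group
    drives_total: int       # failed and non-failed drives together
    drives_failed: int
    power_on_hours: float   # summed over every drive in the group


def annualized_failure_rate(group):
    """Failures per drive-year of operation."""
    drive_years = group.power_on_hours / HOURS_PER_YEAR
    return group.drives_failed / drive_years


# Illustrative numbers only: 100 drives, each powered on for one year.
group = DriveGroup(
    model="example-model",
    ambient_temp_c=22.0,
    drives_total=100,
    drives_failed=3,
    power_on_hours=100 * HOURS_PER_YEAR,
)
print(annualized_failure_rate(group))
```

Keeping one such record per model and environment is what lets two sites compare rates instead of impressions.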

>Agreed.  OTOH, there's not supposed to be
>anything casual, ill-considered, or low quality
>about professionals giving professional opinions within their
>fields of expertise.  Whether numbers are explicitly involved or not.

If I go to an engineer and ask him how to build a strong bridge and he
responds with something like "Well, I always use steel bridges. I've
driven by concrete bridges that were cracked and needed repairs, and I
would never use a concrete bridge for a professional purpose." he'd lose
his license.  You'd expect the engineer to use, you know, numbers and
stuff, not anecdotal observations of bridges. The professional opinion
has to do with how to apply the numbers, not fundamentals like 100 year
loads, material strength, etc.

What you're arguing is that your personal observations are a perfectly
good substitute for more rigorous study, and that's frankly ridiculous.
In an immature field personal observations may be the best data
available, but that's a weakness of the field rather than a desirable
state. 200 years ago doctors operated the same way--I'm glad they
abandoned that for a more rigorous approach. The interesting thing is,
there was quite a disruption as quite a few of the more established
doctors were really offended by the idea that their professional
opinions would be replaced by standards of care based on large scale
studies.

Mike Stone

From:
Ron
Date:

At 02:19 PM 4/6/2007, Michael Stone wrote:
>On Fri, Apr 06, 2007 at 12:41:25PM -0400, Ron wrote:
>>3.based on personal observation, case study reports, or random
>>investigations rather than systematic scientific evaluation:
>>anecdotal evidence.
>
>Here you even quote the appropriate definition before ignoring it.
>>In short, professional advice and opinions are supposed to be
>>considerably more rigorous and analytical than anything
>>"anecdotal".  The alternative is "malpractice".
>
>In any profession where malpractice is applicable, the professional
>opinion had better be backed up by research rather than anecdote.
>I'm not aware of any profession held to a "malpractice" standard
>which is based on personal observation and random investigation
>rather than formal methods.
Talk to every Professional Engineer who's passed both rounds of the
Professional Engineering Exams.  While there's a significant
improvement in quality when comparing a formal study to professional
advice, there should be an equally large improvement when comparing
professional advice to random anecdotal evidence.

If there isn't, the professional isn't worth paying for.   ...and you
=can= be successfully sued for giving bad professional advice.


>>studies.  I respect that.  Unfortunately the RW is too fast moving
>>and too messy to wait for a laboratory style study to be completed
>>before we are called on to make professional decisions on most
>>issues we face within our work
>>IME I have to serve my customers in a timely fashion that for the
>>most part prohibits me from waiting for the perfect experiment's outcome.
>
>Which is what distinguishes your field from a field such as
>engineering or medicine, and which is why waving the term
>"malpractice" around is just plain silly.

Ok, since you know I am an engineer, that crossed a professional line
in terms of insult.  That finishes this conversation.

...and you know very well that the use of the term "malpractice" was
not in the legal sense but in the strict dictionary sense: "mal",
meaning bad; "practice", meaning professional practice.   ...and
unless you've been an academic your entire career you know the time
pressures of the RW of business.


>  And claiming to have to wait for perfection is a red herring. Did
> you record the numbers of disks involved (failed & nonfailed), the
> models, the environmental conditions, the power on hours, etc.?
> That's what would distinguish anecdote from systematic study.

Yes, as a matter of fact I =do= keep such maintenance records for
operations centers I've been responsible for.  Unfortunately, that is
not nearly enough to qualify for being "objective".  Especially since
it is not often possible to keep accurate track of everything one might
want to.  Even your incomplete list.
Looks like you might not have ever =done= some of the studies you tout so much.


>>Agreed.  OTOH, there's not supposed to be anything casual,
>>ill-considered, or low quality about professionals giving
>>professional opinions within their
>>fields of expertise.  Whether numbers are explicitly involved or not.
>
>If I go to an engineer and ask him how to build a strong bridge and
>he responds with something like "Well, I always use steel bridges.
>I've driven by concrete bridges that were cracked and needed
>repairs, and I would never use a concrete bridge for a professional
>purpose." he'd lose his license.  You'd expect the engineer to use,
>you know, numbers and stuff, not anecdotal observations of bridges.
>The professional opinion has to do with how to apply the numbers,
>not fundamentals like 100 year loads, material strength, etc.
...and I referenced this as the knowledge base a professional uses to
render opinions and give advice.  That's far better than anecdote,
but far worse than specific study.  The history of bridge building is
in fact a perfect example for this phenomenon.  There are a number of
good books on this topic both specific to bridges and for other
engineering projects that failed due to mistakes in extrapolation.


>What you're arguing is that your personal observations are a
>perfectly good substitute for more rigorous study,

Of course I'm not!  and IMHO you know I'm not.  Insult number
two.  Go settle down.


>and that's frankly ridiculous.

Of course it would be.  The =point=, which you seem to just refuse to
consider, is that there is a valid degree of evidence between
"anecdote" and "data from proper objective study".  There has to be
for all sorts of reasons.

As I'm sure you know, the world is not binary.


>In an immature field personal observations may be the best data
>available, but that's a weakness of the field rather than a
>desirable state. 200 years ago doctors operated the same way--I'm
>glad they abandoned that for a more rigorous approach. The
>interesting thing is, there was quite a disruption as quite a few of
>the more established doctors were really offended by the idea that
>their professional opinions would be replaced by standards of care
>based on large scale studies.
...and this is just silly.   Personal observations of trained
observers are known and proven to be better than that of random observers.
It's also a hard skill to learn, let alone master.

=That's= one of the things we technical professionals are paid for:
being trained objective observers.

...and in the specific case of medicine there are known problems with
using large scale studies to base health care standards on.
The statistically normal human does not exist in the medical sense.
For instance, a given woman is actually very =unlikely= to have a
pregnancy exactly 9 months long.  Especially if her genetic family
history is biased towards bearing earlier or later than exactly 9 months.
Drug dosing is another good example, etc etc.

The problem with the doctors you mention is that they were
=supposedly= objective, but turned out not to be.
A similar example from anthropology can be found in Stephen Jay Gould's
_The Mismeasure of Man_.


Have a good day.
Ron Peacetree


From:
Greg Smith
Date:

On Fri, 6 Apr 2007, Scott Marlowe wrote:

> Most server drives are rated for 55-60C environmental temperature
> operation, which means the drive would be even hotter.

I chuckled when I dug into the details for the drives in my cheap PC; the
consumer drives from Seagate:
http://www.seagate.com/docs/pdf/datasheet/disc/ds_barracuda_7200_10.pdf

are rated to a higher operating temperature than their enterprise drives:
http://www.seagate.com/docs/pdf/datasheet/disc/ds_barracuda_es.pdf

They actually have an interesting white paper on this subject.  The factor
they talk about that isn't addressed in the studies we've been discussing
is the I/O workload of the drive:
http://www.seagate.com/content/pdf/whitepaper/TP555_BarracudaES_Jun06.pdf

What kind of sticks out when I compare all their data is that the chart in
the white paper puts the failure rate (AFR) of their consumer drives at
almost 0.6%, yet the specs on the consumer drive quote 0.34%.

Going back to the original question here, though, the rates are all
similar and small enough that I'd take many more drives over a small
number of slightly more reliable ones any day.  As long as you have a
controller that can support multiple hot-spares you should be way ahead.
I get more concerned about battery backup cache issues than this nowadays
(been through too many extended power outages in the last few years).
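
A rough sketch of the arithmetic behind this trade-off. It uses the two AFR figures quoted above (0.34% and 0.6%) and the 8-drive and 24-drive configurations from the original question. It assumes drive failures are independent and the rate is constant, which is a simplification; the function names are hypothetical.

```python
def expected_failures(n_drives, afr):
    """Expected number of drive failures per year in an array."""
    return n_drives * afr


def prob_any_failure(n_drives, afr):
    """Probability that at least one drive fails within a year."""
    return 1.0 - (1.0 - afr) ** n_drives


# (drive count, annualized failure rate)
for n_drives, afr in [(8, 0.0034), (24, 0.006)]:
    print(
        n_drives,
        afr,
        round(expected_failures(n_drives, afr), 3),
        round(prob_any_failure(n_drives, afr), 3),
    )
```

Even pairing the larger array with the higher of the two rates, the expected count stays well below one failure per year under these assumptions, which is the kind of rate a couple of hot-spares can absorb.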

> I do think that server grade drive tech has been migrating into the
> consumer realm over time.  I can imagine that today's high performance
> game / home systems with their heat generating video cards and tendency
> towards RAID1 / RAID0 drive setups are pushing the drive manufacturers
> to improve reliability of consumer disk drives.

The introduction of fluid dynamic motor bearings into the hard drive
market over the last few years (ramping up around 2003) has very much
transformed the nature of that very temperature sensitive mechanism.
That's why a lot of rules of thumb from before that era don't
apply as strongly to modern drives.  Certainly the fact that today's
consumer processors produce massively more heat than those of even a few
years ago has contributed to drive manufacturers moving their specs
upwards as well.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

From:
Michael Stone
Date:

On Fri, Apr 06, 2007 at 03:37:08PM -0400, Ron wrote:
>>>studies.  I respect that.  Unfortunately the RW is too fast moving
>>>and too messy to wait for a laboratory style study to be completed
>>>before we are called on to make professional decisions on most
>>>issues we face within our work
>>>IME I have to serve my customers in a timely fashion that for the
>>>most part prohibits me from waiting for the perfect experiment's outcome.
>>
>>Which is what distinguishes your field from a field such as
>>engineering or medicine, and which is why waving the term
>>"malpractice" around is just plain silly.
>
>Ok, since you know I am an engineer that crossed a professional line
>in terms of insult.  That finishes this conversation.

Actually, I don't know what you are. I obviously should have been more
specific that the field I was referring to is computer systems
integration, which isn't a licensed engineering profession in any
jurisdiction that I'm aware of.

>...and you know very well that the use of the term "malpractice" was
>not in the legal sense but in the strict dictionary sense: "mal,
>meaning bad" "practice, meaning "professional practice."

That's the literal definition or etymology; the dictionary definition
will generally include terms like "negligence", "established rules",
etc., implying that there is an established, objective standard. I just
don't think that hard disk choice (or anything else about designing a
hardware & software system) can be argued to have an established
standard best practice. Heck, you probably can't even say "I did that
successfully last year, we can just implement the same solution" because
in this industry you probably couldn't buy the same parts (exaggerating
only somewhat).

>>  And claiming to have to wait for perfection is a red herring. Did
>>you record the numbers of disks involved (failed & nonfailed), the
>>models, the environmental conditions, the power on hours, etc.?
>>That's what would distinguish anecdote from systematic study.
>
>Yes, as a matter of fact I =do= keep such maintenance records for
>operations centers I've been responsible for.

Great! If you presented those numbers along with some context the data
could be assessed to form some kind of rational conclusion. But to
remind you of what you'd offered up to the time I suggested that you
were offering anecdotal evidence in response to a request for
statistical evidence:

>OTOH, I've spent my career being as careful as possible to as much as
>possible run HW within manufacturer's suggested specifications. I've
>been chided for it over the years... ...usually by folks who "save"
>money by buying commodity HDs for big RAID farms in NOCs or push their
>environmental envelope or push their usage envelope or ... ...and then
>act surprised when they have so much more down time and HW replacements
>than I do.
>
>All I can tell you is that I've gotten to eat my holiday dinner far more
>often than my counterparts who push it in that fashion.

I don't know how to describe that other than as anecdotal. You seem to
be interpreting the term "anecdotal" as pejorative rather than
descriptive. It's not anecdotal because I question your ability or any
other such personal factor, it's anecdotal because if your answer to the
question is "in my professional opinion, A" and someone else says "in my
professional opinion, !A", we really haven't gotten any hard data to
synthesize a rational opinion.

Mike Stone

From:
david@lang.hm
Date:

On Fri, 6 Apr 2007, Scott Marlowe wrote:

> Based on experience I think that on average server drives are more
> reliable than consumer grade drives, and can take more punishment.

this I am not sure about

> But,
> the variables of manufacturer, model, and the batch often make even more
> difference than grade.

this I will agree with fully.

David Lang

From:
Charles Sprickman
Date:

On Fri, 6 Apr 2007,  wrote:

> On Fri, 6 Apr 2007, Scott Marlowe wrote:
>
>> Based on experience I think that on average server drives are more
>> reliable than consumer grade drives, and can take more punishment.
>
> this I am not sure about

I think they should survey Tivo owners next time.

Perfect stress-testing environment.  Mine runs at over 50C most of the
time, and it's writing 2 video streams 24/7.  What more could you do to
punish a drive? :)

Charles


>
> David Lang

From:
david@lang.hm
Date:

On Fri, 6 Apr 2007, Charles Sprickman wrote:

> On Fri, 6 Apr 2007,  wrote:
>
>>  On Fri, 6 Apr 2007, Scott Marlowe wrote:
>>
>> >  Based on experience I think that on average server drives are more
>> >  reliable than consumer grade drives, and can take more punishment.
>>
>>  this I am not sure about
>
> I think they should survey Tivo owners next time.
>
> Perfect stress-testing environment.  Mine runs at over 50C most of the time,
> and it's writing 2 video streams 24/7.  What more could you do to punish a
> drive? :)

and the drives that are in them are consumer IDE drives.

I will admit that I've removed the cover from my tivo to allow it to run
cooler, and I'm still on the original drive + 100G drive I purchased way
back when (7+ years ago). before I removed the cover I did have times when
the tivo would die from the heat (Los Angeles area in the summer with no
A/C)

David Lang

From:
Andreas Kostyrka
Date:

* Charles Sprickman <> [070407 00:49]:
> On Fri, 6 Apr 2007,  wrote:
>
> >On Fri, 6 Apr 2007, Scott Marlowe wrote:
> >
> >>Based on experience I think that on average server drives are more
> >>reliable than consumer grade drives, and can take more punishment.
> >
> >this I am not sure about
>
> I think they should survey Tivo owners next time.
>
> Perfect stress-testing environment.  Mine runs at over 50C most of the time, and it's writing 2 video streams 24/7.
> What more could you do to punish a drive? :)

Well, there is one thing, actually what my dreambox does ;)

-) read/write 2 streams at the same time. (which means quite a bit of
seeking under pressure)
-) and even worse, standby and sleep states. And powering up the drive
when needed.

Andreas

From:
Mark Kirkwood
Date:

Ron wrote:
> I read them as soon as they were available.  Then I shrugged and noted
> YMMV to myself.
>
>
> 1= Those studies are valid for =those= users under =those= users'
> circumstances in =those= users' environments.
>  How well do those circumstances and environments mimic anyone else's?

Exactly, understanding whether the studies are applicable to you is the
critical step - before acting on their conclusions! Thanks Ron, for the
thoughtful analysis on this topic!

Cheers

Mark

From:
Bruce Momjian
Date:

In summary, it seems one of these is true:

    1.  Drive manufacturers don't design server drives to be more
reliable than consumer drives.

    2.  Drive manufacturers _do_ design server drives to be more
reliable than consumer drives, but the design doesn't yield significantly
better reliability.

    3. Server drives are significantly more reliable than consumer
drives.


---------------------------------------------------------------------------

Scott Marlowe wrote:
> On Thu, 2007-04-05 at 23:37, Greg Smith wrote:
> > On Thu, 5 Apr 2007, Scott Marlowe wrote:
> >
> > > On Thu, 2007-04-05 at 14:30, James Mansion wrote:
> > >> Can you cite any statistical evidence for this?
> > > Logic?
> >
> > OK, everyone who hasn't already needs to read the Google and CMU papers.
> > I'll even provide links for you:
> >
> > http://www.cs.cmu.edu/~bianca/fast07.pdf
> > http://labs.google.com/papers/disk_failures.pdf
> >
> > There are several things their data suggests that are completely at odds
> > with the lore suggested by traditional logic-based thinking in this area.
> > Section 3.4 of Google's paper basically disproves that "mechanical devices
> > have decreasing MTBF when run in hotter environments" applies to hard
> > drives in the normal range they're operated in.
>
> On the google:
>
> The google study ONLY looked at consumer grade drives.  It did not
> compare them to server class drives.
>
> This is only true when the temperature is fairly low.  Note that the
> drive temperatures in the google study are <=55C.  If the drive temp is
> below 55C, then the environment, by extension, must be lower than that
> by some fair bit, likely 10-15C, since the drive is a heat source, and
> the environment the heat sink.  So, the environment here is likely in
> the 35C range.
>
> Most server drives are rated for 55-60C environmental temperature
> operation, which means the drive would be even hotter.
>
> As for the CMU study:
>
> It didn't expressly compare server to consumer grade hard drives.
> Remember, there are server class SATA drives, and there were (once upon
> a time) consumer class SCSI drives.  If they had separated out the
> drives by server / consumer grade I think the study would have been more
> interesting.  But we just don't know from that study.
>
> Personal Experience:
>
> In my last job we had three very large storage arrays (big black
> refrigerator looking boxes, you know the kind.)  Each one had somewhere
> in the range of 150 or so drives in it.  The first two we purchased were
> based on 9Gig server class SCSI drives.  The third, and newer one, was
> based on commodity IDE drives.  I'm not sure of the size, but I believe
> they were somewhere around 20Gigs or so.  So, this was 5 or so years
> ago, not recently.
>
> We had a cooling failure in our hosting center, and the internal
> temperature of the data center rose to about 110F to 120F (43C to 48C).
> We ran at that temperature for about 12 hours, before we got a
> refrigerator on a flatbed brought in (btw, I highly recommend Aggreko if
> you need large scale portable air conditioners or generators) to cool
> things down.
>
> In the months that followed the drives in the IDE based storage array
> failed by the dozens.  We eventually replaced ALL the drives in that
> storage array because of the failure rate.  The SCSI based arrays had a
> few extra drives fail than usual, but nothing too shocking.
>
> Now, maybe now Seagate et. al. are making their consumer grade drives
> from yesterday's server grade technology, but 5 or 6 years ago that was
> not the case from what I saw.
>
> > Your comments about
> > server hard drives being rated to higher temperatures is helpful, but
> > conclusions drawn from just thinking about something I don't trust when
> > they conflict with statistics to the contrary.
>
> Actually, as I looked up some more data on this, I found it interesting
> that 5 to 10 years ago, consumer grade drives were rated for 35C
> environments, while today consumer grade drives seem to be rated to 55C
> or 60C.  Same as server drives were 5 to 10 years ago.  I do think that
> server grade drive tech has been migrating into the consumer realm over
> time.  I can imagine that today's high performance game / home systems
> with their heat generating video cards and tendency towards RAID1 /
> RAID0 drive setups are pushing the drive manufacturers to improve
> reliability of consumer disk drives.
>
> > The main thing I wish they'd published is breaking some of the statistics
> > down by drive manufacturer.  For example, they suggest a significant
> > number of drive failures were not predicted by SMART.  I've seen plenty of
> > drives where the SMART reporting was spotty at best (yes, I'm talking
> > about you, Maxtor) and wouldn't be surprised that they were quiet right up
> > to their bitter (and frequent) end.  I'm not sure how that factor may have
> > skewed this particular bit of data.
>
> I too have pretty much given up on Maxtor drives and things like SMART
> or sleep mode, or just plain working properly.
>
> In recent months, we had an AC unit fail here at work, and we have two
> drive manufacturers for our servers.  Manufacturer F and S.  The drives
> from F failed at a much higher rate, and developed lots and lots of bad
> sectors, the drives from manufacturer S, OTOH, have not had an increased
> failure rate.  While both manufacturers claim that their drives can
> survive in an environment of 55/60C, I'm pretty sure one of them was
> lying.  We are silently replacing the failed drives with drives from
> manufacturer S.
>
> Based on experience I think that on average server drives are more
> reliable than consumer grade drives, and can take more punishment.  But,
> the variables of manufacturer, model, and the batch often make even more
> difference than grade.

--
  Bruce Momjian  <>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

From:
Ron
Date:

Given all the data I have personally + all that I have from NOC
personnel, Sys Admins, Network Engineers, Operations Managers, etc, my
experience (I do systems architecture consulting that requires me to
interface with many of these on a regular basis) supports a variation
of hypothesis 2.  Let's call it 2a:

2a= Drive manufacturers _do_ design server drives to be more reliable
than consumer drives.
This is easily provable by opening the clam shells of a Seagate
consumer HD and a Seagate enterprise HD of the same generation and
comparing them.
In addition to non-visible quality differences in the actual media
(which result in warranty differences), there are notable differences
in the design and materials of the clam shells.
HOWEVER, there are at least 2 complicating factors in actually being
able to obtain the increased benefits from the better design:

*HDs are often used in environments and use cases so far outside
their manufacturer's suggested norms that the beating they take
overwhelms the initial quality difference.  For instance, dirty power
events or 100+F room temperatures will age HDs so fast that even if
the enterprise HDs survive better, it's only going to be a bit better
in the worst cases.

*The pace of innovation in this business is so brisk that HDs from 4
years ago, of all types, are of considerably lower quality than those made now.
Someone mentioned FDBs and the difference they made.  Very much
so.  If you compare HDs from 4 years ago to ones made 8 years ago you
get a similar quality difference.  Ditto 8 vs 12 years ago.  Etc.

The reality is that all modern HDs are so good that it's actually
quite rare for someone to suffer a data loss event.  The consequences
of such are so severe that the event stands out more than just the
statistics would imply.  For those using small numbers of HDs, HDs just work.

OTOH, for those of us doing work that involves DBMSs and relatively
large numbers of HDs per system, both the math and the RW conditions
of service require us to pay more attention to quality details.
Like many things, one can decide on one of multiple ways to "pay the piper".

a= The choice made by many, for instance in the studies mentioned, is
to minimize initial acquisition cost and operating overhead and
simply accept having to replace HDs more often.

b= For those in fields where this is not a reasonable option
(financial services, health care, etc), or for those literally using
100's of HDs per system (where statistical failure rates are so likely
that TLC is required), policies and procedures like those mentioned
in this thread (paying close attention to environment and use
factors, sector remap detecting, rotating HDs into and out of roles
based on age, etc) are necessary.

Anyone who does some close variation of "b" directly above =will= see
the benefits of using better HDs.

At least in my supposedly unqualified anecdotal 25 years of
professional experience.
Ron Peacetree



At 10:35 PM 4/6/2007, Bruce Momjian wrote:

>In summary, it seems one of these is true:
>
>         1.  Drive manufacturers don't design server drives to be more
>reliable than consumer drives.
>
>         2.  Drive manufacturers _do_ design server drives to be more
>reliable than consumer drives, but the design doesn't yield significantly
>better reliability.
>
>         3. Server drives are significantly more reliable than consumer
>drives.
>


From:
david@lang.hm
Date:

On Sat, 7 Apr 2007, Ron wrote:

> The reality is that all modern HDs are so good that it's actually quite rare
> for someone to suffer a data loss event.  The consequences of such are so
> severe that the event stands out more than just the statistics would imply.
> For those using small numbers of HDs, HDs just work.
>
> OTOH, for those of us doing work that involves DBMSs and relatively large
> numbers of HDs per system, both the math and the RW conditions of service
> require us to pay more attention to quality details.
> Like many things, one can decide on one of multiple ways to "pay the piper".
>
> a= The choice made by many, for instance in the studies mentioned, is to
> minimize initial acquisition cost and operating overhead and simply accept
> having to replace HDs more often.
>
> b= For those in fields where this is not a reasonable option (financial
> services, health care, etc), or for those literally using 100's of HDs per
> system (where statistical failure rates are so likely that TLC is required),
> policies and procedures like those mentioned in this thread (paying close
> attention to environment and use factors, sector remap detecting, rotating
> HDs into and out of roles based on age, etc) are necessary.
>
> Anyone who does some close variation of "b" directly above =will= see the
> benefits of using better HDs.
>
> At least in my supposedly unqualified anecdotal 25 years of professional
> experience.

Ron, why is it that you assume that anyone who disagrees with you doesn't
work in an environment where they care about the datacenter environment,
and aren't in fields like financial services? and why do you think that we
are just trying to save a few pennies? (the costs do factor in, but it's
not a matter of pennies, it's a matter of tens of thousands of dollars)

I actually work in the financial services field, I do have a good
datacenter environment that's well cared for.

while I don't personally maintain machines with hundreds of drives each, I
do maintain hundreds of machines with a small number of drives in each,
and a handful of machines with a few dozens of drives. (the database
machines are maintained by others, I do see their failed drives however)

it's also true that my experience is only over the last 10 years, so I've
only been working with a few generations of drives, but my experience is
different from yours.

my experience is that until the drives get to be 5+ years old the failure
rate seems to be about the same for the 'cheap' drives as for the 'good'
drives. I won't say that they are exactly the same, but they are close
enough that I don't believe that there is a significant difference.

in other words, these studies do seem to match my experience.

this is why, when I recently had to create some large capacity arrays, I
ended up with machines with a few dozen drives in them instead of
hundreds. I've got two machines with 6TB of disk, one with 8TB, one with
10TB, and one with 20TB. I'm building these systems for ~$1K/TB for the
disk arrays. other departments who choose $bigname 'enterprise' disk
arrays are routinely paying 50x that price.

I am very sure that they are not getting 50x the reliability, I'm sure
that they aren't getting 2x the reliability.

I believe that the biggest cause for data loss from people using the
'cheap' drives is due to the fact that one 'cheap' drive holds the
capacity of 5 or so 'expensive' drives, and since people don't realize
this they don't realize that the time to rebuild the failed drive onto a
hot-spare is correspondingly longer.

in the thread 'Sunfire X4500 recommendations' we recently had a discussion
on this topic starting from a guy who was asking the best way to configure
the drives in his sun x4500 (48 drive) system for safety. in that
discussion I took some numbers from the cmu study and as a working figure
I said a 10% chance for a drive to fail in a year (the study said 5-7% in
most cases, but some third year drives were around 10%). combining this
with the time needed to write 750G using ~10% of the system's capacity
results in a rebuild time of about 5 days. it turns out that there is
almost a 5% chance of a second drive failing in a 48 drive array in this
time. If I were to build a single array with 142G 'enterprise' drives
instead of with 750G 'cheap' drives the rebuild time would be only 1 day
instead of 5, but you would have ~250 drives instead of 48 and so your
chance of a problem would be the same (I acknowledge that you're unlikely to
use 250 drives in a single array, and yes that does help, however if you
had 5 arrays of 50 drives each you would still have a 1% chance of a
second failure)
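
The scaling argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: rebuild time is taken to scale linearly with drive capacity (about 100 hours for a 750G drive), failures are treated as independent at a constant hourly rate derived from the 10%/year working figure, and the function name `second_failure_chance` is made up for this sketch. Under those assumptions both configurations come out at roughly the same ~5% chance of a second failure during a rebuild.

```python
# Rough sketch: the same total capacity built from big or small drives.
# Assumptions (illustrative, not taken from the CMU study): rebuild time
# scales linearly with drive capacity, and drives fail independently at a
# constant hourly rate.

HOURLY_FAILURE_RATE = 0.10 / (365 * 24)   # 10%/year working figure
HOURS_PER_GB = 100 / 750                  # ~100 hour background rebuild for 750G


def second_failure_chance(total_gb, drive_gb):
    """Return (drive count, rebuild hours, chance another drive fails mid-rebuild)."""
    drives = round(total_gb / drive_gb)
    rebuild_hours = drive_gb * HOURS_PER_GB
    # Linear approximation: (other drives) x (hours exposed) x (hourly rate).
    chance = (drives - 1) * rebuild_hours * HOURLY_FAILURE_RATE
    return drives, rebuild_hours, chance


TOTAL_GB = 48 * 750
cheap = second_failure_chance(TOTAL_GB, 750)
enterprise = second_failure_chance(TOTAL_GB, 142)

for label, (drives, hours, chance) in (("750G", cheap), ("142G", enterprise)):
    print(f"{label}: {drives} drives, {hours:.0f}h rebuild, "
          f"{chance:.1%} chance of a second failure")
```

The smaller drives rebuild about five times faster, but there are about five times as many of them exposed during each rebuild, so the two effects cancel.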

when I look at these numbers, my reaction isn't that it's wrong to go with
the 'cheap' drives, my reaction is that single redundancy isn't good
enough. depending on how valuable the data is, you need to either replicate
the data to another system, or go with dual-parity redundancy (or both)

while drives probably won't be this bad in real life (this is after all,
slightly worse than the studies show for their 3rd year drives, and
'enterprise' drives may be slightly better) , I have to assume that they
will be for my reliability planning.

also, if you read through the cmu study, drive failures were only a small
percentage of system outages (16-25% depending on the site). you have to
make sure that you aren't so fixated on drive reliability that you fail to
account for other types of problems (down to and including the chance of
someone accidentally powering down the rack that you are plugged into, be
it from hitting a power switch or from overloading a weak circuit breaker)

In looking at these problems overall I find that in most cases I need to
have redundant systems with the data replicated anyway (with logs sent
elsewhere), so I can get away with building failover pairs instead of
having each machine with redundant drives. I've found that I can
frequently get a pair of machines for less money than other departments
spend on buying a single 'enterprise' machine with the same specs
(although the prices are dropping enough on the top-tier manufacturers
that this is less true today than it was a couple of years ago), and I
find that the failure rate is about the same on a per-machine basis, so I
end up with a much better uptime record due to having the redundancy of
the second full system (never mind things like it being easier to do
upgrades as I can work on the inactive machine and then failover to work
on the other, now, inactive machine). while I could ask for the budget to
be doubled to provide the same redundancy with the top-tier manufacturers
I don't do so for several reasons, the top two being that these
manufacturers frequently won't configure a machine the way I want them to
(just try to get a box with writeable media built in, either a floppy or a
CDR/DVDR, they want you to use something external), and doing so also
exposes me to people second guessing me on where redundancy is needed
('that's only development, we don't need redundancy there', until a system
goes down for a day and the entire department is unable to work)

it's not that the people who disagree with you don't care about their
data, it's that they have different experiences than you do (experiences
that come close to matching the studies where they tracked hundreds of
thousands of drives of different types), and as a result believe that the
difference (if any) between the different types of drives isn't
significant in the overall failure rate (especially when you take the
difference of drive capacity into account)

David Lang

P.S. here is a chart from that thread showing the chances of losing data
with different array configurations.

if you say that there is a 10% chance of a disk failing each year
(significantly higher than the studies listed above, but close enough)
then this works out to ~0.001% chance of a drive failing per hour (a
reasonably round number to work with)

to write 750G at ~45MB/sec takes 5 hours of 100% system throughput, or ~50
hours at 10% of the system throughput (background rebuilding)

if we cut the throughput in half to account for inefficiencies in retrieving data
from other disks to calculate parity it can take 100 hours (just over
four days) to do a background rebuild, or about 0.1% chance for each disk
of losing a second disk. with 48 drives this is ~5% chance of losing
everything with single-parity, however the odds of losing two disks
during this time are .25% so double-parity is _well_ worth it.
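
The arithmetic in this P.S. can be checked with a short Python sketch. All inputs are the working figures quoted above; the variable names are made up for illustration, and the independence of drive failures is an assumption of the sketch, not something the studies establish. With exact rather than rounded intermediate values it lands slightly under the rounded numbers in the text (a rebuild of a bit over 90 hours and a second-failure chance a little below 5%).

```python
# Sketch of the rebuild-window arithmetic using the post's working figures.
# Assumption: drives fail independently at a constant hourly rate.

ANNUAL_FAILURE_RATE = 0.10   # working figure: 10% chance per drive per year
DRIVE_GB = 750
WRITE_MB_PER_S = 45
BACKGROUND_SHARE = 0.10      # rebuild gets ~10% of system throughput
PARITY_SLOWDOWN = 2          # throughput halved while reconstructing from parity
DRIVES = 48

hourly_rate = ANNUAL_FAILURE_RATE / (365 * 24)

# Time to write one drive's worth of data at full speed, then in the background.
full_speed_hours = DRIVE_GB * 1000 / WRITE_MB_PER_S / 3600
rebuild_hours = full_speed_hours / BACKGROUND_SHARE * PARITY_SLOWDOWN

# Chance that one given surviving drive fails before the rebuild finishes.
per_drive = 1 - (1 - hourly_rate) ** rebuild_hours
# Chance that at least one of the other 47 drives fails in that window.
any_second = 1 - (1 - per_drive) ** (DRIVES - 1)

print(f"hourly failure rate:        {hourly_rate:.4%}")
print(f"full-speed write:           {full_speed_hours:.1f} hours")
print(f"background rebuild:         {rebuild_hours:.0f} hours")
print(f"per-drive chance in window: {per_drive:.2%}")
print(f"any second failure (of 47): {any_second:.1%}")
```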

chance of losing data before hotspare is finished rebuilding (assumes one
hotspare per group, you may be able to share a hotspare between multiple
groups to get slightly higher capacity)

> RAID 60 or Z2 -- Double-parity must lose 3 disks from the same group to lose data:
> disks_per_group  num_groups  total_disks  usable_disks  risk_of_data_loss
>             2          24           48           n/a                n/a
>             3          16           48           n/a         (0.0001% with manual replacement of drive)
>             4          12           48            12         0.0009%
>             6           8           48            24         0.003%
>             8           6           48            30         0.006%
>            12           4           48            36         0.02%
>            16           3           48            39         0.03%
>            24           2           48            42         0.06%
>            48           1           48            45         0.25%

> RAID 10 or 50 -- Mirroring or single-parity must lose 2 disks from the same group to lose data:
> disks_per_group  num_groups  total_disks  usable_disks  risk_of_data_loss
>             2          24           48            n/a        (~0.1% with manual replacement of drive)
>             3          16           48            16         0.2%
>             4          12           48            24         0.3%
>             6           8           48            32         0.5%
>             8           6           48            36         0.8%
>            12           4           48            40         1.3%
>            16           3           48            42         1.7%
>            24           2           48            44         2.5%
>            48           1           48            46         5%

so if I've done the math correctly the odds of losing data with the
worst-case double-parity (one large array including hotspare) are about
the same as the best case single parity (mirror+ hotspare), but with
almost triple the capacity.
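
A rough Python sketch of how a chart like this can be generated follows. The model is an assumed reading of the chart, not the original calculation: one hot spare per group, a ~0.1% chance that any given disk fails during the rebuild window, every other disk in the group counted as at risk, and the double-parity risk taken as the square of the single-parity risk. The function name `group_row` is hypothetical. Under these assumptions the usable-disk columns match the chart, while the risk figures agree only approximately.

```python
# Sketch of the chart's layout math under an assumed, simplified model.

TOTAL_DISKS = 48
P_DISK = 0.001   # ~0.1% chance one given disk fails during a ~100 hour rebuild


def group_row(disks_per_group, parity_disks):
    """Return (groups, usable disks, rough data-loss risk) for one layout.

    parity_disks is 1 for mirroring/single parity and 2 for double parity.
    One disk per group is assumed to be reserved as a hot spare.
    """
    groups = TOTAL_DISKS // disks_per_group
    usable = groups * (disks_per_group - parity_disks - 1)
    # After a first failure, any of the other disks in the group may fail.
    next_failure = (disks_per_group - 1) * P_DISK
    # Assumed model: data is lost once parity_disks further failures stack
    # up, each treated as roughly equally likely.
    risk = next_failure ** parity_disks
    return groups, usable, risk


for n in (4, 6, 8, 12, 16, 24, 48):
    _, usable1, risk1 = group_row(n, 1)
    _, usable2, risk2 = group_row(n, 2)
    print(f"{n:2d} disks/group: single-parity {usable1} usable, {risk1:.2%}; "
          f"double-parity {usable2} usable, {risk2:.4%}")

# The closing comparison: one big double-parity group vs mirror + hot spare.
worst_double = group_row(48, 2)
best_single = group_row(3, 1)
print(f"capacity ratio: {worst_double[1] / best_single[1]:.1f}x")
```

In this model the single 48-disk double-parity group and the 3-disk mirror-plus-spare groups both come out near 0.2% risk, while the former offers 45 usable disks against 16.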



From:
Ron
Date:

At 05:42 PM 4/7/2007,  wrote:
>On Sat, 7 Apr 2007, Ron wrote:
>
>>The reality is that all modern HDs are so good that it's actually
>>quite rare for someone to suffer a data loss event.  The
>>consequences of such are so severe that the event stands out more
>>than just the statistics would imply. For those using small numbers
>>of HDs, HDs just work.
>>
>>OTOH, for those of us doing work that involves DBMSs and relatively
>>large numbers of HDs per system, both the math and the RW
>>conditions of service require us to pay more attention to quality details.
>>Like many things, one can decide on one of multiple ways to "pay the piper".
>>
>>a= The choice made by many, for instance in the studies mentioned,
>>is to minimize initial acquisition cost and operating overhead and
>>simply accept having to replace HDs more often.
>>
>>b= For those in fields where this is not a reasonable option
>>(financial services, health care, etc), or for those literally
>>using 100's of HDs per system (where statistical failure rates are
>>so likely that TLC is required), policies and procedures like those
>>mentioned in this thread (paying close attention to environment and
>>use factors, sector remap detecting, rotating HDs into and out of
>>roles based on age, etc) are necessary.
>>
>>Anyone who does some close variation of "b" directly above =will=
>>see the benefits of using better HDs.
>>
>>At least in my supposedly unqualified anecdotal 25 years of
>>professional experience.
>
>Ron, why is it that you assume that anyone who disagrees with you
>doesn't work in an environment where they care about the datacenter
>environment, and aren't in fields like financial services? and why
>do you think that we are just trying to save a few pennies? (the
>costs do factor in, but it's not a matter of pennies, it's a matter
>of tens of thousands of dollars)
I don't assume that.  I didn't make any assumptions.  I (rightfully
IMHO) criticized everyone jumping on the "See, cheap =is= good!"
bandwagon that the Google and CMU studies seem to have ignited w/o
thinking critically about them.
I've never mentioned or discussed specific financial amounts, so
you're making an (erroneous) assumption when you think my concern is
over people "trying to save a few pennies".

In fact, "saving pennies" is at the =bottom= of my priority list for
the class of applications I've been discussing.  I'm all for
economical, but to paraphrase Einstein "Things should be as cheap as
possible; but no cheaper."

My biggest concern is that something I've seen over and over again in
my career will happen again:
People tend to jump at the _slightest_ excuse to believe a story that
will save them short term money and resist even _strong_ reasons to
pay up front for quality.  Even if paying more up front would lower
their lifetime TCO.

The Google and CMU studies are =not= based on data drawn from
businesses where the lesser consequences of an outage are losing
$10Ks or $100K per minute... ...and where the greater consequences
include the chance of loss of human life.
Nor are they based on businesses that must rely exclusively on highly
skilled and therefore expensive labor.

In the case of the CMU study, people are even extrapolating an
economic conclusion the original author did not even make or intend!
Is it any wonder I'm expressing concern regarding inappropriate
extrapolation of those studies?


>I actually work in the financial services field, I do have a good
>datacenter environment that's well cared for.
>
>while I don't personally maintain machines with hundreds of drives
>each, I do maintain hundreds of machines with a small number of
>drives in each, and a handful of machines with a few dozens of
>drives. (the database machines are maintained by others, I do see
>their failed drives however)
>
>it's also true that my experience is only over the last 10 years,
>so I've only been working with a few generations of drives, but my
>experience is different from yours.
>
>my experience is that until the drives get to be 5+ years old the
>failure rate seems to be about the same for the 'cheap' drives as
>for the 'good' drives. I won't say that they are exactly the same,
>but they are close enough that I don't believe that there is a
>significant difference.
>
>in other words, these studies do seem to match my experience.
Fine.  Let's pretend =You= get to build Citibank's or Humana's next
mission critical production DBMS using exclusively HDs with 1 year warranties.
(never would be allowed ITRW)

Even if you RAID 6 them, I'll bet you anything that a system with 32+
HDs on it is likely enough to spend a high enough percentage of its
time operating in degraded mode that you are likely to be looking for
a job as a consequence of such a decision.
...and if you actually suffer data loss or, worse, data corruption,
that's a Career Killing Move.
(and it should be given the likely consequences to the public of such a F* up).


>this is why, when I recently had to create some large capacity
>arrays, I'm only ending up with machines with a few dozen drives in
>them instead of hundreds. I've got two machines with 6TB of disk,
>one with 8TB, one with 10TB, and one with 20TB. I'm building these
>systems for ~$1K/TB for the disk arrays. other departments who choose
>$bigname 'enterprise' disk arrays are routinely paying 50x that price
>
>I am very sure that they are not getting 50x the reliability, I'm
>sure that they aren't getting 2x the reliability.
...and I'm very sure they are being gouged mercilessly by vendors who
are padding their profit margins exorbitantly at the customer's expense.
HDs or memory from the likes of EMC, HP, IBM, or Sun has been
overpriced for decades.
Unfortunately, for every one of me who shops around for good vendors
there are 20+ corporate buyers who keep on letting themselves get gouged.
Gouging is not going to stop until the gouge prices are unacceptable to
enough buyers.

Now if the issue of price difference is based on =I/O interface= (SAS
vs SATA vs FC vs SCSI), that's a different, and orthogonal, issue.
The simple fact is that optical interconnects are far more expensive
than anything else and that SCSI electronics cost significantly more
than anything except FC.
There's gouging here as well, but far more of the pricing is justified.



>I believe that the biggest cause for data loss from people using
>the 'cheap' drives is due to the fact that one 'cheap' drive holds
>the capacity of 5 or so 'expensive' drives, and since people don't
>realize this they don't realize that the time to rebuild the failed
>drive onto a hot-spare is correspondingly longer.
Commodity HDs get 1 year warranties for the same reason enterprise
HDs get 5+ year warranties: the vendor's confidence that they are not
going to lose money honoring the warranty in question.

AFAIK, there is no correlation between capacity of HDs and failure
rates or warranties on them.


Your point regarding using 2 cheaper systems in parallel instead of 1
gold plated system is in fact an expression of a basic Axiom of
Systems Theory with regards to Single Points of Failure.  Once
components become cheap enough, it is almost always better to have
redundancy rather than all one's eggs in 1 heavily protected basket.


Frankly, the only thing that made me feel combative is when someone
claimed there's no difference between anecdotal evidence and a
professional opinion or advice.
That's just so utterly unrealistic as to defy belief.
No one would ever get anything done if every business decision had to
wait on properly designed and executed lab studies.

It's also insulting to everyone who puts in the time and effort to be
a professional within a field rather than a lay person.

Whether there's a name for it or not, there's definitely an important
distinction between each of anecdote, professional opinion, and study result.


Cheers,
Ron Peacetree






From:
david@lang.hm
Date:

On Sat, 7 Apr 2007, Ron wrote:

>> Ron, why is it that you assume that anyone who disagrees with you doesn't
>> work in an environment where they care about the datacenter environment,
>> and aren't in fields like financial services? and why do you think that we
>> are just trying to save a few pennies? (the costs do factor in, but it's
>> not a matter of pennies, it's a matter of tens of thousands of dollars)
> I don't assume that.  I didn't make any assumptions.  I (rightfully IMHO)
> criticized everyone jumping on the "See, cheap =is= good!" bandwagon that the
> Google and CMU studies seem to have ignited w/o thinking critically about
> them.

Ron, I think that many people aren't saying cheap==good, what we are
doing is arguing against the idea that expensive==good (and its
corollary, cheap==bad)

> I've never mentioned or discussed specific financial amounts, so you're
> making an (erroneous) assumption when you think my concern is over people
> "trying to save a few pennies".
>
> In fact, "saving pennies" is at the =bottom= of my priority list for the
> class of applications I've been discussing.  I'm all for economical, but to
> paraphrase Einstein "Things should be as cheap as possible; but no cheaper."

this I fully agree with, I have no problem spending money if I believe
that there's a corresponding benefit.

> My biggest concern is that something I've seen over and over again in my
> career will happen again:
> People tend to jump at the _slightest_ excuse to believe a story that will
> save them short term money and resist even _strong_ reasons to pay up front
> for quality.  Even if paying more up front would lower their lifetime TCO.

on the other hand, it's easy for people to blow $bigbucks with this
argument with no significant reduction in their maintenance costs.

> The Google and CMU studies are =not= based on data drawn from businesses
> where the lesser consequences of an outage are losing $10Ks or $100K per
> minute... ...and where the greater consequences include the chance of loss of
> human life.
> Nor are they based on businesses that must rely exclusively on highly skilled
> and therefore expensive labor.

hmm, I didn't see the CMU study document what businesses it used.

> In the case of the CMU study, people are even extrapolating an economic
> conclusion the original author did not even make or intend!
> Is it any wonder I'm expressing concern regarding inappropriate extrapolation
> of those studies?

I missed the posts where people were extrapolating economic conclusions,
what I saw was people stating that 'you better buy the SCSI drives as
they are more reliable', and other people pointing out that recent studies
indicate that there's not a significant difference in drive reliability
between the two types of drives

>> I actually work in the financial services field, I do have a good
>> datacenter environment that's well cared for.
>>
>> while I don't personally maintain machines with hundreds of drives each, I
>> do maintain hundreds of machines with a small number of drives in each, and
>> a handful of machines with a few dozens of drives. (the database machines
>> are maintained by others, I do see their failed drives however)
>>
>> it's also true that my experience is only over the last 10 years, so I've
>> only been working with a few generations of drives, but my experience is
>> different from yours.
>>
>> my experience is that until the drives get to be 5+ years old the failure
>> rate seems to be about the same for the 'cheap' drives as for the 'good'
>> drives. I won't say that they are exactly the same, but they are close
>> enough that I don't believe that there is a significant difference.
>>
>> in other words, these studies do seem to match my experience.
> Fine.  Let's pretend =You= get to build Citibank's or Humana's next mission
> critical production DBMS using exclusively HDs with 1 year warranties.
> (never would be allowed ITRW)

who is arguing that you should use drives with 1 year warranties? in case
you blinked, consumer drive warranties are back up to 5 years.

> Even if you RAID 6 them, I'll bet you anything that a system with 32+ HDs on
> it is likely enough to spend a high enough percentage of its time operating
> in degraded mode that you are likely to be looking for a job as a consequence
> of such a decision.
> ...and if you actually suffer data loss or, worse, data corruption, that's a
> Career Killing Move.
> (and it should be given the likely consequences to the public of such a F*
> up).

so now it's "nobody got fired for buying SCSI?"

>> this is why, when I recently had to create some large capacity arrays, I'm
>> only ending up with machines with a few dozen drives in them instead of
>> hundreds. I've got two machines with 6TB of disk, one with 8TB, one with
>> 10TB, and one with 20TB. I'm building these systems for ~$1K/TB for the disk
>> arrays. other departments who choose $bigname 'enterprise' disk arrays are
>> routinely paying 50x that price
>>
>> I am very sure that they are not getting 50x the reliability, I'm sure that
>> they aren't getting 2x the reliability.
> ...and I'm very sure they are being gouged mercilessly by vendors who are
> padding their profit margins exorbitantly at the customer's expense.
> HDs or memory from the likes of EMC, HP, IBM, or Sun has been overpriced for
> decades.
> Unfortunately, for every one of me who shop around for good vendors there are
> 20+ corporate buyers who keep on letting themselves get gouged.
> Gouging is not going to stop until the gouge prices are unacceptable to enough
> buyers.

it's also not going to be stopped until people actually look at the
reliability of what they are getting, rather than assuming that because
it's labeled 'enterprise' and costs more that it must be more reliable.

frankly, I think that a lot of the cost comes from the simple fact that
they use smaller SCSI drives (most of them haven't started using 300G
drives yet), and so they end up needing ~5x more drive bays, power,
cooling, cabling, ports on the controllers, etc. if you need 5x the
number of drives and they each cost 3x as much, you are already up to 15x
price multiplier, going from there to 50x is only adding another 3x
multiplier (which with the extra complexity of everything is easy to see,
and almost seems reasonable)
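
As a rough check of that arithmetic, here is a minimal Python sketch. The factors are the approximate figures quoted in this post, not measured prices, and the variable names are only illustrative.

```python
# Rough decomposition of the ~50x price gap described above.
# All three figures are the approximate ones from the post, not measurements.
drive_count_factor = 5      # ~5x more drives needed for the same capacity
per_drive_cost_factor = 3   # each of those drives costs ~3x as much
observed_gap = 50           # 'enterprise' arrays cost ~50x per TB

# Multiplier explained by the drives alone.
drive_multiplier = drive_count_factor * per_drive_cost_factor

# What is left for bays, power, cooling, cabling, controllers and margin.
remaining_multiplier = observed_gap / drive_multiplier

print(f"explained by drives alone: {drive_multiplier}x")
print(f"left to explain:           {remaining_multiplier:.1f}x")
```

Changing either factor shows how quickly the remaining multiplier moves, which is the point of the estimate: most of the gap follows from drive count and per-drive price before any vendor margin is considered.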

> Now if the issue of price difference is based on =I/O interface= (SAS vs SATA
> vs FC vs SCSI), that's a different, and orthogonal, issue.
> The simple fact is that optical interconnects are far more expensive than
> anything else and that SCSI electronics cost significantly more than anything
> except FC.
> There's gouging here as well, but far more of the pricing is justified.

going back to the post that started this thread. the OP was looking at two
equivalently priced systems, one with 8x73G SCSI and the other with
24x320G SATA. I don't buy the argument that the SCSI electronics are
_that_ expensive (especially since SATA and SAS are designed to be
compatible enough to plug together). yes the SCSI drives spin faster, and
that does contribute to the cost, but it still shouldn't make one drive
cost 3x the other.

>> I believe that the biggest cause for data loss from people using the
>> 'cheap' drives is due to the fact that one 'cheap' drive holds the capacity
>> of 5 or so 'expensive' drives, and since people don't realize this they
>> don't realize that the time to rebuild the failed drive onto a hot-spare is
>> correspondingly longer.
> Commodity HDs get 1 year warranties for the same reason enterprise HDs get 5+
> year warranties: the vendor's confidence that they are not going to lose
> money honoring the warranty in question.

at least seagate gives 5 year warranties on their consumer drives.

> AFAIK, there is no correlation between capacity of HDs and failure rates or
> warranties on them.

correct, but the larger drive will take longer to rebuild, so your window
of vulnerability is larger.
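
To put a rough number on that window, here is a minimal Python sketch. It assumes the rebuild simply rewrites the whole drive at a constant rate; the 30 MB/s rate is a hypothetical placeholder (real rates depend on the controller and on how busy the array is), and the 73GB and 320GB capacities are the drive sizes from the start of the thread.

```python
def rebuild_hours(capacity_gb, rebuild_mb_per_s):
    """Hours needed to rewrite an entire drive at a sustained rebuild rate."""
    return capacity_gb * 1000 / rebuild_mb_per_s / 3600

# Hypothetical sustained rebuild rate; substitute a figure measured on your array.
ASSUMED_REBUILD_MB_PER_S = 30

small = rebuild_hours(73, ASSUMED_REBUILD_MB_PER_S)    # 73GB 15k SCSI drive
large = rebuild_hours(320, ASSUMED_REBUILD_MB_PER_S)   # 320GB 7.2k SATA drive
ratio = large / small

print(f"73GB rebuild:  {small:.1f} h")
print(f"320GB rebuild: {large:.1f} h")
print(f"window is {ratio:.1f}x longer")
```

Under this model the assumed rate cancels out of the ratio, so the exposure window grows in proportion to capacity whatever the rebuild speed is, as long as both drives rebuild at the same rate.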

> Your point regarding using 2 cheaper systems in parallel instead of 1 gold
> plated system is in fact an expression of a basic Axiom of Systems Theory
> with regards to Single Points of Failure.  Once components become cheap
> enough, it is almost always better to have redundancy rather than all one's
> eggs in 1 heavily protected basket.

also correct, the question is 'have hard drives reached this point'

the thought that there isn't a big difference in the reliability of the
drives doesn't mean that the enterprise drives are getting worse, it means
that the consumer drives are getting better, so much so that they are
a valid option.

if I had the money to waste, I would love to see someone open the
'consumer grade' seagate Barracuda 7200.10 750G drive along with a
'enterprise grade' seagate Barracuda ES 750G drive (both of which have 5
year warranties) to see if there is still the same 'dramatic difference'
between consumer and enterprise drives that there used to be.

it would also be interesting to compare the high-end scsi drives with the
recent SATA/IDE drives. I'll have to look and see if I can catch some
dead drives before they get destroyed and open them up.

> Frankly, the only thing that made me feel combative is when someone claimed
> there's no difference between anecdotal evidence and a professional opinion
> or advice.
> That's just so utterly unrealistic as to defy belief.
> No one would ever get anything done if every business decision had to wait on
> properly designed and executed lab studies.

I think the assumption on lists like this is that anything anyone says is
a professional opinion, until proven otherwise. but a professional
opinion (no matter who it's from) isn't as good as a formal study

> It's also insulting to everyone who puts in the time and effort to be a
> professional within a field rather than a lay person.

it's also insulting to assume (or appear to assume) that everyone who
disagrees with you is a lay person. you may not have meant it (this is
e-mail after all, with all the problems that come from that), but this is
what you seem to have been implying, if not outright saying.

> Whether there's a name for it or not, there's definitely an important
> distinction between each of anecdote, professional opinion, and study result.

the line between an anecdote and a professional opinion is pretty blurry,
and hard to see without wasting a lot of time getting everyone to give
their credentials, etc. if a professional doesn't spend enough time
thinking about some of the details (i.e. how many drive failures of each
type have I seen in the last 5 years as opposed to in the 5 year timeframe
from 1980-1985) they can end up giving an opinion that's in the range of
reliability and relevance that anecdotes are.

don't assume malice so quickly.

David Lang

From:
"Joshua D. Drake"
Date:

>>> I believe that the biggest cause for data loss from people using the
>>> 'cheap' drives is due to the fact that one 'cheap' drive holds the
>>> capacity of 5 or so 'expensive' drives, and since people don't
>>> realize this they don't realize that the time to rebuild the failed
>>> drive onto a hot-spare is correspondingly longer.
>> Commodity HDs get 1 year warranties for the same reason enterprise HDs
>> get 5+ year warranties: the vendor's confidence that they are not
>> going to lose money honoring the warranty in question.
>
> at least seagate gives 5 year warranties on their consumer drives.

Hitachi 3 years
Maxtor  3 years
Samsung 1-3 years depending on drive (but who buys samsung drives)
Seagate 5 years (300 Gig, 7200 RPM perpendicular recording... 89 bucks)
Western Digital 3-5 years depending on drive

Joshua D. Drake






From:
mark@mark.mielke.cc
Date:

On Sat, Apr 07, 2007 at 08:46:33PM -0400, Ron wrote:
> The Google and CMU studies are =not= based on data drawn from
> businesses where the lesser consequences of an outage are losing
> $10Ks or $100K per minute... ...and where the greater consequences
> include the chance of loss of human life.
> Nor are they based on businesses that must rely exclusively on highly
> skilled and therefore expensive labor.

Google's uptime seems to be quite good. Reliability can be had from trusting
more reputable (and usually more expensive) manufacturers and product lines,
or it can be had through redundancy. The "I" in RAID.

I recall reading the Google study before, and believe I recall it
lacking in terms of how much it costs to pay the employees to maintain
the system.  It would be interesting to know whether the inexpensive
drives require more staff time to be spent on it. Staff time can
easily become more expensive than the drives themselves.

I believe there are factors that exist that are not easy to calculate.
Somebody else mentioned how Intel was not the cleanest architecture,
and yet, how Intel architecture makes up the world's fastest machines,
and the cheapest machines per work to complete. There is a game of
numbers being played. A manufacturer that sells 10+ million units has
the resources, the profit margin, and the motivation, to ensure that
their drives are better than a manufacturer that sells 100 thousand
units. Even if the manufacturer of the 100 K units spends double in
development per unit, they would only be spending 1/50 as much as the
manufacturer who makes 10+ million units.
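
The 1/50 figure can be checked with a few lines of Python. This is only a sketch of the arithmetic in the paragraph above: the unit counts are the ones given there, and the per-unit development spend is an arbitrary baseline of 1.0 chosen for illustration.

```python
# Unit volumes from the paragraph above.
big_units = 10_000_000    # high-volume manufacturer
small_units = 100_000     # low-volume manufacturer

# Arbitrary baseline: the big manufacturer spends 1.0 per unit on development,
# and the small one spends double that per unit.
big_spend_per_unit = 1.0
small_spend_per_unit = 2.0 * big_spend_per_unit

big_total = big_units * big_spend_per_unit
small_total = small_units * small_spend_per_unit

# Total development budget of the small manufacturer relative to the big one.
ratio = small_total / big_total
print(f"small manufacturer's total development spend: {ratio:.2f} of the big one's")
```

With 100x fewer units and 2x the spend per unit, the smaller manufacturer's total budget comes to 2/100 of the larger one's, which is the 1/50 stated above.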

As for your experience - no disrespect - but if your experience is over
the last 25 years, then you should agree that most of those years are
no longer relevant in terms of experience. SATA has only existed for
5 years or less, and is only now stabilizing in terms of having the
different layers of a solution supporting the features like command
queuing. The drives of today have already broken all sorts of rules that
people assumed were not possible to break 5 years ago, 10 years ago, or
20 years ago. The playing field is changing. Even if your experience is
correct or valid today - it may not be true tomorrow.

The drives of today, I consider to be incredible in terms of quality,
reliability, speed, and density. All of the major brands, for desktops
or servers, IDE, SATA, or SCSI, are amazing compared to only 10 years
ago. To say that they don't meet a standard - which standard?

Everything has a cost. Having a drive never break, will have a very
large cost. It will cost more to turn 99.9% to 99.99%. Given that the
products will never be perfect, perhaps it is valid to invest in a
low-cost fast-to-implement recovery solution that assumes that
some number of drives will fail in 6 months, 1 year, 2 years, and 5
years. Assume they will fail, because regardless of what you buy -
there is a good chance that they *will* fail. Paying double price for
hardware, with a hope that they will not fail, may not be a good
strategy.

I don't have a conclusion here - only things to consider.

Cheers,
mark

--
 /  /      __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


From:
Ron
Date:

At 11:13 PM 4/7/2007,  wrote:
>On Sat, 7 Apr 2007, Ron wrote:
>
>Ron, I think that many people aren't saying cheap==good, what we are
>doing is arguing against the idea that expensive==good (and its
>corollary, cheap==bad)
Since the buying decision is binary, you either buy high quality HDs
or you don't, the distinction between the two statements makes no
difference ITRW and therefore is meaningless.  "The difference that
makes no difference =is= no difference."

The bottom line here is that no matter how it is "spun", people are
using the Google and the CMU studies to consider justifying reducing
the quality of the HDs they buy in order to reduce costs.

Frankly, they would be better advised to directly attack price
gouging by certain large vendors instead; but that is perceived as a
harder problem.  So instead they are considering what is essentially
an example of Programming by Side Effect.
Every SW professional on this list has been taught how bad a strategy
that usually is.


>>My biggest concern is that something I've seen over and over again
>>in my career will happen again:
>>People tend to jump at the _slightest_ excuse to believe a story
>>that will save them short term money and resist even _strong_
>>reasons to pay up front for quality.  Even if paying more up front
>>would lower their lifetime TCO.
>
>on the other hand, it's easy for people to blow $bigbucks with this
>argument with no significant reduction in their maintenance costs.
No argument there.  My comments on people putting up with price
gouging should make clear my position on overspending.


>>The Google and CMU studies are =not= based on data drawn from
>>businesses where the lesser consequences of an outage are losing
>>$10Ks or $100K per minute... ...and where the greater consequences
>>include the chance of loss of human life.
>>Nor are they based on businesses that must rely exclusively on
>>highly skilled and therefore expensive labor.
>
>hmm, I didn't see the CMU study document what businesses it used.
Section 2.3: Data Sources, p3-4.
3 HPC clusters, each described as "The applications running on this
system are typically large-scale scientific simulations or
visualization applications." +
3 ISPs, 1 HW failure log, 1 warranty service log of hardware
failures, and 1 exclusively FC HD set based on 4 different kinds of FC HDs.


>>In the case of the CMU study, people are even extrapolating an
>>economic conclusion the original author did not even make or intend!
>>Is it any wonder I'm expressing concern regarding inappropriate
>>extrapolation of those studies?
>
>I missed the posts where people were extrapolating economic
>conclusions, what I saw was people stating that 'you better buy the
>SCSI drives as they are more reliable', and other people pointing
>out that recent studies indicate that there's not a significant
>difference in drive reliability between the two types of drives
The original poster asked a simple question regarding 8 SCSI HDs vs
24 SATA HDs.  That question was answered definitively some posts ago
(use 24 SATA HDs).

Once this thread started talking about the Google and CMU studies, it
expanded beyond the OP's original SCSI vs SATA question.
(else why are we including FC and other issues in our considerations
as in the CMU study?)

We seem to have evolved to
"Does paying more for enterprise class HDs vs consumer class HDs
result in enough of a quality difference to be worth it?"

To analyze that question, the only two HD metrics that should be
considered are
1= whether the vendor rates the HD as "enterprise" or not, and
2= the length of the warranty on the HD in question.
Otherwise, one risks clouding the analysis due to the costs of the
interface used.
(there are plenty of non HD metrics that need to be considered to
examine the issue properly.)

The CMU study was not examining any economic issue, and therefore to
draw an economic conclusion from it is questionable.
The CMU study was about whether the industry standard failure model
matched empirical historical evidence.
Using the CMU study for any other purpose risks misjudgment.


>>Let's pretend =You= get to build Citibank's or Humana's next
>>mission critical production DBMS using exclusively HDs with 1 year warranties.
>>(never would be allowed ITRW)
>
>who is arguing that you should use drives with 1 year warranties? in
>case you blinked, consumer drive warranties are back up to 5 years.
As Josh Drake has since posted, they are not (although TBF most seem
to be greater than 1 year at this point).

So can I safely assume that we have agreement that you would not
advise using HDs with less than 5 year warranties for any DBMS?
If so, the only debate point left is whether there is a meaningful
distinction between HDs rated as "enterprise class" vs others by the
same vendor within the same generation.


>>Even if you RAID 6 them, I'll bet you anything that a system with
>>32+ HDs on it is likely enough to spend a high enough percentage of
>>its time operating in degraded mode that you are likely to be
>>looking for a job as a consequence of such a decision.
>>...and if you actually suffer data loss or, worse, data corruption,
>>that's a Career Killing Move.
>>(and it should be given the likely consequences to the public of
>>such a F* up).
>
>so now it's "nobody got fired for buying SCSI?"
Again, we are way past simple SCSI vs SATA interface issues and well
into more fundamental issues of HD quality and price.

Let's bear in mind that SCSI is =a legacy technology=.  Seagate will
cease making all SCSI HDs in 2007.  The SCSI standard has been
stagnant and obsolescent for years.  Frankly, the failure of the FC
vendors to come out with 10Gb FC in a timely fashion has probably
killed that interface as well.

The future is most likely SATA vs SAS.  =Those= are most likely the
relevant long-term technologies in this discussion.


>frankly, I think that a lot of the cost comes from the simple fact
>that they use smaller SCSI drives (most of them haven't started
>using 300G drives yet), and so they end up needing ~5x more drive
>bays, power, cooling, cabling, ports on the controllers, etc. if
>you need 5x the number of drives and they each cost 3x as much, you
>are already up to 15x price multiplier, going from there to 50x is
>only adding another 3x multiplier (which with the extra complexity
>of everything is easy to see, and almost seems reasonable)
Well, be prepared to re-examine this issue when you have to consider
using 2.5" 73GB SAS HDs vs using 3.5" >= 500GB SATA HDs.

For OLTP-like workloads, there is a high likelihood that solutions
involving more spindles are going to be better than those involving
fewer spindles.
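
One way to see why spindle count matters for small random writes is a back-of-the-envelope model. The Python sketch below assumes each random I/O costs one average seek plus half a rotation, and that a RAID 10 write lands on both drives of a mirror pair. The seek times (3.5 ms for 15k, 8.5 ms for 7.2k) are hypothetical round numbers rather than datasheet values, and the model ignores controller cache, command queuing, and sequential WAL traffic. The drive counts are the two configurations from the start of the thread.

```python
def drive_iops(rpm, avg_seek_ms):
    """Random IOPS for one drive: one average seek plus half a revolution."""
    rotational_latency_ms = 0.5 * 60_000 / rpm
    return 1000 / (avg_seek_ms + rotational_latency_ms)

def raid10_write_iops(drives, rpm, avg_seek_ms):
    """RAID 10 random-write IOPS: every write hits both halves of a mirror."""
    mirror_pairs = drives // 2
    return mirror_pairs * drive_iops(rpm, avg_seek_ms)

# Hypothetical average seek times; check the datasheets for real drives.
scsi = raid10_write_iops(drives=8, rpm=15_000, avg_seek_ms=3.5)
sata = raid10_write_iops(drives=24, rpm=7_200, avg_seek_ms=8.5)

print(f"8 x 15k SCSI, RAID 10:  ~{scsi:.0f} random write IOPS")
print(f"24 x 7.2k SATA, RAID 10: ~{sata:.0f} random write IOPS")
```

Under these assumptions each 15k drive is well over twice as fast as a 7.2k drive for random I/O, yet the array with three times as many spindles still comes out ahead. The size of the gap is sensitive to the seek times plugged in, so treat the output as an illustration of the trade-off rather than a benchmark.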

Reliability isn't the only metric of consideration here.  If
organizations have to go certain routes to meet their business goals,
their choices are legitimately constrained.
(I recall being asked for a 10TB OLAP system 7 years ago and telling
the client for that point in time the only DBMS products that could
be trusted with that task were DB2 and Oracle: an answer the M$
favoring CEO of the client did !not! like.)


>if I had the money to waste, I would love to see someone open the
>'consumer grade' seagate Barracuda 7200.10 750G drive along with a
>'enterprise grade' seagate Barracuda ES 750G drive (both of which
>have 5 year warranties) to see if there is still the same 'dramatic
>difference' between consumer and enterprise drives that there used to be.
>
>it would also be interesting to compare the high-end scsi drives
>with the recent SATA/IDE drives. I'll have to look and see if I can
>catch some dead drives before they get destroyed and open them up.
I have to admit I haven't done this experiment in a few years
either.  When I did, there always was a notable difference (in
keeping with the vendor's claims as such)


This thread is not about whether there is a difference worthy of note
between anecdotal opinion, professional advice, and the results of studies.
I've made my POV clear on that topic and if there is to be a more
thorough analysis or discussion of it, it properly belongs in another thread.


Cheers,
Ron Peacetree


From:
"James Mansion"
Date:

>Logic?

Foul!  That's NOT evidence.

>
>Mechanical devices have decreasing MTBF when run in hotter environments,
>often at non-linear rates.

I agree that this seems intuitive.  But I think taking it as a cast-iron
truth is dangerous.

>Server class drives are designed with a longer lifespan in mind.

Evidence?

>Server class hard drives are rated at higher temperatures than desktop
>drives.
>
>Google can supply any numbers to fill those facts in, but I found a
>dozen or so data sheets for various enterprise versus desktop drives in
>a matter of minutes.

I know what the marketing info says, that's not the point.  Bear in mind
that these are somewhat designed to justify very much higher prices.

I'm looking for statistical evidence that the difference is there, not
marketing collateral.  They may be designed to be more reliable.  And
the design targets *that the manufacturer admits to* may be more
stringent, but I'm interested to know what the actual measured difference
is.

From the sound of it, you DON'T have such evidence.  Which is not a
surprise, because I don't have it either, and I do try to keep my eyes
open for it.

James



