On Mon, 2009-09-14 at 22:58 -0700, John R Pierce wrote:
> and, if you're doing RAID with desktop grade disks, it's quite possible
> for the drive to spontaneously decide a sector error requires a data
> relocation but not have the 'good' data to relocate, and not return an
> error code in time for the RAID controller or host md-raid to do
> anything about it. this results in a very sneaky sort of data
> corruption which goes undetected until some time later.
>
>
> this is the primary reason to use the premium "ES" grade SATA drives
> rather than the cheaper desktop stuff in a raid, they return sector
> errors in a timely fashion rather than retrying for many minutes in the
> background.

Ugh, really?

What do the desktop drives return in the meantime, when they haven't
been able to read a sector properly? Do they make something up and hope
the sector gets written to soon? That seems too hacky even for desktop
HDD firmware, which is saying something.

I've generally seen fairly prompt failure responses from desktop-grade
drives (and I see a lot of them fail!). There are usually several layers
of OS-driven retries above the drive that delay the reporting of errors,
but the RAID volume the drive is a member of will generally block until
either a retry succeeds or the OS layers between the software RAID
implementation and the disk give up and pass on the disk's error report.

That said, I've mostly used Linux's `md' software RAID, which, while
imperfect, seems to be pretty sane in terms of data preservation.
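
To make the retry layering described above a bit more concrete, here is
a minimal Python sketch of how long a read might block before the RAID
layer finally sees the error. Everything in it is an assumption for
illustration: the names `Layer` and `seconds_until_raid_sees_error` are
hypothetical, and the timeout and attempt counts are made-up numbers,
not values taken from any particular kernel or drive firmware.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """One retrying layer between the drive and the RAID implementation."""
    name: str
    timeout_s: float  # how long the layer waits on each attempt
    attempts: int     # attempts made before the error is passed upward


def seconds_until_raid_sees_error(drive_recovery_s, layers):
    """Worst-case delay before a persistent read error reaches the RAID layer.

    drive_recovery_s: how long the drive itself keeps retrying one command
                      before it reports the error.
    layers:           retrying layers, ordered from the drive upward.
    """
    delay = float(drive_recovery_s)
    for layer in layers:
        # An attempt ends when the layer below reports an error or when
        # this layer's own timeout fires, whichever comes first.
        delay = min(delay, layer.timeout_s) * layer.attempts
    return delay


# Hypothetical numbers: one OS layer with a 30 s timeout and 5 attempts.
os_layers = [Layer("os-retry", timeout_s=30.0, attempts=5)]

# A drive that gives up after 7 s versus one that keeps trying for 120 s.
quick_drive = seconds_until_raid_sees_error(7.0, os_layers)
slow_drive = seconds_until_raid_sees_error(120.0, os_layers)
print(quick_drive, slow_drive)
```

In this toy model the read blocks for 35 s with the quick drive and
150 s with the slow one, and then the error is passed up. It only
models the delay; it says nothing about what real firmware returns.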
--
Craig Ringer