Gregory Youngblood wrote:
> Also, a very informative read:
> http://research.google.com/archive/disk_failures.pdf
> In short, best thing to do is watch SMART and be prepared to try and
> swap a drive out before it fails completely. :)
>
I currently have four brand new 1TB disks (7200RPM SATA - they're for
our backup server). Two of them make horrible clicking noises - they're
rapidly parking and unparking or doing constant seeks. One of those two
also spins up very loudly, and on spin down rattles and buzzes.
Their internal SMART "health check" reports the two problem disks to be just
fine, and both pass a short SMART self test (smartctl -d ata -t short).
Both have absurdly huge seek_error_rate values, but the SMART thresholds
see nothing wrong with this.
The noisy spin down one is so defective that I can't even write data to
it successfully, and the other problem disk has regular I/O errors and
fails an extended SMART self test (the short test passes).
I see this sort of thing regularly. Vendors are obviously setting the
SMART health thresholds so that there's absolutely no risk of reporting
an issue with a working drive, and in the process making it basically
useless for detecting failing or faulty drives.
I rely on manual examination of the vendor attributes like the seek
error rate, ECC recovered sectors, offline uncorrectable sectors
(usually a REALLY bad sign if this grows), etc., combined with regular
extended SMART tests (which do a surface scan). Just using SMART - say,
the basic health check - really isn't enough.
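
A minimal sketch of that kind of manual attribute tracking, in Python. It
assumes smartmontools is installed and that the drive prints the usual
ten-column "smartctl -A" attribute table. The attribute names in WATCHED are
the ones smartctl commonly shows for the values mentioned above, but vendors
differ, so treat the list as an assumption to adjust per drive. The function
names (parse_attributes, grown, read_attributes) are hypothetical helpers,
not part of any tool.

```python
import re
import subprocess

# Attribute names as smartctl commonly prints them; an assumption,
# since naming and raw-value meaning are vendor-specific.
WATCHED = ("Seek_Error_Rate", "Hardware_ECC_Recovered", "Offline_Uncorrectable")

# One row of the `smartctl -A` table:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Return {attribute_name: raw_value} parsed from `smartctl -A` output."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            # Only the leading integer of RAW_VALUE is kept.
            attrs[m.group(2)] = int(m.group(6))
    return attrs


def grown(previous, current, watched=WATCHED):
    """Return {name: (old, new)} for watched attributes whose raw value rose."""
    return {
        name: (previous[name], current[name])
        for name in watched
        if name in previous and name in current
        and current[name] > previous[name]
    }


def read_attributes(device):
    """Run smartctl on a device and parse its attribute table."""
    # smartctl's exit status is a bitmask and can be non-zero even when
    # the table printed fine, so the return code is not checked here.
    result = subprocess.run(
        ["smartctl", "-d", "ata", "-A", device],
        capture_output=True, text=True, check=False,
    )
    return parse_attributes(result.stdout)
```

Typical use would be to save the result of read_attributes("/dev/sdX") (a
placeholder device name) once a day and pass yesterday's and today's
dictionaries to grown(). Any non-empty result is a cue to look at the drive
by hand and run an extended self test; it is not a verdict on its own.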
--
Craig Ringer