On Sat, 2009-07-11 at 18:19 -0700, Dirk Riehle wrote:
> I do have some weird every few days error where the soft raid blocks for
> a couple of seconds and I get this kernel log output:
>
> Jul 7 19:58:55 server kernel: [40336.000239] ata1.00: status: { DRDY }
> Jul 7 19:58:55 server kernel: [40336.000244] ata1.00: cmd 61/08:a0:a7:44:21/00:00:00:00:00/40 tag 20 ncq 4096 out
> Jul 7 19:58:55 server kernel: [40336.000245] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Have you used smartctl (from the smartmontools package - on
Debian/Ubuntu at least) to examine the drive?
In particular, you should ask the drive to run a self-test and media
scan. This will not take it out of the RAID or prevent it from
servicing normal operations, though it may slow it down a bit. Run:
smartctl -d ata -t long /dev/sda
then "sleep" for however long it says the test will take, e.g. "sleep 2h".
When the sleep command exits, run:
smartctl -d ata -a /dev/sda
to see general info on the drive, its error logs, and its test logs. If
you see errors logged on the drive, if the test shows as failed, if you
see a non-zero "reallocated sector" count, or if "pending sector" is
non-zero, then it's time to replace the drive.
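
If you want to script this, here's a rough sketch in plain sh. It's
untested against real drives and rests on a few assumptions: that
smartctl announces the test length as "Please wait N minutes for test
to complete.", that the raw value is the 10th column of the attribute
table, that logged errors show up as an "ATA Error Count:" line, and
that failed self-tests say "failure" in their status. The function
names (parse_wait_minutes, long_test_and_report, drive_looks_bad) are
made up for the example.

```shell
#!/bin/sh
# Rough sketch of the procedure above. Function names are hypothetical;
# the parsing assumes smartctl's usual text output.

# Pull N out of smartctl's "Please wait N minutes for test to complete."
# message (assumed wording), read on stdin.
parse_wait_minutes() {
    sed -n 's/.*Please wait \([0-9][0-9]*\) minutes.*/\1/p' | head -n 1
}

# Start a long self-test on $1 (e.g. /dev/sda), sleep for the advertised
# time plus 5 minutes of slack, then print the full report.
long_test_and_report() {
    dev=$1
    mins=$(smartctl -d ata -t long "$dev" | parse_wait_minutes)
    if [ -z "$mins" ]; then
        echo "could not find the test duration for $dev" >&2
        return 1
    fi
    sleep $(( (mins + 5) * 60 ))
    smartctl -d ata -a "$dev"
}

# Read a `smartctl -a` report on stdin and apply the rules of thumb above.
# Exits 0 (printing the reasons) if the drive looks like it needs
# replacing, 1 otherwise. Assumes RAW_VALUE is the 10th column of the
# attribute table and that failed self-tests mention "failure".
drive_looks_bad() {
    awk '
        $2 == "Reallocated_Sector_Ct" && $10 + 0 > 0 { bad = 1; print "reallocated sectors: " $10 }
        $2 == "Current_Pending_Sector" && $10 + 0 > 0 { bad = 1; print "pending sectors: " $10 }
        /^# *[0-9]+ / && /failure/ { bad = 1; print "failed self-test: " $0 }
        /^ATA Error Count: [1-9]/ { bad = 1; print "errors logged: " $0 }
        END { exit (bad ? 0 : 1) }
    '
}
```

Something like "long_test_and_report /dev/sda | drive_looks_bad" would
then do the whole lot, run as root, and it will sit there for hours
while the test runs. The checks only catch the obvious cases, so it's
still worth reading the full report yourself.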
--
Craig Ringer