Thread: OT - 2 of 4 drives in a Raid10 array failed - Any chance of recovery?
Sorry guys, I know this is very off-track for this list, but Google hasn't been of much help. This is the RAID array on which my PG data resides.

I have a 4-disk RAID10 array running on Linux MD RAID:
sda / sdb / sdc / sdd

One fine day, 2 of the drives just suddenly decided to die on me (sda and sdd).

I've tried multiple methods to see if I can get them back online:

1) replace sda w/ fresh drive and resync - failed
2) replace sdd w/ fresh drive and resync - failed
3) replace sda w/ fresh drive but keep existing sdd and resync - failed
4) replace sdd w/ fresh drive but keep existing sda and resync - failed

RAID10 is supposed to be able to withstand up to 2 drive failures if the failures are from different sides of the mirror. Right now, I'm not sure which drive belongs to which side. How do I determine that? Does it depend on the order the drives appear in the output of /proc/mdstat?

Thanks
On Tue, Oct 20, 2009 at 1:11 AM, Ow Mun Heng <ow.mun.heng@wdc.com> wrote:
> Sorry guys, I know this is very off-track for this list, but Google hasn't
> been of much help. This is the RAID array on which my PG data resides.
>
> I have a 4-disk RAID10 array running on Linux MD RAID:
> sda / sdb / sdc / sdd
>
> One fine day, 2 of the drives just suddenly decided to die on me (sda and
> sdd).
>
> I've tried multiple methods to see if I can get them back online:
>
> 1) replace sda w/ fresh drive and resync - failed
> 2) replace sdd w/ fresh drive and resync - failed
> 3) replace sda w/ fresh drive but keep existing sdd and resync - failed
> 4) replace sdd w/ fresh drive but keep existing sda and resync - failed
>
> RAID10 is supposed to be able to withstand up to 2 drive failures if the
> failures are from different sides of the mirror.
>
> Right now, I'm not sure which drive belongs to which side. How do I
> determine that? Does it depend on the order the drives appear in the
> output of /proc/mdstat?

Is this software RAID in Linux? What does cat /proc/mdstat say?
On 20/10/2009 4:41 PM, Scott Marlowe wrote:
>> I have a 4-disk RAID10 array running on Linux MD RAID:
>> sda / sdb / sdc / sdd
>>
>> One fine day, 2 of the drives just suddenly decided to die on me (sda and
>> sdd)
>>
>> I've tried multiple methods to see if I can get them back online

You made an exact image of each drive onto new, spare drives with `dd' or a similar disk imaging tool before trying ANYTHING, right? Otherwise, you may well have made things worse, particularly since you've tried to resync the array. Even if the data was recoverable before, it might not be now.

How, exactly, have the drives failed? Are they totally dead, so that the BIOS / disk controller doesn't even see them? Can the partition tables be read? Does 'file -s /dev/sda' report any output? What's the output of:

  smartctl -d ata -a /dev/sda

(repeat for sdd)?

If the problem is just a few bad sectors, you can usually force-re-add the drives into the array and then copy the array contents to another drive, either at a low level (with dd_rescue) or at a file system level.

If the problem is one or more totally fried drives, where the drive is completely inaccessible or most of the data is hopelessly corrupt / unreadable, then you're in a lot more trouble. RAID 10 effectively stripes the data across the mirrored pairs, so if you lose a whole mirrored pair you've lost half the stripes. It's not that different from running paper through a shredder, discarding half the shreds, and trying to line the rest back up.

On a side note: I'm personally increasingly annoyed by the tendency of RAID controllers (and s/w RAID implementations) to treat disks with unrepairable bad sectors as dead and fail them out of the array. That's OK if you have a hot spare and no other drive fails during rebuild, but it's just not good enough if failing that drive would put the array into a failed state. Rather than failing a drive and thereby rendering the whole array unreadable, the implementation should mark the drive defective, set the array to read-only, and start screaming for help. Way too much data gets murdered by RAID implementations removing mildly faulty drives from already-degraded arrays instead of just going read-only.

--
Craig Ringer
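As a rough sketch of the imaging and diagnostic steps described above - assuming the suspect members are /dev/sda and /dev/sdd and two hypothetical spare disks of at least the same size show up as /dev/sde and /dev/sdf - it might look something like:

  # Raw-copy the suspect drives onto spares before touching the array again.
  # conv=noerror,sync keeps going past read errors and pads unreadable blocks
  # with zeros so offsets are preserved (GNU ddrescue is the better tool if
  # there are many bad sectors).
  dd if=/dev/sda of=/dev/sde bs=1M conv=noerror,sync
  dd if=/dev/sdd of=/dev/sdf bs=1M conv=noerror,sync

  # Then figure out how each drive actually failed.
  file -s /dev/sda
  file -s /dev/sdd
  smartctl -d ata -a /dev/sda
  smartctl -d ata -a /dev/sdd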
On Tue, 20 Oct 2009, Ow Mun Heng wrote:

> RAID10 is supposed to be able to withstand up to 2 drive failures if the
> failures are from different sides of the mirror. Right now, I'm not
> sure which drive belongs to which. How do I determine that? Does it
> depend on the order the drives appear in the output of /proc/mdstat?

You build a 4-disk RAID10 array on Linux by first building two RAID1 pairs, then striping both of the resulting /dev/mdX devices together via RAID0. You'll actually have 3 /dev/mdX devices around as a result. I suspect you're trying to execute mdadm operations on the outer RAID0, when what you actually should be doing is fixing the bottom-level RAID1 volumes. Unfortunately, I'm not too optimistic about your case, because if you had a repairable situation you technically shouldn't have lost the array in the first place--it should still be running, just in degraded mode on both underlying RAID1 halves.

There's a good example of how to set one of these up at http://www.sanitarium.net/golug/Linux_Software_RAID.html ; note how the RAID10 involves /dev/md{0,1,2,3} for the 6-disk volume.

Here's what will probably show you the parts you're trying to figure out:

  mdadm --detail /dev/md0
  mdadm --detail /dev/md1
  mdadm --detail /dev/md2

That should give you an idea what md devices are hanging around and what's inside of them. One thing you don't see there is what devices were originally around if they've already failed. I highly recommend saving a copy of the mdadm detail (and "smartctl -i" for each underlying drive) on any production server, to make it easier to answer questions like "what's the serial number of the drive that failed in /dev/md0?".

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
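For reference, a nested layout of the kind described here is typically built along these lines (purely illustrative; the device and md numbers are not taken from Ow's actual setup):

  # Two RAID1 pairs...
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
  # ...striped together with RAID0 to form the RAID10.
  mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/md0 /dev/md1

  # Repairs then happen on the bottom-level RAID1 halves, e.g. adding a
  # replacement disk (sde1 is hypothetical) back into a degraded pair:
  mdadm /dev/md0 --add /dev/sde1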
On Wed, Oct 21, 2009 at 12:10 AM, Greg Smith <gsmith@gregsmith.com> wrote:
> On Tue, 20 Oct 2009, Ow Mun Heng wrote:
>
>> RAID10 is supposed to be able to withstand up to 2 drive failures if the
>> failures are from different sides of the mirror. Right now, I'm not sure
>> which drive belongs to which. How do I determine that? Does it depend on
>> the order the drives appear in the output of /proc/mdstat?
>
> You build a 4-disk RAID10 array on Linux by first building two RAID1 pairs,
> then striping both of the resulting /dev/mdX devices together via RAID0.

Actually, later versions of Linux have a direct RAID-10 level built in. I haven't used it, and I'm not sure how it would look in /proc/mdstat either.

> You'll actually have 3 /dev/mdX devices around as a result. I suspect
> you're trying to execute mdadm operations on the outer RAID0, when what you
> actually should be doing is fixing the bottom-level RAID1 volumes.
> Unfortunately, I'm not too optimistic about your case, because if you had a
> repairable situation you technically shouldn't have lost the array in the
> first place--it should still be running, just in degraded mode on both
> underlying RAID1 halves.

Exactly. It sounds like both drives in a pair failed.
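If the array was built with that single-level raid10 personality, its creation would have looked more like the following sketch (chunk size and layout left at the usual defaults, which aren't known from the thread), and /proc/mdstat would show one raid10 device instead of nested md devices:

  # One md device, four members, handled by the raid10 personality directly.
  mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

  cat /proc/mdstat
  mdadm --detail /dev/md0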
On Tue, 20 Oct 2009, Craig Ringer wrote:

> You made an exact image of each drive onto new, spare drives with `dd'
> or a similar disk imaging tool before trying ANYTHING, right? Otherwise,
> you may well have made things worse, particularly since you've tried to
> resync the array. Even if the data was recoverable before, it might not
> be now.

This is actually pretty hard to screw up with Linux software RAID. It's not easy to corrupt a working volume by trying to add a bogus one or by typing simple commands wrong. You'd have to botch the drive addition process altogether and then mess with something else to take out a good drive.

> If the problem is just a few bad sectors, you can usually just
> force-re-add the drives into the array and then copy the array contents
> to another drive either at a low level (with dd_rescue) or at a file
> system level.

This approach has saved me more than once. On the flip side, I have also more than once accidentally wiped out my only good copy of the data by making a mistake during an attempt at stressed-out heroics like this. You certainly don't want to wander down this more complicated path if there's a simple fix available within the context of the standard tools for array repairs.

> On a side note: I'm personally increasingly annoyed by the tendency of
> RAID controllers (and s/w RAID implementations) to treat disks with
> unrepairable bad sectors as dead and fail them out of the array.

Given how fast drives tend to go completely dead once the first error shows up, this is a reasonable policy in general.

> Rather than failing a drive and thereby rendering the whole array
> unreadable, the implementation should mark the drive defective, set
> the array to read-only, and start screaming for help.

The idea is great, but you have to ask exactly how the hardware and software involved are supposed to enforce making the array read-only. I don't think the ATA and similar command sets implement that concept in a way that would let hardware RAID act on it at the level it would need to happen at. Linux software RAID could keep you from mounting the array read/write in this situation, but the way errors percolate up from the disk devices to the array devices in Linux has too many layers in it (especially if LVM is stuck in the middle there too) for that to be simple to implement either.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
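The "force it back together and copy the data off" path usually amounts to something like this sketch; whether --force is safe depends on which members have the freshest event counts, and the member list and /mnt/recovery mount point are placeholders:

  # Stop whatever half-assembled state the array is in, then try to bring it
  # up with the members that are still readable, accepting a slightly stale
  # one if necessary.
  mdadm --stop /dev/md0
  mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

  # If it assembles (even degraded), mount read-only and copy the data off
  # immediately instead of resyncing onto questionable disks.
  mount -o ro /dev/md0 /mnt/recovery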
On Wed, 21 Oct 2009, Scott Marlowe wrote:

> Actually, later versions of Linux have a direct RAID-10 level built in.
> I haven't used it. Not sure how it would look in /proc/mdstat either.

I think I actively block out memory of that, because the UI on it is so cryptic and it has historically been much buggier than the simpler RAID0/RAID1 implementations. But you're right that it's entirely possible Ow used it. That would also explain not being able to figure out what's going on.

There's a good example of what the result looks like with failed drives in one of the many bug reports related to that feature, at https://bugs.launchpad.net/ubuntu/intrepid/+source/linux/+bug/285156 , and I liked the discussion of some of the details at http://robbat2.livejournal.com/231207.html

The other hint I forgot to mention is that you should try:

  mdadm --examine /dev/XXX

for each of the drives that still works, to help figure out where it fits into the larger array. That and --detail are what I find myself using instead of /proc/mdstat, which provides an awful interface IMHO.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
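Concretely, something like this run over the surviving members (the partition names are placeholders) prints the array UUID, each device's slot/role, and its event count, which is what tells you where each disk fits and which copy is freshest:

  for dev in /dev/sdb1 /dev/sdc1; do
      echo "== $dev =="
      mdadm --examine "$dev"
  done

  mdadm --detail /dev/md0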
-----Original Message-----
From: Greg Smith [mailto:gsmith@gregsmith.com]

On Wed, 21 Oct 2009, Scott Marlowe wrote:

>> Actually, later versions of Linux have a direct RAID-10 level built in.
>> I haven't used it. Not sure how it would look in /proc/mdstat either.

> I think I actively block out memory of that, because the UI on it is so
> cryptic and it has historically been much buggier than the simpler
> RAID0/RAID1 implementations. But you're right that it's entirely possible
> Ow used it. That would also explain not being able to figure out what's
> going on.

You're right, the newer Linux kernels support RAID10 directly by default and don't do the funky RAID1-first-then-RAID0 combination.

> There's a good example of what the result looks like with failed drives in
> one of the many bug reports related to that feature, at
> https://bugs.launchpad.net/ubuntu/intrepid/+source/linux/+bug/285156 , and
> I liked the discussion of some of the details at
> http://robbat2.livejournal.com/231207.html

I actually stumbled onto that (the 2nd link) and tried some of the methods, but it's actually kind of outdated, I think.

> The other hint I forgot to mention is that you should try:
>
>   mdadm --examine /dev/XXX
>
> for each of the drives that still works, to help figure out where it fits
> into the larger array. That and --detail are what I find myself using
> instead of /proc/mdstat, which provides an awful interface IMHO.

That's one of the problems: I'm not exactly sure of the layout.

  sda1 = 1
  sdb1 = 2
  sdc1 = 3
  sdd1 = 4

If they follow that sequence, and I'm losing sda1 and sdd1, I theoretically should be able to recover them, but I'm not having much luck.

FYI, I've left the box as it is for now and have yet to connect it back up, hence I can't really post the output of /proc/mdstat and --examine. But I will once I boot it up.
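Once the box is back up, a quick way to pin down that mapping would be something along these lines; note that it's the slot numbers in the member superblocks, not the sda/sdb/sdc/sdd letters, that define the pairs, and with the default near-2 raid10 layout the mirrored pairs are normally slots 0+1 and 2+3 (worth confirming against the actual --examine output rather than trusting this sketch):

  # Show RAID level, layout, slot/role and event count for every member.
  for dev in /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1; do
      echo "== $dev =="
      mdadm --examine "$dev" | grep -E 'Raid Level|Layout|Events|Role|this'
  done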