Re: OT - 2 of 4 drives in a Raid10 array failed - Any chance of recovery? - Mailing list pgsql-general

From Greg Smith
Subject Re: OT - 2 of 4 drives in a Raid10 array failed - Any chance of recovery?
Msg-id alpine.GSO.2.01.0910210155300.1418@westnet.com
In response to OT - 2 of 4 drives in a Raid10 array failed - Any chance of recovery?  ("Ow Mun Heng" <ow.mun.heng@wdc.com>)
List pgsql-general
On Tue, 20 Oct 2009, Ow Mun Heng wrote:

> Raid10 is supposed to be able to withstand up to 2 drive failures if the
> failures are from different sides of the mirror.  Right now, I'm not
> sure which drive belongs to which. How do I determine that? Does it
> depend on the output of /proc/mdstat and in that order?

One common way to build a 4-disk RAID10 array on Linux is to first build
two RAID1 pairs, then stripe the two resulting /dev/mdX devices together
via RAID0.  If yours was built that way, you'll actually have 3 /dev/mdX
devices around as a result.  I suspect you're trying to execute mdadm
operations on the outer RAID0, when what you actually should be doing is
fixing the bottom-level RAID1 volumes.  Unfortunately I'm not too
optimistic about your case, because if you had a repairable situation you
shouldn't have lost the array in the first place; it should still be
running, just in degraded mode on both underlying RAID1 halves.
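
To check which of the RAID1 halves is degraded, something like the
following rough sketch may help.  It assumes the usual /proc/mdstat layout,
where the status line under each mirror ends in a marker such as [UU]
(both members up) or [U_] (one member missing).  The function name
list_degraded_md is made up for illustration.

```shell
#!/bin/sh
# Print the md arrays whose status marker shows a missing member.
# Argument: an mdstat-style file to read (defaults to /proc/mdstat).
list_degraded_md() {
    mdstat_file="${1:-/proc/mdstat}"
    awk '
        # Array header line, e.g. "md0 : active raid1 sda1[0]"
        /^md/ && $2 == ":" { name = $1 }
        # Status marker such as [U_] or [_U]: an underscore is a missing member
        /\[[U_]*_[U_]*\]/ { if (name != "") print "/dev/" name }
    ' "$mdstat_file"
}
```

Running list_degraded_md with no arguments reads the live /proc/mdstat.  A
RAID0 on top has no such marker, so only the mirrors can show up here.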

There's a good example of how to set one of these up at
http://www.sanitarium.net/golug/Linux_Software_RAID.html ; note how the
RAID10 there involves /dev/md{0,1,2,3} for the 6-disk volume.

Here's what will probably show you the parts you're trying to figure out:

mdadm --detail /dev/md0
mdadm --detail /dev/md1
mdadm --detail /dev/md2

That should give you an idea what md devices are hanging around and what's
inside of them.
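
If you're not sure which md numbers exist on the box, a small sketch like
this one could walk whatever /proc/mdstat lists instead of guessing at
md0 through md2.  The function names are hypothetical, and the mdadm call
needs root on a real system.

```shell
#!/bin/sh
# Print the /dev path of every array named in an mdstat-style file
# (defaults to /proc/mdstat).
list_md_devices() {
    mdstat_file="${1:-/proc/mdstat}"
    # Array header lines look like: "md0 : active raid1 sdb1[1] sda1[0]"
    awk '/^md/ && $2 == ":" { print "/dev/" $1 }' "$mdstat_file"
}

# Run "mdadm --detail" on each array found, with a header per device.
show_all_md_details() {
    for dev in $(list_md_devices "$@"); do
        echo "=== $dev ==="
        mdadm --detail "$dev"
    done
}
```

Calling show_all_md_details with no arguments reads the live /proc/mdstat
and prints the detail output for each array in the order listed there.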

One thing you don't see there is what devices were originally around if
they've already failed.  I highly recommend saving a copy of the mdadm
detail (and "smartctl -i" for each underlying drive) on any production
server, to make it easier to answer questions like "what's the serial
number of the drive that failed in /dev/md0?".
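
A minimal sketch of that kind of snapshot is below.  The function name
save_raid_inventory, the output path, and the device names in the usage
line are all placeholders to adjust for your own system; it only uses
"mdadm --detail" and "smartctl -i" as described above.

```shell
#!/bin/sh
# usage: save_raid_inventory OUTFILE DEVICE...
# Writes "mdadm --detail" output for md devices and "smartctl -i" output
# for everything else into OUTFILE.  Needs root on a real system.
save_raid_inventory() {
    outfile="$1"
    shift
    {
        echo "# RAID inventory taken $(date)"
        for dev in "$@"; do
            echo "=== $dev ==="
            case "$dev" in
                /dev/md*) mdadm --detail "$dev" ;;
                *)        smartctl -i "$dev" ;;
            esac
        done
    } > "$outfile"
}
```

For example: save_raid_inventory /root/raid-inventory.txt /dev/md0 /dev/md1
/dev/md2 /dev/sda /dev/sdb /dev/sdc /dev/sdd.  Keep a copy somewhere other
than the array itself, since that's the thing you'll have lost when you
need it.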

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
