Thread: setting up raid10 with more than 4 drives

setting up raid10 with more than 4 drives

From
"Rajesh Kumar Mallah"
Date:
hi,

this is not really postgresql specific, but any help is appreciated.
i have read that more spindles are better for IO performance.

suppose i have 8 drives: should a stripe (raid0) be created over
2 mirrors (raid1) of 4 drives each, OR should a stripe over 4 mirrors
of 2 drives each be created?

also, does a single-channel or dual-channel controller make much
difference to raid10 performance?

regds
mallah.

Re: setting up raid10 with more than 4 drives

From
"Luke Lonergan"
Date:
Stripe of mirrors is preferred to mirror of stripes for the best balance of
protection and performance.

In the stripe of mirrors you can lose up to half of the disks and still be
operational.  In the mirror of stripes, the most you could lose is two
drives.  The performance of the two should be similar - perhaps the seek
performance would be different for high concurrent use in PG.
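
For concreteness, the two 8-drive layouts under discussion look roughly like this with Linux md (a sketch only; device names are illustrative, not taken from the thread):

    # RAID 1+0 (stripe of mirrors): four 2-disk mirrors, then a stripe across them
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
    mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sde1 /dev/sdf1
    mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sdg1 /dev/sdh1
    mdadm --create /dev/md4 --level=0 --raid-devices=4 /dev/md0 /dev/md1 /dev/md2 /dev/md3

    # RAID 0+1 (mirror of stripes): two 4-disk stripes, mirrored (alternative build)
    mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sd[a-d]1
    mdadm --create /dev/md1 --level=0 --raid-devices=4 /dev/sd[e-h]1
    mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/md0 /dev/md1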

- Luke


On 5/29/07 2:14 PM, "Rajesh Kumar Mallah" <mallah.rajesh@gmail.com> wrote:

> hi,
>
> this is not really postgresql specific, but any help is appreciated.
> i have read more spindles the better it is for IO performance.
>
> suppose i have 8 drives , should a stripe (raid0) be created on
> 2 mirrors (raid1) of 4 drives each OR  should a stripe on 4 mirrors
> of 2 drives each be created  ?
>
> also does single channel  or dual channel controllers makes lot
> of difference in raid10 performance ?
>
> regds
> mallah.
>



Re: setting up raid10 with more than 4 drives

From
"Rajesh Kumar Mallah"
Date:
On 5/30/07, Luke Lonergan <llonergan@greenplum.com> wrote:
> Stripe of mirrors is preferred to mirror of stripes for the best balance of
> protection and performance.

nooo! i am not asking about raid10 vs raid01. i am considering stripe of
mirrors only. the question is how a larger number of disks is BEST
utilized in terms of IO performance:

1. by adding more mirrors to the stripe, OR
2. by adding more hard drives to each mirror.

say i had 4 drives in raid10 format

D1  raid1  D2 --> MD0
D3  raid1  D4 --> MD1
MD0  raid0 MD1  --> MDF (final)

now say i get 2 more drives, D5 and D6; then i have 2 options

1.  create a new mirror
D5 raid1 D6 --> MD2
MD0 raid0 MD1 raid0 MD2  --> MDF final


OR

D1 raid1 D2 raid1 D5  --> MD0
D3 raid1 D4 raid1 D6  --> MD1
MD0 raid0 MD1  --> MDF (final)

thanks, i hope my question is clear now.
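
In mdadm terms the two options look roughly like this (illustrative device names; note that an existing raid0 stripe generally cannot be grown in place, so the wider stripe is shown as it would be built):

    # Option 1: create a third mirror and stripe across all three mirrors
    mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sde1 /dev/sdf1    # D5 + D6
    mdadm --create /dev/md3 --level=0 --raid-devices=3 /dev/md0 /dev/md1 /dev/md2

    # Option 2: grow each existing 2-way mirror into a 3-way mirror
    mdadm /dev/md0 --add /dev/sde1                 # D5 joins the first mirror
    mdadm --grow /dev/md0 --raid-devices=3
    mdadm /dev/md1 --add /dev/sdf1                 # D6 joins the second mirror
    mdadm --grow /dev/md1 --raid-devices=3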


Regds
mallah.




>
> In the stripe of mirrors you can lose up to half of the disks and still be
> operational.  In the mirror of stripes, the most you could lose is two
> drives.  The performance of the two should be similar - perhaps the seek
> performance would be different for high concurrent use in PG.
>
> - Luke
>
>
> On 5/29/07 2:14 PM, "Rajesh Kumar Mallah" <mallah.rajesh@gmail.com> wrote:
>
> > hi,
> >
> > this is not really postgresql specific, but any help is appreciated.
> > i have read more spindles the better it is for IO performance.
> >
> > suppose i have 8 drives , should a stripe (raid0) be created on
> > 2 mirrors (raid1) of 4 drives each OR  should a stripe on 4 mirrors
> > of 2 drives each be created  ?
> >
> > also does single channel  or dual channel controllers makes lot
> > of difference in raid10 performance ?
> >
> > regds
> > mallah.
> >
>
>
>

Re: setting up raid10 with more than 4 drives

From
"Luke Lonergan"
Date:
Hi Rajesh,

On 5/29/07 7:18 PM, "Rajesh Kumar Mallah" <mallah.rajesh@gmail.com> wrote:

> D1 raid1 D2 raid1 D5  --> MD0
> D3 raid1 D4 raid1 D6  --> MD1
> MD0 raid0 MD1  --> MDF (final)

AFAIK you can't RAID1 more than two drives, so the above doesn't make sense
to me.

- Luke



Re: setting up raid10 with more than 4 drives

From
Stephen Frost
Date:
* Luke Lonergan (llonergan@greenplum.com) wrote:
> Hi Rajesh,
>
> On 5/29/07 7:18 PM, "Rajesh Kumar Mallah" <mallah.rajesh@gmail.com> wrote:
>
> > D1 raid1 D2 raid1 D5  --> MD0
> > D3 raid1 D4 raid1 D6  --> MD1
> > MD0 raid0 MD1  --> MDF (final)
>
> AFAIK you can't RAID1 more than two drives, so the above doesn't make sense
> to me.

It's just more copies of the same data if it's really a RAID1, for the
extra, extra paranoid.  Basically, in the example above, I'd read it as
"D1, D2, D5 have identical data on them".

    Thanks,

        Stephen


Re: setting up raid10 with more than 4 drives

From
"Luke Lonergan"
Date:
Stephen,

On 5/29/07 8:31 PM, "Stephen Frost" <sfrost@snowman.net> wrote:

> It's just more copies of the same data if it's really a RAID1, for the
> extra, extra paranoid.  Basically, in the example above, I'd read it as
> "D1, D2, D5 have identical data on them".

In that case, I'd say it's a waste of disk to add 1+2 redundancy to the
mirrors.

- Luke



Re: setting up raid10 with more than 4 drives

From
"Jonah H. Harris"
Date:
On 5/29/07, Luke Lonergan <llonergan@greenplum.com> wrote:
> AFAIK you can't RAID1 more than two drives, so the above doesn't make sense
> to me.

Yeah, I've never seen a way to RAID-1 more than 2 drives either.  It
would have to be his first one:

D1 + D2 = MD0 (RAID 1)
D3 + D4 = MD1 ...
D5 + D6 = MD2 ...
MD0 + MD1 + MD2 = MDF (RAID 0)

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation            | fax: 732.331.1301
33 Wood Ave S, 3rd Floor            | jharris@enterprisedb.com
Iselin, New Jersey 08830            | http://www.enterprisedb.com/

Re: setting up raid10 with more than 4 drives

From
david@lang.hm
Date:
On Wed, 30 May 2007, Jonah H. Harris wrote:

> On 5/29/07, Luke Lonergan <llonergan@greenplum.com> wrote:
>>  AFAIK you can't RAID1 more than two drives, so the above doesn't make
>>  sense
>>  to me.
>
> Yeah, I've never seen a way to RAID-1 more than 2 drives either.  It
> would have to be his first one:
>
> D1 + D2 = MD0 (RAID 1)
> D3 + D4 = MD1 ...
> D5 + D6 = MD2 ...
> MD0 + MD1 + MD2 = MDF (RAID 0)
>

I don't know what the failure mode ends up being, but on linux I had no
problems creating what appears to be a massively redundant (but small) array

md0 : active raid1 sdo1[10](S) sdn1[8] sdm1[7] sdl1[6] sdk1[5] sdj1[4] sdi1[3] sdh1[2] sdg1[9] sdf1[1] sde1[11](S)
sdd1[0]
       896 blocks [10/10] [UUUUUUUUUU]
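
For reference, an array like the one above can be created in one step (a sketch, with illustrative device names):

    # 10 active mirrors of the same data plus 2 hot spares, matching the mdstat output
    mdadm --create /dev/md0 --level=1 --raid-devices=10 --spare-devices=2 /dev/sd[d-o]1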

David Lang

Re: setting up raid10 with more than 4 drives

From
"Peter Childs"
Date:


On 30/05/07, david@lang.hm <david@lang.hm> wrote:
On Wed, 30 May 2007, Jonah H. Harris wrote:

> On 5/29/07, Luke Lonergan <llonergan@greenplum.com> wrote:
>>  AFAIK you can't RAID1 more than two drives, so the above doesn't make
>>  sense
>>  to me.
>
> Yeah, I've never seen a way to RAID-1 more than 2 drives either.  It
> would have to be his first one:
>
> D1 + D2 = MD0 (RAID 1)
> D3 + D4 = MD1 ...
> D5 + D6 = MD2 ...
> MD0 + MD1 + MD2 = MDF (RAID 0)
>

I don't know what the failure mode ends up being, but on linux I had no
problems creating what appears to be a massively redundant (but small) array

md0 : active raid1 sdo1[10](S) sdn1[8] sdm1[7] sdl1[6] sdk1[5] sdj1[4] sdi1[3] sdh1[2] sdg1[9] sdf1[1] sde1[11](S) sdd1[0]
       896 blocks [10/10] [UUUUUUUUUU]

David Lang


Good point. Also, if you had RAID 1 with 3 drives and some bit errors, at least you can take a vote on what's right, whereas if you only have 2 and they disagree, how do you know which is right other than picking one and hoping? Either way it will be slower to keep in sync on a write-heavy system.

Peter.

Re: setting up raid10 with more than 4 drives

From
Stephen Frost
Date:
* Peter Childs (peterachilds@gmail.com) wrote:
> Good point, also if you had Raid 1 with 3 drives with some bit errors at
> least you can take a vote on whats right. Where as if you only have 2 and
> they disagree how do you know which is right other than pick one and hope...
> But whatever it will be slower to keep in sync on a heavy write system.

I'm not sure, but I don't think most RAID1 systems do reads against all
drives and compare the results before returning it to the caller...  I'd
be curious if I'm wrong.

    Thanks,

        Stephen


Re: setting up raid10 with more than 4 drives

From
Gregory Stark
Date:
"Jonah H. Harris" <jonah.harris@gmail.com> writes:

> On 5/29/07, Luke Lonergan <llonergan@greenplum.com> wrote:
>> AFAIK you can't RAID1 more than two drives, so the above doesn't make sense
>> to me.

Sure you can. In fact it's a very common backup strategy. You build a
three-way mirror and then when it comes time to back it up you break it into a
two-way mirror and back up the orphaned array at your leisure. When it's done
you re-add it and rebuild the degraded array. Good raid controllers can
rebuild the array at low priority squeezing in the reads in idle cycles.
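
With Linux md that rotation would look roughly like this (a sketch with illustrative device names; hardware controllers have their own equivalents):

    # Detach one leg of a three-way mirror, back it up, then put it back
    mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1
    dd if=/dev/sdc1 bs=1M | gzip > /backup/md0-copy.img.gz   # back up the detached copy at leisure
    mdadm /dev/md0 --add /dev/sdc1                           # re-add; md rebuilds the third copy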

I don't think you normally do it for performance though since there's more to
be gained by using larger stripes. In theory you should get the same boost on
reads as widening your stripes but of course you get no benefit on writes. And
I'm not sure raid controllers optimize raid1 accesses well in practice either.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: setting up raid10 with more than 4 drives

From
"Luke Lonergan"
Date:
Hi Peter,

On 5/30/07 12:29 AM, "Peter Childs" <peterachilds@gmail.com> wrote:

> Good point, also if you had Raid 1 with 3 drives with some bit errors at least
> you can take a vote on whats right. Where as if you only have 2 and they
> disagree how do you know which is right other than pick one and hope... But
> whatever it will be slower to keep in sync on a heavy write system.

Much better to get a RAID system that checksums blocks so that "good" is
known.  Solaris ZFS does that, as do high end systems from EMC and HDS.

- Luke



Re: setting up raid10 with more than 4 drives

From
Michael Stone
Date:
On Wed, May 30, 2007 at 07:06:54AM -0700, Luke Lonergan wrote:
>On 5/30/07 12:29 AM, "Peter Childs" <peterachilds@gmail.com> wrote:
>> Good point, also if you had Raid 1 with 3 drives with some bit errors at least
>> you can take a vote on whats right. Where as if you only have 2 and they
>> disagree how do you know which is right other than pick one and hope... But
>> whatever it will be slower to keep in sync on a heavy write system.
>
>Much better to get a RAID system that checksums blocks so that "good" is
>known.  Solaris ZFS does that, as do high end systems from EMC and HDS.

I don't see how that's better at all; in fact, it reduces to exactly the
same problem: given two pieces of data which disagree, which is right?
The ZFS hashes do a better job of error detection, but that's still not
the same thing as a voting system (3 copies, 2 of 3 is correct answer)
to resolve inconsistencies.

Mike Stone

Re: setting up raid10 with more than 4 drives

From
"Luke Lonergan"
Date:
> I don't see how that's better at all; in fact, it reduces to
> exactly the same problem: given two pieces of data which
> disagree, which is right?

The one that matches the checksum.

- Luke


Re: setting up raid10 with more than 4 drives

From
Michael Stone
Date:
On Wed, May 30, 2007 at 10:36:48AM -0400, Luke Lonergan wrote:
>> I don't see how that's better at all; in fact, it reduces to
>> exactly the same problem: given two pieces of data which
>> disagree, which is right?
>
>The one that matches the checksum.

And you know the checksum is good, how?

Mike Stone

Re: setting up raid10 with more than 4 drives

From
"Luke Lonergan"
Date:

It's created when the data is written to both drives.

This is standard stuff, very well proven: try googling 'self healing zfs'.

- Luke

Msg is shrt cuz m on ma treo

 -----Original Message-----
From:   Michael Stone [mailto:mstone+postgres@mathom.us]
Sent:   Wednesday, May 30, 2007 11:11 AM Eastern Standard Time
To:     pgsql-performance@postgresql.org
Subject:        Re: [PERFORM] setting up raid10 with more than 4 drives

On Wed, May 30, 2007 at 10:36:48AM -0400, Luke Lonergan wrote:
>> I don't see how that's better at all; in fact, it reduces to
>> exactly the same problem: given two pieces of data which
>> disagree, which is right? 
>
>The one that matches the checksum.

And you know the checksum is good, how?

Mike Stone


Re: setting up raid10 with more than 4 drives

From
Gregory Stark
Date:
"Michael Stone" <mstone+postgres@mathom.us> writes:

"Michael Stone" <mstone+postgres@mathom.us> writes:

> On Wed, May 30, 2007 at 07:06:54AM -0700, Luke Lonergan wrote:
>
> > Much better to get a RAID system that checksums blocks so that "good" is
> > known. Solaris ZFS does that, as do high end systems from EMC and HDS.
>
> I don't see how that's better at all; in fact, it reduces to exactly the same
> problem: given two pieces of data which disagree, which is right?

Well, the one where the checksum is correct.

In practice I've never seen a RAID failure due to outright bad data. In my
experience when a drive goes bad it goes really bad and you can't read the
block at all without i/o errors.

In every case where I've seen bad data it was due to bad memory (in one case
bad memory in the RAID controller cache -- that was hell to track down).
Checksums aren't even enough in that case as you'll happily generate a
checksum for the bad data before storing it...

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: setting up raid10 with more than 4 drives

From
PFC
Date:
On Wed, 30 May 2007 16:36:48 +0200, Luke Lonergan
<LLonergan@greenplum.com> wrote:

>> I don't see how that's better at all; in fact, it reduces to
>> exactly the same problem: given two pieces of data which
>> disagree, which is right?
>
> The one that matches the checksum.

    - postgres tells OS "write this block"
    - OS sends block to drives A and B
    - drive A happens to be lucky and seeks faster, writes data
    - student intern carrying pizzas for senior IT staff trips over power
cord*
    - boom
    - drive B still has old block

    Both blocks have correct checksum, so only a version counter/timestamp
could tell.
    Fortunately, if fsync() is honored correctly (did you check?), postgres
will zap such errors in recovery.

    Smart RAID1 or 0+1 controllers (including software RAID) will distribute
random reads to both disks (but not writes obviously).

    * = this happened at my old job, yes they had a very frightening server
room, or more precisely "cave" ; I never went there, I didn't want to be
the one fired for tripping over the wire...


    From Linux Software RAID howto :

    - benchmarking (quite brief !)
    http://unthought.net/Software-RAID.HOWTO/Software-RAID.HOWTO-9.html#ss9.5

    - read "Data Scrubbing" here :
http://gentoo-wiki.com/HOWTO_Install_on_Software_RAID

    - yeah but does it work ? (scary)
http://bugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=405919

  md/sync_action
       This can be used to monitor and control the resync/recovery
       process of MD. In particular, writing "check" here will cause
       the array to read all data blocks and check that they are
       consistent (e.g. parity is correct, or all mirror replicas are
       the same). Any discrepancies found are NOT corrected.

       A count of problems found will be stored in md/mismatch_count.

       Alternately, "repair" can be written which will cause the same
       check to be performed, but any errors will be corrected.
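
For example (these are the usual sysfs paths; the counter is normally named mismatch_cnt rather than mismatch_count):

    echo check  > /sys/block/md0/md/sync_action     # read-only consistency scan of the whole array
    cat /sys/block/md0/md/mismatch_cnt              # how many discrepancies the scan found
    echo repair > /sys/block/md0/md/sync_action     # same scan, but correct any discrepancies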

Re: setting up raid10 with more than 4 drives

From
PFC
Date:
    Oh by the way, I saw a nifty patch in the queue :

Find a way to reduce rotational delay when repeatedly writing last WAL page
Currently fsync of WAL requires the disk platter to perform a full
rotation to fsync again.
One idea is to write the WAL to different offsets that might reduce the
rotational delay.

    This will not work if the WAL is on RAID1, because two disks never spin
exactly at the same speed...

Re: setting up raid10 with more than 4 drives

From
"Luke Lonergan"
Date:

> This is standard stuff, very well proven: try googling 'self healing zfs'.

The first hit on this search is a demo of ZFS detecting corruption of one of the mirror pair using checksums, very cool:
  http://www.opensolaris.org/os/community/zfs/demos/selfheal/;jsessionid=52508D464883F194061E341F58F4E7E1

The bad drive is pointed out directly using the checksum and the data integrity is preserved.

- Luke

Re: setting up raid10 with more than 4 drives

From
mark@mark.mielke.cc
Date:
On Wed, May 30, 2007 at 08:51:45AM -0700, Luke Lonergan wrote:
> > This is standard stuff, very well proven: try googling 'self healing zfs'.
> The first hit on this search is a demo of ZFS detecting corruption of one of
> the mirror pair using checksums, very cool:
>
> http://www.opensolaris.org/os/community/zfs/demos/selfheal/;jsessionid=52508
> D464883F194061E341F58F4E7E1
>
> The bad drive is pointed out directly using the checksum and the data
> integrity is preserved.

One part is corruption. Another is ordering and consistency. ZFS represents
both RAID-style storage *and* journal-style file system. I imagine consistency
and ordering is handled through journalling.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


Re: setting up raid10 with more than 4 drives

From
"Rajesh Kumar Mallah"
Date:
Sorry for posting and disappearing.

i am still not clear on the best way to add more disks to the system.
do more stripes mean more performance (mostly)?
also, is there any rule of thumb about the best stripe size? (8k, 16k, 32k...)

regds
mallah



On 5/30/07, mark@mark.mielke.cc <mark@mark.mielke.cc> wrote:
> On Wed, May 30, 2007 at 08:51:45AM -0700, Luke Lonergan wrote:
> > > This is standard stuff, very well proven: try googling 'self healing zfs'.
> > The first hit on this search is a demo of ZFS detecting corruption of one of
> > the mirror pair using checksums, very cool:
> >
> > http://www.opensolaris.org/os/community/zfs/demos/selfheal/;jsessionid=52508
> > D464883F194061E341F58F4E7E1
> >
> > The bad drive is pointed out directly using the checksum and the data
> > integrity is preserved.
>
> One part is corruption. Another is ordering and consistency. ZFS represents
> both RAID-style storage *and* journal-style file system. I imagine consistency
> and ordering is handled through journalling.
>
> Cheers,
> mark
>
> --
> mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
> .  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
> |\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
> |  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
>
>   One ring to rule them all, one ring to find them, one ring to bring them all
>                        and in the darkness bind them...
>
>                            http://mark.mielke.cc/
>
>

Re: setting up raid10 with more than 4 drives

From
"Luke Lonergan"
Date:
Mark,

On 5/30/07 8:57 AM, "mark@mark.mielke.cc" <mark@mark.mielke.cc> wrote:

> One part is corruption. Another is ordering and consistency. ZFS represents
> both RAID-style storage *and* journal-style file system. I imagine consistency
> and ordering is handled through journalling.

Yep and versioning, which answers PFC's scenario.

Short answer: ZFS has a very reliable model that uses checksumming and
journaling along with block versioning to implement "self healing".  There
are others that do some similar things with checksumming on the SAN HW and
cooperation with the filesystem.
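
For the curious, the ZFS workflow being demonstrated is roughly the following (hypothetical pool and device names):

    zpool create tank mirror c0t0d0 c0t1d0   # checksummed, self-healing two-way mirror
    zpool scrub tank                         # read every block, verify checksums, repair from the good copy
    zpool status -v tank                     # report checksum errors and the device they came from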

- Luke



Re: setting up raid10 with more than 4 drives

From
mark@mark.mielke.cc
Date:
On Thu, May 31, 2007 at 01:28:58AM +0530, Rajesh Kumar Mallah wrote:
> i am still not clear what is the best way of throwing in more
> disks into the system.
> does more stripes means more performance (mostly) ?
> also is there any thumb rule about best stripe size ? (8k,16k,32k...)

It isn't that simple. RAID1 should theoretically give you the best read
performance. If all you care about is read, then "best performance" would
be to add more mirrors to your array.

For write performance, RAID0 is the best. I think this is what you mean
by "more stripes".

This is where RAID 1+0/0+1 come in. To reconcile the above. Your question
seems to be: I have a RAID 1+0/0+1 system. Should I add disks onto the 0
part of the array? Or the 1 part of the array?

My conclusion for you would be: both, unless you are certain that your load
is scaled heavily towards reads, in which case the 1, or heavily towards
writes, in which case the 0.

Then comes the other factors. Do you want redundancy? Then you want 1.
Do you want capacity? Then you want 0.

There is no single answer for most people.

For me, stripe size is the last decision to make, and may be heavily
sensitive to load patterns. This suggests a trial and error / benchmarking
requirement to determine the optimal stripe size for your use.
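
One crude way to run that trial with Linux md, writing straight to the block device to keep the filesystem out of the picture (illustrative devices; this destroys any data on them):

    mdadm --create /dev/md9 --run --level=10 --chunk=64 --raid-devices=4 /dev/sd[b-e]1
    dd if=/dev/zero of=/dev/md9 bs=8k count=500000 oflag=direct   # sequential write test, ~4 GB
    mdadm --stop /dev/md9
    mdadm --create /dev/md9 --run --level=10 --chunk=256 --raid-devices=4 /dev/sd[b-e]1
    dd if=/dev/zero of=/dev/md9 bs=8k count=500000 oflag=direct
    mdadm --stop /dev/md9

Repeat with a read pattern that resembles your actual workload before settling on a chunk size.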

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


Re: setting up raid10 with more than 4 drives

From
"Rajesh Kumar Mallah"
Date:
On 5/31/07, mark@mark.mielke.cc <mark@mark.mielke.cc> wrote:
> On Thu, May 31, 2007 at 01:28:58AM +0530, Rajesh Kumar Mallah wrote:
> > i am still not clear what is the best way of throwing in more
> > disks into the system.
> > does more stripes means more performance (mostly) ?
> > also is there any thumb rule about best stripe size ? (8k,16k,32k...)
>
> It isn't that simple. RAID1 should theoretically give you the best read
> performance. If all you care about is read, then "best performance" would
> be to add more mirrors to your array.
>
> For write performance, RAID0 is the best. I think this is what you mean
> by "more stripes".
>
> This is where RAID 1+0/0+1 come in. To reconcile the above. Your question
> seems to be: I have a RAID 1+0/0+1 system. Should I add disks onto the 0
> part of the array? Or the 1 part of the array?

> My conclusion to you would be: Both, unless you are certain that you load
> is scaled heavily towards read, in which case the 1, or if scaled heavily
> towards write, then 0.

thanks, this answers my query. all along i was thinking only of the 1+0
arrangement, failing to consider the 0+1 part of it.

>
> Then comes the other factors. Do you want redundancy? Then you want 1.
> Do you want capacity? Then you want 0.

Ok.

>
> There is no single answer for most people.
>
> For me, stripe size is the last decision to make, and may be heavily
> sensitive to load patterns. This suggests a trial and error / benchmarking
> requirement to determine the optimal stripe size for your use.

thanks.
mallah.

>
> Cheers,
> mark
>
> --
> mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
> .  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
> |\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
> |  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
>
>   One ring to rule them all, one ring to find them, one ring to bring them all
>                        and in the darkness bind them...
>
>                            http://mark.mielke.cc/
>
>

Re: setting up raid10 with more than 4 drives

From
"Steinar H. Gunderson"
Date:
On Wed, May 30, 2007 at 12:41:46AM -0400, Jonah H. Harris wrote:
> Yeah, I've never seen a way to RAID-1 more than 2 drives either.

pannekake:~> grep -A 1 md0 /proc/mdstat
md0 : active raid1 dm-20[2] dm-19[1] dm-18[0]
      64128 blocks [3/3] [UUU]

It's not a big device, but I can ensure you it exists :-)

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: setting up raid10 with more than 4 drives

From
Sander Steffann
Date:
Hi,

Op 1-jun-2007, om 1:39 heeft Steinar H. Gunderson het volgende
geschreven:
> On Wed, May 30, 2007 at 12:41:46AM -0400, Jonah H. Harris wrote:
>> Yeah, I've never seen a way to RAID-1 more than 2 drives either.
>
> pannekake:~> grep -A 1 md0 /proc/mdstat
> md0 : active raid1 dm-20[2] dm-19[1] dm-18[0]
>       64128 blocks [3/3] [UUU]
>
> It's not a big device, but I can ensure you it exists :-)

I talked to someone yesterday who did a 10 or 11 way RAID1 with Linux
MD for high performance video streaming. Seemed to work very well.

- Sander


Autodetect of software RAID1+0 fails

From
Craig James
Date:
Apologies for a somewhat off-topic question, but...

The Linux kernel doesn't properly detect my software RAID1+0 when I boot up.  It detects the two RAID1 arrays, the
partitions of which are marked properly.  But it can't find the RAID0 on top of that, because there's no corresponding
device to auto-detect.  The result is that it creates /dev/md0 and /dev/md1 and assembles the RAID1 devices on bootup,
but /dev/md2 isn't created, so the RAID0 can't be assembled at boot time.

Here's what it looks like:

$ cat /proc/mdstat
Personalities : [raid0] [raid1]
md2 : active raid0 md0[0] md1[1]
      234436224 blocks 64k chunks

md1 : active raid1 sde1[1] sdc1[2]
      117218176 blocks [2/2] [UU]

md0 : active raid1 sdd1[1] sdb1[0]
      117218176 blocks [2/2] [UU]

$ uname -r
2.6.12-1.1381_FC3

After a reboot, I always have to do this:

      mknod /dev/md2 b 9 2
      mdadm --assemble /dev/md2 /dev/md0 /dev/md1
      mount /dev/md2

What am I missing here?

Thanks,
Craig

Re: Autodetect of software RAID1+0 fails

From
Dimitri
Date:
Craig,

to make things work properly here you need to create a config file
holding both the raid1 and raid0 information (/etc/mdadm/mdadm.conf).
However, if your root filesystem is corrupted, or you lose this file,
or move the disks somewhere else - you are back to the same initial
issue :))

So, the solution I've found to work 100% of the time is: use mdadm to
create the raid1 devices (as you do already) and then use LVM to create
the raid0 volume on top of them - LVM writes its own labels on every MD
device and will find its volume pieces automatically! Tested through
several crashes and I was surprised by its robustness :))
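
Sketched out with illustrative device names, that layout is:

    # mirrors in md, stripe in LVM
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdd1 /dev/sde1
    pvcreate /dev/md0 /dev/md1
    vgcreate vgdata /dev/md0 /dev/md1
    lvcreate --stripes 2 --stripesize 64 --extents 100%FREE --name lvdata vgdata
    # optionally still record the mirrors so md can also assemble them from config
    mdadm --detail --scan >> /etc/mdadm/mdadm.conf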

Rgds,
-Dimitri

On 6/1/07, Craig James <craig_james@emolecules.com> wrote:
> Apologies for a somewhat off-topic question, but...
>
> The Linux kernel doesn't properly detect my software RAID1+0 when I boot up.
>  It detects the two RAID1 arrays, the partitions of which are marked
> properly.  But it can't find the RAID0 on top of that, because there's no
> corresponding device to auto-detect.  The result is that it creates /dev/md0
> and /dev/md1 and assembles the RAID1 devices on bootup, but /dev/md2 isn't
> created, so the RAID0 can't be assembled at boot time.
>
> Here's what it looks like:
>
> $ cat /proc/mdstat
> Personalities : [raid0] [raid1]
> md2 : active raid0 md0[0] md1[1]
>       234436224 blocks 64k chunks
>
> md1 : active raid1 sde1[1] sdc1[2]
>       117218176 blocks [2/2] [UU]
>
> md0 : active raid1 sdd1[1] sdb1[0]
>       117218176 blocks [2/2] [UU]
>
> $ uname -r
> 2.6.12-1.1381_FC3
>
> After a reboot, I always have to do this:
>
>       mknod /dev/md2 b 9 2
>       mdadm --assemble /dev/md2 /dev/md0 /dev/md1
>       mount /dev/md2
>
> What am I missing here?
>
> Thanks,
> Craig
>

Re: Autodetect of software RAID1+0 fails

From
"Luke Lonergan"
Date:
Dimitri,

LVM is great; one thing to watch out for: it is very slow compared to pure
md.  That will only matter in practice if you want to exceed 1 GB/s of
sequential I/O bandwidth.

- Luke


On 6/1/07 11:51 AM, "Dimitri" <dimitrik.fr@gmail.com> wrote:

> Craig,
>
> to make things working properly here you need to create a config file
> keeping both raid1 and raid0 information (/etc/mdadm/mdadm.conf).
> However if your root filesystem is corrupted, or you loose this file,
> or move disks somewhere else - you are back to the same initial issue
> :))
>
> So, the solution I've found 100% working in any case is: use mdadm to
> create raid1 devices (as you do already) and then use LVM to create
> raid0 volume on it - LVM writes its own labels on every MD devices and
> will find its volumes peaces automatically! Tested for crash several
> times and was surprised by its robustness :))
>
> Rgds,
> -Dimitri
>
> On 6/1/07, Craig James <craig_james@emolecules.com> wrote:
>> Apologies for a somewhat off-topic question, but...
>>
>> The Linux kernel doesn't properly detect my software RAID1+0 when I boot up.
>>  It detects the two RAID1 arrays, the partitions of which are marked
>> properly.  But it can't find the RAID0 on top of that, because there's no
>> corresponding device to auto-detect.  The result is that it creates /dev/md0
>> and /dev/md1 and assembles the RAID1 devices on bootup, but /dev/md2 isn't
>> created, so the RAID0 can't be assembled at boot time.
>>
>> Here's what it looks like:
>>
>> $ cat /proc/mdstat
>> Personalities : [raid0] [raid1]
>> md2 : active raid0 md0[0] md1[1]
>>       234436224 blocks 64k chunks
>>
>> md1 : active raid1 sde1[1] sdc1[2]
>>       117218176 blocks [2/2] [UU]
>>
>> md0 : active raid1 sdd1[1] sdb1[0]
>>       117218176 blocks [2/2] [UU]
>>
>> $ uname -r
>> 2.6.12-1.1381_FC3
>>
>> After a reboot, I always have to do this:
>>
>>       mknod /dev/md2 b 9 2
>>       mdadm --assemble /dev/md2 /dev/md0 /dev/md1
>>       mount /dev/md2
>>
>> What am I missing here?
>>
>> Thanks,
>> Craig
>>
>



Re: Autodetect of software RAID1+0 fails

From
"Steinar H. Gunderson"
Date:
On Fri, Jun 01, 2007 at 10:57:56AM -0700, Craig James wrote:
> The Linux kernel doesn't properly detect my software RAID1+0 when I boot
> up.  It detects the two RAID1 arrays, the partitions of which are marked
> properly.  But it can't find the RAID0 on top of that, because there's no
> corresponding device to auto-detect.  The result is that it creates
> /dev/md0 and /dev/md1 and assembles the RAID1 devices on bootup, but
> /dev/md2 isn't created, so the RAID0 can't be assembled at boot time.

Either do your md discovery in userspace via mdadm (your distribution can
probably help you with this), or simply use the raid10 module instead of
building raid1+0 yourself.
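
If you go the raid10-module route, a single command replaces the two-layer build (illustrative devices; --layout=n2 is the default near-copies layout):

    mdadm --create /dev/md0 --level=10 --layout=n2 --raid-devices=4 /dev/sd[b-e]1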

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Autodetect of software RAID1+0 fails

From
mark@mark.mielke.cc
Date:
> On Fri, Jun 01, 2007 at 10:57:56AM -0700, Craig James wrote:
> > The Linux kernel doesn't properly detect my software RAID1+0 when I boot
> > up.  It detects the two RAID1 arrays, the partitions of which are marked
> > properly.  But it can't find the RAID0 on top of that, because there's no
> > corresponding device to auto-detect.  The result is that it creates
> > /dev/md0 and /dev/md1 and assembles the RAID1 devices on bootup, but
> > /dev/md2 isn't created, so the RAID0 can't be assembled at boot time.

Hi Craig:

I had the same problem for a short time. There *is* a device to base the
RAID0 off, however, it needs to be recursively detected. mdadm will do this
for you, however, if the device order isn't optimal, it may need some help
via /etc/mdadm.conf. For a while, I used something like:

DEVICE partitions
...
ARRAY /dev/md3 level=raid0 num-devices=2 UUID=10d58416:5cd52161:7703b48e:cd93a0e0
ARRAY /dev/md5 level=raid1 num-devices=2 UUID=1515ac26:033ebf60:fa5930c5:1e1f0f12
ARRAY /dev/md6 level=raid1 num-devices=2 UUID=72ddd3b6:b063445c:d7718865:bb79aad7

My symptoms were that it worked when started from user space, but failed during
reboot without the above hints. I believe that if I had defined md5 and md6
before md3, it might have worked automatically without hints.

On Fri, Jun 01, 2007 at 11:35:01PM +0200, Steinar H. Gunderson wrote:
> Either do your md discovery in userspace via mdadm (your distribution can
> probably help you with this), or simply use the raid10 module instead of
> building raid1+0 yourself.

I agree with using the mdadm RAID10 support. RAID1+0 gives you fine control
over the RAID1 vs RAID0 split if you want to add disks later; RAID10 from
mdadm has the flexibility that you don't need an even number of disks. Since
I don't intend to add disks to my array, RAID10 as a single layer, with
potentially smarter performance behaviour, appeals to me.

They both worked for me - but I am sticking with the single layer now.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


Re: Autodetect of software RAID1+0 fails

From
"Luke Lonergan"
Date:
Steinar,

On 6/1/07 2:35 PM, "Steinar H. Gunderson" <sgunderson@bigfoot.com> wrote:

> Either do your md discovery in userspace via mdadm (your distribution can
> probably help you with this), or simply use the raid10 module instead of
> building raid1+0 yourself.

I found md raid10 to be *very* slow compared to raid1+0 on Linux 2.6.9 ->
2.6.18.  Very slow in this case is < 400 MB/s compared to 1,800 MB/s.

- Luke