Thread: PowerEdge 2950 questions

PowerEdge 2950 questions

From: Jeff Davis

This question is related to the thread:
http://archives.postgresql.org/pgsql-performance/2006-08/msg00152.php
but I had some questions.

I am looking at setting up two general-purpose database servers,
replicated with Slony. Each server I'm looking at has the following
specs:

Dell PowerEdge 2950
- 2 x Dual Core Intel® Xeon® 5130, 4MB Cache, 2.00GHz, 1333MHz FSB
- 4GB RAM
- PERC 5/i, x6 Backplane, Integrated Controller Card (256MB battery-backed cache)
- 6 x 73GB, SAS, 3.5-inch, 15K RPM hard drives arranged in RAID 10

These servers are reasonably priced for what they offer, and the above
thread indicated good performance. However, I want to make sure that
putting the WAL in with PGDATA on the RAID 10 is wise, and any other
suggestions would be welcome. Is the RAID controller good? Are these
processors good for database work, or are Opterons significantly better?

I may go for more storage as well (e.g. 300GB disks), but I am still
determining my potential storage needs. I can also add more RAM at a
later date if necessary.

Regards,
    Jeff Davis


Re: PowerEdge 2950 questions

From: "Bucky Jordan"

Hi Jeff,

My experience with the 2950 seemed to indicate that RAID10x6 disks did
not perform as well as RAID5x6. I believe I posted some numbers to
illustrate this in the post you mentioned.

If I remember correctly, the numbers were pretty close, but I was
expecting RAID10 to significantly beat RAID5. However, with 6 disks,
RAID5 starts performing a little better, and it also has good storage
utilization (i.e. you're only losing 1 disk's worth of storage, so with
6 drives, you still have 83% - 5/6 - of your storage available, as
opposed to 50% with RAID10).

Keep in mind that with 6 disks, theoretically (your mileage may vary by
raid controller implementation) you have more fault tolerance with
RAID10 than with RAID5.

Also, I don't think there's a lot of performance gain in going with the
15k drives over the 10k. Even Dell only claims a 10% boost. I've
benchmarked a single-drive configuration, 10k vs 15k RPM, and yes, the
15k had substantially better seek times, but raw I/O isn't much
different, so again, it depends on your application's needs.
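
(For reference, by "raw I/O" I just mean a quick sequential read
straight off the device, something like the following sketch; the
device name is hypothetical:

# dd if=/dev/da0 of=/dev/null bs=1m count=4000

which reads 4GB straight from the disk, bypassing the filesystem, and
gives a rough MB/s figure.)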

Lastly, re your question on putting the WAL on the RAID10: I currently
have the box set up as RAID5x6 with the WAL and PGDATA all on the same
raidset. I haven't had the chance to do extensive tests, but from
previous reading, I gather that if you have write-back enabled on the
RAID, it should be ok (which it is in my case).

As to how this compares with an Opteron system, if someone has some
pgbench (or other test) suggestions and a box to compare with, I'd be
happy to run the same on the 2950. (The 2950 is a 2-cpu dual core 3.0
ghz box, 8GB ram with 6 disks, running FreeBSD 6.1 amd64 RELEASE if
you're interested in picking a "fair" opteron equivalent ;)
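
For instance, something along these lines would be easy to run on both
boxes (a minimal sketch; the scale factor and client counts are just
placeholders, and the scaling factor should be at least the number of
clients to avoid artificial contention on the branches table):

$ createdb bench
$ pgbench -i -s 100 bench        # initialize; scale 100 is roughly 1.5GB of data
$ pgbench -c 10 -t 1000 bench    # 10 clients, 1000 transactions each; reports tps

Comparing the reported tps between the two machines would be a fair start.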

Thanks,

Bucky





Re: PowerEdge 2950 questions

From: Jeff Davis

On Tue, 2006-08-22 at 17:56 -0400, Bucky Jordan wrote:
> Hi Jeff,
>
> My experience with the 2950 seemed to indicate that RAID10x6 disks did
> not perform as well as RAID5x6. I believe I posted some numbers to
> illustrate this in the post you mentioned.
>

Very interesting. I always hear that people avoid RAID 5 on database
servers, but I suppose it always depends. Is the parity calculation
something that may increase commit latency vs. a RAID 10? That's
normally the explanation that I get.

> If I remember correctly, the numbers were pretty close, but I was
> expecting RAID10 to significantly beat RAID5. However, with 6 disks,
> RAID5 starts performing a little better, and it also has good storage
> utilization (i.e. you're only losing 1 disk's worth of storage, so with
> 6 drives, you still have 83% - 5/6 - of your storage available, as
> opposed to 50% with RAID10).

Right, RAID 5 is certainly tempting since I get so much more storage.

> Keep in mind that with 6 disks, theoretically (your mileage may vary by
> raid controller implementation) you have more fault tolerance with
> RAID10 than with RAID5.

I'll also have the Slony system, so I think my degree of safety is still
quite high with RAID-5.

> Also, I don't think there's a lot of performance gain to going with the
> 15k drives over the 10k. Even dell only says a 10% boost. I've
> benchmarked a single drive configuration, 10k vs 15k rpm, and yes, the
> 15k had substantially better seek times, but raw io isn't much
> different, so again, it depends on your application's needs.

Do you think the seek time may affect transaction commit time though,
rather than just throughput? Or does it not make much difference since
we have writeback?

> Lastly, re your question on putting the WAL on the RAID10- I currently
> have the box setup as RAID5x6 with the WAL and PGDATA all on the same
> raidset. I haven't had the chance to do extensive tests, but from
> previous readings, I gather that if you have write-back enabled on the
> RAID, it should be ok (which it is in my case).

Ok, I won't worry about that then.

> As to how this compares with an Opteron system, if someone has some
> pgbench (or other test) suggestions and a box to compare with, I'd be
> happy to run the same on the 2950. (The 2950 is a 2-cpu dual core 3.0
> ghz box, 8GB ram with 6 disks, running FreeBSD 6.1 amd64 RELEASE if
> you're interested in picking a "fair" opteron equivalent ;)
>

Based on your results, I think the Intels should be fine. Does each of
the cores have independent access to memory (therefore making memory
access more parallel)?

Thanks very much for the information!

Regards,
    Jeff Davis


Re: PowerEdge 2950 questions

From: "Merlin Moncure"

On 8/22/06, Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2006-08-22 at 17:56 -0400, Bucky Jordan wrote:
> Very interesting. I always hear that people avoid RAID 5 on database
> servers, but I suppose it always depends. Is the parity calculation
> something that may increase commit latency vs. a RAID 10? That's
> normally the explanation that I get.

it's not the parity, it's the seeking.  Raid 5 gives you great
sequential i/o but random is often not much better than a single
drive.  Actually it's the '1' in raid 10 that plays the biggest role
in optimizing seeks on an ideal raid controller.  Calculating parity
was boring 20 years ago as it involves one of the fastest operations in
computing, namely xor. :)
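
(to make that concrete, a textbook illustration rather than anything
controller-specific: with three data blocks per stripe, the parity
block is P = D1 xor D2 xor D3.  if the disk holding D2 dies, the
controller rebuilds it as D2 = D1 xor D3 xor P.  one xor per word of
data, which no modern cpu will even notice.)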

> > If I remember correctly, the numbers were pretty close, but I was
> > expecting RAID10 to significantly beat RAID5. However, with 6 disks,
> > RAID5 starts performing a little better, and it also has good storage
> > utilization (i.e. you're only losing 1 disk's worth of storage, so with
> > 6 drives, you still have 83% - 5/6 - of your storage available, as
> > opposed to 50% with RAID10).

with a 6 disk raid 5, you can absolutely afford a hot spare in the
array.  an alternative is raid 6, which uses two parity drives, however
there is not a lot of good data on how raid 6 performs (ideally it
should be similar to raid 5). raid 5 is ideal for some things, for
example document storage, or databases where most of the activity takes
place in a small portion of the disks most of the time.

> Right, RAID 5 is certainly tempting since I get so much more storage.
>
> > Keep in mind that with 6 disks, theoretically (your mileage may vary by
> > raid controller implementation) you have more fault tolerance with
> > RAID10 than with RAID5.
>
> I'll also have the Slony system, so I think my degree of safety is still
> quite high with RAID-5.
>
> > Also, I don't think there's a lot of performance gain to going with the
> > 15k drives over the 10k. Even dell only says a 10% boost. I've
> > benchmarked a single drive configuration, 10k vs 15k rpm, and yes, the
> > 15k had substantially better seek times, but raw io isn't much
> > different, so again, it depends on your application's needs.

raw sequential i/o is actually not that important in many databases.
while the database tries to make data transfers sequential as much as
possible (especially for writing), improved random performance often
translates directly into database performance, especially if your
database is big.

> Do you think the seek time may affect transaction commit time though,
> rather than just throughput? Or does it not make much difference since
> we have writeback?
>
> > Lastly, re your question on putting the WAL on the RAID10- I currently
> > have the box setup as RAID5x6 with the WAL and PGDATA all on the same
> > raidset. I haven't had the chance to do extensive tests, but from
> > previous readings, I gather that if you have write-back enabled on the
> > RAID, it should be ok (which it is in my case).

with 6 relatively small disks I think a single raid 10 volume is the
best bet.  however, above 6 disks a dedicated wal volume is usually
worth considering.  since wal storage requirements are so small, it's
becoming affordable to look at solid state for the wal.
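
a rough sketch of the usual way to split the wal out, assuming the
data directory is /usr/local/pgsql/data and the dedicated volume is
mounted at /mnt/wal (both paths hypothetical):

$ pg_ctl -D /usr/local/pgsql/data stop
$ mv /usr/local/pgsql/data/pg_xlog /mnt/wal/pg_xlog      # pg_xlog holds the wal
$ ln -s /mnt/wal/pg_xlog /usr/local/pgsql/data/pg_xlog   # postgres follows the symlink
$ pg_ctl -D /usr/local/pgsql/data start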

merlin

Re: PowerEdge 2950 questions

From: Jeff Davis

On Thu, 2006-08-24 at 09:21 -0400, Merlin Moncure wrote:
> On 8/22/06, Jeff Davis <pgsql@j-davis.com> wrote:
> > On Tue, 2006-08-22 at 17:56 -0400, Bucky Jordan wrote:
> > Very interesting. I always hear that people avoid RAID 5 on database
> > servers, but I suppose it always depends. Is the parity calculation
> > something that may increase commit latency vs. a RAID 10? That's
> > normally the explanation that I get.
>
> it's not the parity, it's the seeking.  Raid 5 gives you great
> sequential i/o but random is often not much better than a single
> drive.  Actually it's the '1' in raid 10 that plays the biggest role
> in optimizing seeks on an ideal raid controller.  Calculating parity
> was boring 20 years ago as it involves one of the fastest operations in
> computing, namely xor. :)
>

Here's the explanation I got: If you do a write on RAID 5 to something
that is not in the RAID controller's cache, it needs to do a read first
in order to properly recalculate the parity for the write.

However, I'm sure they try to avoid this by leaving the write in the
battery-backed cache until it's more convenient to do the read, or maybe
until the rest of the stripe is written, in which case it doesn't need to
do the read. I am not sure of the actual end effect.
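
(For what it's worth, the standard read-modify-write shortcut, as I
understand it, is: P_new = P_old xor D_old xor D_new. So a small write
costs two reads (old data and old parity) plus two writes (new data and
new parity), rather than re-reading the whole stripe.)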

> > > Lastly, re your question on putting the WAL on the RAID10- I currently
> > > have the box setup as RAID5x6 with the WAL and PGDATA all on the same
> > > raidset. I haven't had the chance to do extensive tests, but from
> > > previous readings, I gather that if you have write-back enabled on the
> > > RAID, it should be ok (which it is in my case).
>
> with 6 relatively small disks I think single raid 10 volume is the
> best bet.  however above 6 dedicated wal is usually worth considering.
>  since wal storage requirements are so small, it's becoming affordable
> to look at solid state for the wal.
>

I've often wondered about that. To a certain degree, that's the same
effect as just having a bigger battery-backed cache, right?

Regards,
    Jeff Davis


Re: PowerEdge 2950 questions

From: "Merlin Moncure"

On 8/24/06, Jeff Davis <pgsql@j-davis.com> wrote:
> On Thu, 2006-08-24 at 09:21 -0400, Merlin Moncure wrote:
> > On 8/22/06, Jeff Davis <pgsql@j-davis.com> wrote:
> > > On Tue, 2006-08-22 at 17:56 -0400, Bucky Jordan wrote:
> > it's not the parity, it's the seeking.  Raid 5 gives you great
> > sequential i/o but random is often not much better than a single
> > drive.  Actually it's the '1' in raid 10 that plays the biggest role
> > in optimizing seeks on an ideal raid controller.  Calculating parity
> > was boring 20 years ago as it involves one of the fastest operations in
> > computing, namely xor. :)
>
> Here's the explanation I got: If you do a write on RAID 5 to something
> that is not in the RAID controllers cache, it needs to do a read first
> in order to properly recalculate the parity for the write.

it's worse than that.  if you need to read something that is not in
the o/s cache, all the disks except for one need to be sent to a
physical location in order to get the data.  That's the basic rule with
striping: it optimizes for sequential i/o at the expense of random i/o.
There are some optimizations that can help, but not much.  caching by
the controller can increase performance on writes because it can
optimize the movement across the disks by instituting a delay between
the write request and the actual write.

raid 1 (or 1+x) is the opposite.  It allows the drive heads to move
independently on reads when combined with some smart algorithms.
writes, however, must involve all the disk heads.  Many
controllers do not seem to optimize raid 1 properly although linux
software raid seems to.

A 4 disk raid 1, for example, could deliver four times the seek
performance of a single drive, which would make it feel much faster
than a 4 disk raid 0 under certain conditions.

> > with 6 relatively small disks I think single raid 10 volume is the
> > best bet.  however above 6 dedicated wal is usually worth considering.
> >  since wal storage requirements are so small, it's becoming affordable
> > to look at solid state for the wal.
>
> I've often wondered about that. To a certain degree, that's the same
> effect as just having a bigger battery-backed cache, right?

yeah, if the cache was big enough to cover the volume.  the wal is
also fairly sequential i/o though, so I'm not sure this would help all
that much after thinking about it.  would be an interesting test
though.

merlin

Re: PowerEdge 2950 questions

From: "Claus Guttesen"

> I am looking at setting up two general-purpose database servers,
> replicated with Slony. Each server I'm looking at has the following
> specs:
>
> Dell PowerEdge 2950
> - 2 x Dual Core Intel(r) Xeon(r) 5130, 4MB Cache, 2.00GHz, 1333MHz FSB
> - 4GB RAM
> - PERC 5/i, x6 Backplane, Integrated Controller Card (256MB battery-backed cache)
> - 6 x 73GB, SAS, 3.5-inch, 15K RPM hard drives arranged in RAID 10

Has anyone done any performance comparison, cpu-wise, between the
above-mentioned cpu and an opteron 270/280?

A lot of attention seems to be spent on the disks and the
raid-controller, which is somewhat important by itself, but this has
been covered in numerous threads in other places.

regards
Claus

Re: PowerEdge 2950 questions

From: Mark Lewis

> it's worse than that.  if you need to read something that is not in
> the o/s cache, all the disks except for one need to be sent to a
> physical location in order to get the data.  That's the basic rule with
> striping: it optimizes for sequential i/o at the expense of random i/o.
> There are some optimizations that can help, but not much.  caching by
> the controller can increase performance on writes because it can
> optimize the movement across the disks by instituting a delay between
> the write request and the actual write.
>
> raid 1 (or 1+x) is the opposite.  It allows the drive heads to move
> independently on reads when combined with some smart algorithms.
> writes, however, must involve all the disk heads.  Many
> controllers do not seem to optimize raid 1 properly although linux
> software raid seems to.
>
> A 4 disk raid 1, for example, could deliver four times the seek
> performance which would make it feel much faster than a 4 disk raid 0
> under certain conditions.

I understand random mid-sized seeks (seek to x and read 512k) being slow
on RAID5, but if the read size is small enough not to cross a stripe
boundary, this could be optimized to only one seek on one drive.  Do
most controllers just not do this, or is there some other reason that
I'm not thinking of that would force all disks to seek?

-- Mark

Re: PowerEdge 2950 questions

From: Scott Marlowe

On Thu, 2006-08-24 at 13:57, Merlin Moncure wrote:
> On 8/24/06, Jeff Davis <pgsql@j-davis.com> wrote:
> > On Thu, 2006-08-24 at 09:21 -0400, Merlin Moncure wrote:
> > > On 8/22/06, Jeff Davis <pgsql@j-davis.com> wrote:
> > > > On Tue, 2006-08-22 at 17:56 -0400, Bucky Jordan wrote:
> > > it's not the parity, it's the seeking.  Raid 5 gives you great
> > > sequential i/o but random is often not much better than a single
> > > drive.  Actually it's the '1' in raid 10 that plays the biggest role
> > > in optimizing seeks on an ideal raid controller.  Calculating parity
> > > was boring 20 years ago as it involves one of the fastest operations in
> > > computing, namely xor. :)
> >
> > Here's the explanation I got: If you do a write on RAID 5 to something
> > that is not in the RAID controllers cache, it needs to do a read first
> > in order to properly recalculate the parity for the write.
>
> it's worse than that.  if you need to read something that is not in
> the o/s cache, all the disks except for one need to be sent to a
> physical location in order to get the data.

Ummmm.  No.  Not in my experience.  If you need to read something that's
significantly larger than your stripe size, then yes, you'd need to do
that.  With typical RAID 5 stripe sizes of 64k to 256k, you could read 8
to 32 PostgreSQL 8k blocks from a single disk before having to move the
heads on the next disk to get the next part of data.  A RAID 5, being
read, acts much like a RAID 0 with n-1 disks.

It's the writes that kill performance, since you've got to read two
disks and write two disks for every write, at a minimum.  This is why
small RAID 5 arrays bottleneck so quickly.  A 4 disk RAID 5 with two
writing threads is likely already starting to thrash.

Or did you mean something else by that?

Re: PowerEdge 2950 questions

From: "Bucky Jordan"

Here are benchmarks of RAID5x4 vs RAID10x4 on a Dell PERC 5/i with 300 GB
10k RPM SAS drives. I know these are from bonnie 1.9 instead of the older
version, but they might still make for a useful analysis of RAID5 vs.
RAID10.

Also, unfortunately I don't have the exact numbers, but RAID10x6
performed really poorly on the sequential IO (dd) tests: worse than the
4 disk RAID5, something around 120 MB/s. I'm currently running the
system as RAID5x6, but would like to go back and do some further
testing if I get the chance to tear the box down again.

These tests were run on FreeBSD 6.1 amd64 RELEASE with UFS + soft
updates. For comparison, the dd for RAID5x6 was 255 MB/s, so I think the
extra disks really help out with RAID5 write performance, as Scott
pointed out. (I'm using a 128k stripe size with a 256MB write-back
cache.)

Personally, I'm not yet convinced that RAID10 offers dramatically better
performance than RAID5 for 6 disks (at least on the Dell PERC
controller), and available storage is a significant factor for my
particular application. But I do feel the need to do more testing, so
any suggestions are appreciated. (And yes, I'll be using bonnie 1.03 in
the future, along with pgbench.)

------ RAID5x4
# /usr/local/sbin/bonnie++ -d bonnie -s 1000:8k -u root
Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
             1000M   587  99 158889  30 127859  32  1005  99 824399  99 +++++ +++
Latency             14216us     181ms   48765us   56241us    1687us   47997us
Version 1.93c       ------Sequential Create------ --------Random Create--------
     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency             40365us      25us      35us   20030us      36us      52us
1.93c,1.93c,beast.corp.lumeta.com,1,1155204369,1000M,,587,99,158889,30,127859,32,1005,99,824399,99,+++++,+++,16,,,,,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,14216us,181ms,48765us,56241us,1687us,47997us,40365us,25us,35us,20030us,36us,52us

# time bash -c "(dd if=/dev/zero of=bigfile count=125000 bs=8k && sync)"
125000+0 records in
125000+0 records out
1024000000 bytes transferred in 6.375067 secs (160625763 bytes/sec)
0.037u 1.669s 0:06.42 26.3%     29+211k 30+7861io 0pf+0w

------ RAID10x4
bash-2.05b$ bonnie++ -d bonnie -s 1000:8k
Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
             1000M   585  99 21705   4 28560   9  1004  99 812997  98  5436 454
Latency             14181us   81364us   50256us   57720us    1671us    1059ms
Version 1.93c       ------Sequential Create------ --------Random Create--------
     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  4712  10 +++++ +++ +++++ +++  4674  10 +++++ +++ +++++ +++
Latency               807ms      21us      36us     804ms     110us      36us
1.93c,1.93c,beast.corp.lumeta.com,1,1155207445,1000M,,585,99,21705,4,28560,9,1004,99,812997,98,5436,454,16,,,,,4712,10,+++++,+++,+++++,+++,4674,10,+++++,+++,+++++,+++,14181us,81364us,50256us,57720us,1671us,1059ms,807ms,21us,36us,804ms,110us,36us

bash-2.05b$ time bash -c "(dd if=/dev/zero of=bigfile count=125000 bs=8k && sync)"
125000+0 records in
125000+0 records out
1024000000 bytes transferred in 45.565848 secs (22472971 bytes/sec)

- Bucky


Re: PowerEdge 2950 questions

From: "Merlin Moncure"

On 8/24/06, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> On Thu, 2006-08-24 at 13:57, Merlin Moncure wrote:
> > On 8/24/06, Jeff Davis <pgsql@j-davis.com> wrote:
> > > On Thu, 2006-08-24 at 09:21 -0400, Merlin Moncure wrote:
> > > > On 8/22/06, Jeff Davis <pgsql@j-davis.com> wrote:
> > > > > On Tue, 2006-08-22 at 17:56 -0400, Bucky Jordan wrote:
> > > > it's not the parity, it's the seeking.  Raid 5 gives you great
> > > > sequential i/o but random is often not much better than a single
> > > > drive.  Actually it's the '1' in raid 10 that plays the biggest role
> > > > in optimizing seeks on an ideal raid controller.  Calculating parity
> > > > was boring 20 years ago as it involves one of the fastest operations in
> > > > computing, namely xor. :)
> > >
> > > Here's the explanation I got: If you do a write on RAID 5 to something
> > > that is not in the RAID controllers cache, it needs to do a read first
> > > in order to properly recalculate the parity for the write.
> >
> > it's worse than that.  if you need to read something that is not in
> > the o/s cache, all the disks except for one need to be sent to a
> > physical location in order to get the data.
>
> Ummmm.  No.  Not in my experience.  If you need to read something that's
> significantly larger than your stripe size, then yes, you'd need to do
> that.  With typical RAID 5 stripe sizes of 64k to 256k, you could read 8
> to 32 PostgreSQL 8k blocks from a single disk before having to move the
> heads on the next disk to get the next part of data.  A RAID 5, being
> read, acts much like a RAID 0 with n-1 disks.

i just don't see raid 5 benchmarks backing that up. i know how it is
supposed to work on paper, but all of the raid 5 systems I work with
deliver lousy seek performance.  here is an example from the mysql
folks:
http://peter-zaitsev.livejournal.com/14415.html
and another:
http://storageadvisors.adaptec.com/2005/10/13/raid-5-pining-for-the-fjords/

also, with raid 5 you are squeezed on both ends: too few disks and you
have an efficiency problem; too many disks and you start to get
concerned about mtbf and raid rebuild times.

> It's the writes that kill performance, since you've got to read two
> disks and write two disks for every write, at a minimum.  This is why
> small RAID 5 arrays bottleneck so quickly.  A 4 disk RAID 5 with two
> writing threads is likely already starting to thrash.
>
> Or did you mean something else by that?

well, that's correct, my point was that a 4 disk raid 1 can deliver
more seeks, not necessarily that it is better.  as you say, writes
would kill performance. raid 10 seems to be a good compromise.  so is
raid 6 possibly, although i don't see a lot of performance data on that.

merlin

Re: PowerEdge 2950 questions

From: Scott Marlowe

On Thu, 2006-08-24 at 15:03, Merlin Moncure wrote:
> On 8/24/06, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> > On Thu, 2006-08-24 at 13:57, Merlin Moncure wrote:
> > > On 8/24/06, Jeff Davis <pgsql@j-davis.com> wrote:
> > > > On Thu, 2006-08-24 at 09:21 -0400, Merlin Moncure wrote:
> > > > > On 8/22/06, Jeff Davis <pgsql@j-davis.com> wrote:
> > > > > > On Tue, 2006-08-22 at 17:56 -0400, Bucky Jordan wrote:
> > > > > it's not the parity, it's the seeking.  Raid 5 gives you great
> > > > > sequential i/o but random is often not much better than a single
> > > > > drive.  Actually it's the '1' in raid 10 that plays the biggest role
> > > > > in optimizing seeks on an ideal raid controller.  Calculating parity
> > > > > was boring 20 years ago as it involves one of the fastest operations in
> > > > > computing, namely xor. :)
> > > >
> > > > Here's the explanation I got: If you do a write on RAID 5 to something
> > > > that is not in the RAID controllers cache, it needs to do a read first
> > > > in order to properly recalculate the parity for the write.
> > >
> > > it's worse than that.  if you need to read something that is not in
> > > the o/s cache, all the disks except for one need to be sent to a
> > > physical location in order to get the data.
> >
> > Ummmm.  No.  Not in my experience.  If you need to read something that's
> > significantly larger than your stripe size, then yes, you'd need to do
> > that.  With typical RAID 5 stripe sizes of 64k to 256k, you could read 8
> > to 32 PostgreSQL 8k blocks from a single disk before having to move the
> > heads on the next disk to get the next part of data.  A RAID 5, being
> > read, acts much like a RAID 0 with n-1 disks.
>
> i just don't see raid 5 benchmarks backing that up. i know how it is
> supposed to work on paper, but all of the raid 5 systems I work with
> deliver lousy seek performance.  here is an example from the mysql
> folks:
> http://peter-zaitsev.livejournal.com/14415.html
> and another:
> http://storageadvisors.adaptec.com/2005/10/13/raid-5-pining-for-the-fjords/

Well, I've seen VERY good numbers out of RAID 5 arrays.  As long as I
wasn't writing to them.  :)

Trust me though, I'm no huge fan of RAID 5.

> > It's the writes that kill performance, since you've got to read two
> > disks and write two disks for every write, at a minimum.  This is why
> > small RAID 5 arrays bottleneck so quickly.  A 4 disk RAID 5 with two
> > writing threads is likely already starting to thrash.
> >
> > Or did you mean something else by that?
>
> well, that's correct, my point was that a 4 disk raid 1 can deliver
> more seeks, not necessarily that it is better.  as you say writes
> would kill performance. raid 10 seems to be a good compromise.  so is
> raid 6 possibly, although i don't see a lot of performance data on that.

Yeah, I think RAID 10, in this modern day of large, inexpensive hard
drives, is the way to go for most transactional / heavily written
systems.

I'm not sure RAID-6 is worth the effort.  For smaller arrays (4 to 6
disks), you've got about as many "extra" drives as in RAID 1+0.  And
that old read twice, write twice penalty becomes read thrice and write
thrice.  So, you'd chew up your interface bandwidth quicker.
Although in SAS / SATA I guess that part's not a big deal, the data
still has to be moved around somewhere on the card / in the controller
chips, so it's a bandwidth problem waiting to happen somewhere.
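
(To spell out that arithmetic, at least in theory: a small RAID-6 write
has to read the old data block plus both old parity blocks, then write
the new data block plus both new parity blocks; three reads and three
writes, versus RAID 5's two and two.)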

Re: PowerEdge 2950 questions

From: "Merlin Moncure"

On 8/24/06, Bucky Jordan <bjordan@lumeta.com> wrote:
> Here are benchmarks of RAID5x4 vs RAID10x4 on a Dell PERC 5/i with 300 GB
> 10k RPM SAS drives. I know these are from bonnie 1.9 instead of the older
> version, but they might still make for a useful analysis of RAID5 vs.
> RAID10.

> ------ RAID5x4
i don't see the seeks here, am i missing something?

[raid 10 dd]
> 1024000000 bytes transferred in 45.565848 secs (22472971 bytes/sec)

ouch. this is a problem with the controller.  it should be higher than
this, but the raid 5 should edge it out regardless.  try configuring
the hardware as a jbod and doing the raid 10 in software.
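
for example, with geom on freebsd it would be something like this (a
rough sketch only; device names are hypothetical, and the perc would
have to expose each disk as its own volume):

# gmirror load && gstripe load                     # load the geom modules
# gmirror label m0 da0 da1                         # first mirrored pair
# gmirror label m1 da2 da3                         # second pair
# gmirror label m2 da4 da5                         # third pair
# gstripe label st0 mirror/m0 mirror/m1 mirror/m2  # stripe across the pairs
# newfs -U /dev/stripe/st0                         # ufs with soft updates

that stripes across three mirrored pairs, i.e. raid 10 over all 6 disks.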

merlin