Re: RAID stripe size question - Mailing list pgsql-performance

From Alex Turner
Subject Re: RAID stripe size question
Msg-id 33c6269f0607170823v7df531a8o678505125e85880@mail.gmail.com
In response to Re: RAID stripe size question  ("Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com>)
List pgsql-performance


On 7/17/06, Mikael Carneholm <Mikael.Carneholm@wirelesscar.com> wrote:
>> This is something I'd also like to test, as a common
>> best-practice these days is to go for a SAME (stripe all, mirror
>> everything) setup.
>> From a development perspective it's easier to use SAME as the
>> developers won't have to think about physical location for new
>> tables/indices, so if there's no performance penalty with SAME I'll
>> gladly keep it that way.

>Usually, it's not the developers' task to care about that, but the DBA's
>responsibility.

As we don't have a full-time dedicated DBA (although I'm the one who does
most DBA-related tasks) I would aim for making physical location as
transparent as possible, otherwise I'm afraid I won't be doing anything
other than supporting developers with that - and I *do* have other things
to do as well :)

>> In a previous test, using cd=5000 and cs=20 increased transaction
>> throughput by ~20% so I'll definitely fiddle with that in the coming
>> tests as well.

>How many parallel transactions do you have?

That was when running BenchmarkSQL
(http://sourceforge.net/projects/benchmarksql ) with 100 concurrent users
("terminals"), which I assume means 100 parallel transactions at most.
The target application for this DB has 3-4 times as many concurrent
connections so it's possible that one would have to find other cs/cd
numbers better suited for that scenario. Tweaking bgwriter is another
task I'll look into as well.

Btw, here are the bonnie++ results from two different array sets (10+18,
4+24) on the MSA1500:

LUN: WAL, 10 disks, stripe size 32K
------------------------------------
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
sesell01        32G 56139  93 73250  22 16530   3 30488  45 57489   5 477.3   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  2458  90 +++++ +++ +++++ +++  3121  99 +++++ +++ 10469  98


LUN: WAL, 4 disks, stripe size 8K
----------------------------------
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
sesell01        32G 49170  82 60108  19 13325   2 15778  24 21489   2 266.4   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  2432  86 +++++ +++ +++++ +++  3106  99 +++++ +++ 10248  98


LUN: DATA, 18 disks, stripe size 32K
-------------------------------------
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
sesell01        32G 59990  97 87341  28 19158   4 30200  46 57556   6 495.4   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1640  92 +++++ +++ +++++ +++  1736  99 +++++ +++ 10919  99


LUN: DATA, 24 disks, stripe size 64K
-------------------------------------
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
sesell01        32G 59443  97 118515  39 25023   5 30926  49 60835   6 531.8   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  2499  90 +++++ +++ +++++ +++  2817  99 +++++ +++ 10971 100


These bonnie++ numbers are very worrying.  Your controller should easily max out your FC interface on these tests, passing 192MB/sec with ease on anything more than a 6-drive RAID 10.  This is a bad omen if you want high performance...  Each mirror pair can do 60-80MB/sec.  A 24-disk RAID 10 can do 12*60MB/sec, which is 720MB/sec.  I have seen this performance; it's not unreachable, but time and again we see these bad perf numbers from FC and SCSI systems alike.  Consider a different controller, because this one is not up to snuff.  A single drive would get better numbers than your 4-disk RAID 10: 21MB/sec read speed is really pretty sorry; it should be closer to 120MB/sec.  If you can't swap out, software RAID may turn out to be your friend.  The only saving grace is that this is OLTP, and perhaps, just maybe, the controller will be better at ordering IOs, but I highly doubt it.
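
To make the "do the numbers" step concrete, here is a rough Python sketch of the per-mirror-pair arithmetic, set against the sequential block read figures from the bonnie++ runs quoted above. It is only an estimate under stated assumptions: every LUN is treated as RAID 10 (the 10- and 18-disk layouts are not confirmed in the thread), the low-end 60MB/sec per pair is used, 1 MB is taken as 1024 K, and the function and variable names are made up for illustration.

```python
# Rough estimate only. Assumptions: every LUN is RAID 10, each mirror pair
# streams at the low-end 60 MB/sec quoted in the post, and 1 MB = 1024 K.
PER_PAIR_MB_S = 60    # low end of the 60-80 MB/sec range from the post
FC_LIMIT_MB_S = 192   # FC interface figure from the post


def raid10_expected_mb_s(disks, per_pair=PER_PAIR_MB_S):
    """Expected sequential throughput: one stream per mirror pair."""
    if disks < 2 or disks % 2:
        raise ValueError("RAID 10 needs an even number of disks (>= 2)")
    return (disks // 2) * per_pair


# (label, disk count, measured sequential block read in K/sec from bonnie++)
runs = [
    ("WAL, 10 disks", 10, 57489),
    ("WAL, 4 disks", 4, 21489),
    ("DATA, 18 disks", 18, 57556),
    ("DATA, 24 disks", 24, 60835),
]


def summarize(runs):
    """Return (label, expected, FC-capped expected, measured) in MB/sec."""
    rows = []
    for label, disks, k_per_sec in runs:
        expected = raid10_expected_mb_s(disks)
        reachable = min(expected, FC_LIMIT_MB_S)  # the FC link caps the array
        measured = round(k_per_sec / 1024, 1)
        rows.append((label, expected, reachable, measured))
    return rows


for label, expected, reachable, measured in summarize(runs):
    print(f"{label:15s} expected {expected:4d} MB/s, "
          f"FC-capped {reachable:3d} MB/s, measured {measured:5.1f} MB/s")
```

Under these assumptions, every array here except the 4-disk one should be limited by the FC link rather than by the disks, so the measured block reads fall well short of both ceilings.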

Please, people, do the numbers and benchmark before you buy.  Many, many HBAs really suck under Linux/FreeBSD, and you may end up paying vast sums of money for very sub-optimal performance (I'd say sub-standard, but alas, it seems that this kind of poor performance is tolerated, even though it's way off where it should be).  There's no point in having a 40-disk cab if your controller can't handle it.

Maximum theoretical linear throughput can be achieved in a white box for under $20k, and I have seen this kind of system outperform a server 5 times its price, even in OLTP.

Alex
