Re: RAID stripe size question - Mailing list pgsql-performance
From | Alex Turner |
---|---|
Subject | Re: RAID stripe size question |
Date | |
Msg-id | 33c6269f0607170823v7df531a8o678505125e85880@mail.gmail.com |
In response to | Re: RAID stripe size question ("Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com>) |
List | pgsql-performance |
On 7/17/06, Mikael Carneholm <Mikael.Carneholm@wirelesscar.com> wrote:
These bonnie++ numbers are very worrying. Your controller should easily max out your FC interface on these tests, passing 192MB/sec with ease on anything more than a 6-drive RAID 10. This is a bad omen if you want high performance... Each mirror pair can do 60-80MB/sec. A 24-disk RAID 10 can do 12*60MB/sec, which is 720MB/sec. I have seen this performance, it's not unreachable, but time and again we see these bad perf numbers from FC and SCSI systems alike. Consider a different controller, because this one is not up to snuff. A single drive would get better numbers than your 4-disk RAID 10; a 21MB/sec read speed is really pretty sorry, it should be closer to 120MB/sec. If you can't swap it out, software RAID may turn out to be your friend. The only saving grace is that this is OLTP, and perhaps, just maybe, the controller will be better at ordering IOs, but I highly doubt it.
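To make the arithmetic above easy to redo for other array sizes, here is a minimal Python sketch. It assumes idealized linear scaling across mirror pairs and uses the 60-80MB/sec per-pair rule of thumb from this post; the function name `raid10_estimate_mb_s` is made up for illustration, and real controllers, buses and filesystems will land somewhere below these figures.

```python
# Rough sequential-throughput estimate for a RAID 10 array.
# Assumption: throughput scales linearly with the number of mirror pairs,
# which is an idealization and not a measurement.

def raid10_estimate_mb_s(disks, per_pair_mb_s=60):
    """Return an idealized linear throughput (MB/sec) for `disks` drives."""
    if disks < 2 or disks % 2:
        raise ValueError("RAID 10 needs an even number of disks (at least 2)")
    pairs = disks // 2  # each mirror pair acts as one stripe member
    return pairs * per_pair_mb_s


if __name__ == "__main__":
    for disks in (4, 6, 10, 18, 24):
        low = raid10_estimate_mb_s(disks, 60)   # low end of the rule of thumb
        high = raid10_estimate_mb_s(disks, 80)  # high end of the rule of thumb
        print(f"{disks:2d} disks: {low}-{high} MB/sec")
```

With the low-end figure, 24 disks works out to 12*60 = 720MB/sec and 4 disks to 2*60 = 120MB/sec.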
Please, people, do the numbers and benchmark before you buy. Many, many HBAs really suck under Linux/FreeBSD, and you may end up paying vast sums of money for very sub-optimal performance (I'd say sub-standard, but alas, it seems that this kind of poor performance is tolerated, even though it's way off where it should be). There's no point in having a 40-disk cab if your controller can't handle it.
Maximum theoretical linear throughput can be achieved in a white box for under $20k, and I have seen this kind of system outperform a server 5 times its price, even in OLTP.
Alex
>> This is something I'd also like to test, as a common
>> best-practice these days is to go for a SAME (stripe all, mirror
>> everything) setup.
>> From a development perspective it's easier to use SAME as the
>> developers won't have to think about physical location for new
>> tables/indices, so if there's no performance penalty with SAME I'll
>> gladly keep it that way.
>Usually, it's not the developers' task to care about that, but the DBA's
>responsibility.
As we don't have a full-time dedicated DBA (although I'm the one who does
most DBA related tasks) I would aim for making physical location as
transparent as possible, otherwise I'm afraid I won't be doing anything
else than supporting developers with that - and I *do* have other things
to do as well :)
>> In a previous test, using cd=5000 and cs=20 increased transaction
>> throughput by ~20% so I'll definitely fiddle with that in the coming
>> tests as well.
>How many parallel transactions do you have?
That was when running BenchmarkSQL
(http://sourceforge.net/projects/benchmarksql ) with 100 concurrent users
("terminals"), which I assume means 100 parallel transactions at most.
The target application for this DB has 3-4 times as many concurrent
connections so it's possible that one would have to find other cs/cd
numbers better suited for that scenario. Tweaking bgwriter is another
task I'll look into as well..
Btw, here's the bonnie++ results from two different array sets (10+18,
4+24) on the MSA1500:
LUN: WAL, 10 disks, stripe size 32K
------------------------------------
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
sesell01        32G 56139  93 73250  22 16530   3 30488  45 57489   5 477.3   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  2458  90 +++++ +++ +++++ +++  3121  99 +++++ +++ 10469  98
LUN: WAL, 4 disks, stripe size 8K
----------------------------------
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
sesell01        32G 49170  82 60108  19 13325   2 15778  24 21489   2 266.4   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  2432  86 +++++ +++ +++++ +++  3106  99 +++++ +++ 10248  98
LUN: DATA, 18 disks, stripe size 32K
-------------------------------------
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
sesell01        32G 59990  97 87341  28 19158   4 30200  46 57556   6 495.4   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1640  92 +++++ +++ +++++ +++  1736  99 +++++ +++ 10919  99
LUN: DATA, 24 disks, stripe size 64K
-------------------------------------
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
sesell01        32G 59443  97 118515 39 25023   5 30926  49 60835   6 531.8   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  2499  90 +++++ +++ +++++ +++  2817  99 +++++ +++ 10971 100
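For a quick side-by-side of the block-read figures above, the following Python sketch converts them to MB/sec and sets them against an idealized RAID 10 estimate. It is only a rough comparison under stated assumptions: every LUN is treated as RAID 10 (the RAID level is not stated in the bonnie++ output), bonnie++ "K" is taken to mean 1024 bytes, and the 60MB/sec per mirror pair figure is the low end of the rule of thumb quoted in this thread. The names `RUNS` and `compare` are made up for illustration.

```python
# Block-read results (K/sec) copied from the four bonnie++ runs above.
# Assumption: every LUN is RAID 10; the benchmark output does not say so.
RUNS = [
    ("WAL", 10, 57489),
    ("WAL", 4, 21489),
    ("DATA", 18, 57556),
    ("DATA", 24, 60835),
]

PER_PAIR_MB_S = 60  # low end of the 60-80MB/sec per mirror pair rule of thumb


def compare(disks, block_read_k_s, per_pair_mb_s=PER_PAIR_MB_S):
    """Return (measured MB/sec, idealized MB/sec, measured/idealized ratio)."""
    measured = block_read_k_s / 1024      # assumes bonnie++ "K" means 1024 bytes
    ideal = (disks // 2) * per_pair_mb_s  # one stripe member per mirror pair
    return measured, ideal, measured / ideal


if __name__ == "__main__":
    for lun, disks, k_s in RUNS:
        measured, ideal, ratio = compare(disks, k_s)
        print(f"{lun:4s} {disks:2d} disks: {measured:6.1f} MB/sec measured, "
              f"{ideal:4d} MB/sec idealized ({ratio:.0%})")
```

The ratio is only as good as the per-pair assumption, so it is best read as a sanity check on whether an array is in the right ballpark and not as a precise efficiency figure.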