Re: H800 + md1200 Performance problem - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: H800 + md1200 Performance problem
Msg-id 4F7CA614.8060008@fuzzy.cz
In response to Re: H800 + md1200 Performance problem  (Cesar Martin <cmartinp@gmail.com>)
Responses Re: H800 + md1200 Performance problem  (Glyn Astill <glynastill@yahoo.co.uk>)
List pgsql-performance
On 4.4.2012 20:46, Cesar Martin wrote:
> Raid controller issue or driver problem was the first problem that I
> studied.
> I installed Centos 5.4 al the beginning, but I had performance problems,
> and I contacted Dell support... but Centos is not support by Dell...
> Then I installed Redhat 6 and we contact Dell with same problem.
> Dell say that all is right and that this is a software problem.
> I have installed Centos 5.4, 6.2 and Redhat 6 with similar result, I
> think that not is driver problem (megasas-raid kernel module).
> I will check kernel updates...
> Thanks!

Well, there are different meanings of 'working'. Obviously you mean
'gives reasonable performance' while Dell understands 'is not on fire'.

IIRC the H800 is just a 926x controller from LSI, so it's probably based on
the LSI 2108. Can you post basic info about the settings, i.e.

  MegaCli -AdpAllInfo -aALL

or something like that? I'm especially interested in the access/cache
policies, cache drop interval, etc., i.e.

  MegaCli -LDGetProp (-Cache | -Access | -Name | -DskCache)
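
The alternatives in parentheses are queried one at a time. A minimal
sketch of that loop follows. It assumes the binary is called MegaCli and
is on the PATH (on some installs it is MegaCli64 under /opt/MegaRAID), and
that -LALL -aALL (all logical drives, all adapters) is the selector you
want; ldgetprop_cmd is just a helper name made up for this example.

```shell
#!/bin/sh
# Properties named in the message; each one needs its own MegaCli call.
PROPS="-Cache -Access -Name -DskCache"

# Build the command line for a single property.
# Assumption: -LALL -aALL selects all logical drives on all adapters.
ldgetprop_cmd() {
    printf 'MegaCli -LDGetProp %s -LALL -aALL\n' "$1"
}

for prop in $PROPS; do
    cmd=$(ldgetprop_cmd "$prop")
    echo "# $cmd"
    # Only run the query if the tool is actually installed.
    if command -v MegaCli >/dev/null 2>&1; then
        $cmd
    fi
done
```

Without MegaCli installed this only prints the four command lines, which
is handy for checking them before running anything as root.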

What I'd do next is test a much smaller array (even a single drive)
to see if the issue exists there too. If it works, try adding another
drive, and so on. That makes it much easier to show Dell that something's
wrong. The simpler the test case, the better.

I've found this (it's about a 2108-based controller from LSI):

http://www.xbitlabs.com/articles/storage/display/lsi-megaraid-sas9260-8i_3.html#sect0

The paragraphs below the diagram are interesting. Not sure if they
describe the same issue you have, but maybe it's related.

Anyway, it's quite usual that a RAID controller has about 50% write
performance compared to read performance, usually due to an on-board CPU
bottleneck. You have ~ 530 MB/s and 170 MB/s (about 32%), so it's not
exactly 50% but it's not very far.

But the fluctuation, that surely is strange. What are the page cache
dirty limits, i.e.

cat /proc/sys/vm/dirty_background_ratio
cat /proc/sys/vm/dirty_ratio

That's probably the #1 source of such issues I've seen (on
machines with a lot of RAM).
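
To see why the ratios matter on a big-memory box, it helps to turn them
into megabytes. The sketch below is only an approximation: it uses
MemTotal from /proc/meminfo, while the kernel applies the ratios to
"dirtyable" memory (roughly free plus reclaimable pages), so the real
thresholds are somewhat lower. The helper names ratio_to_mb and read_or
are made up for this example, and a missing file is reported as 0.

```shell
#!/bin/sh
# Convert a percentage of a memory size (in kB) into whole megabytes.
# $1 = ratio in percent, $2 = memory size in kB
ratio_to_mb() {
    echo $(( $2 * $1 / 100 / 1024 ))
}

# Print the contents of file $1, or the default $2 if it is not readable.
read_or() {
    if [ -r "$1" ]; then cat "$1"; else echo "$2"; fi
}

# Total RAM in kB (falls back to 0 when /proc/meminfo is unavailable).
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
mem_kb=${mem_kb:-0}

bg=$(read_or /proc/sys/vm/dirty_background_ratio 0)
fg=$(read_or /proc/sys/vm/dirty_ratio 0)

echo "MemTotal:               $(( mem_kb / 1024 )) MB"
echo "dirty_background_ratio: ${bg}% (~$(ratio_to_mb "$bg" "$mem_kb") MB before background writeback starts)"
echo "dirty_ratio:            ${fg}% (~$(ratio_to_mb "$fg" "$mem_kb") MB before writers are blocked)"
```

For example, with 64 GB of RAM and dirty_ratio = 20 that is roughly
12.8 GB of dirty pages before writers are throttled, far more than the
controller's write cache can absorb at once. Note that if
vm.dirty_background_bytes or vm.dirty_bytes is set to a nonzero value,
it takes precedence over the corresponding ratio.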

Tomas

