Re: H800 + md1200 Performance problem - Mailing list pgsql-performance

From Cesar Martin
Subject Re: H800 + md1200 Performance problem
Date
Msg-id CAMAsR=7onjeWr--PtgHgfZv=yYSB8FVxf1BsYSwu2752YY0Q8w@mail.gmail.com
In response to Re: H800 + md1200 Performance problem  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: H800 + md1200 Performance problem  (Scott Marlowe <scott.marlowe@gmail.com>)
Re: H800 + md1200 Performance problem  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-performance
Hello,

Yesterday I changed the kernel setting that Scott suggested, vm.zone_reclaim_mode = 0. I have run new benchmarks and I have noticed changes, at least in Postgres:
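
In case it is useful, this is roughly how I applied it (a minimal sketch, assuming the usual Linux sysctl mechanism; the /etc/sysctl.conf entry is only there to make it persistent):

  sysctl -w vm.zone_reclaim_mode=0
  # make the setting survive reboots
  echo "vm.zone_reclaim_mode = 0" >> /etc/sysctl.conf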

First exec:
EXPLAIN ANALYZE SELECT * from company_news_internet_201111;
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on company_news_internet_201111  (cost=0.00..369577.79 rows=6765779 width=323) (actual time=0.020..7984.707 rows=6765779 loops=1)
 Total runtime: 12699.008 ms
(2 rows)

Second:
EXPLAIN ANALYZE SELECT * from company_news_internet_201111;
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on company_news_internet_201111  (cost=0.00..369577.79 rows=6765779 width=323) (actual time=0.023..1767.440 rows=6765779 loops=1)
 Total runtime: 2696.901 ms

It seems that the data is now being cached correctly...

The large query takes 80 seconds on the first execution and around 23 seconds on the second. This is not spectacular, but it is better than yesterday.
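
To double-check that it really is the OS cache, I can drop the page cache between runs (a quick sketch; it needs root and empties the whole cache):

  sync
  echo 3 > /proc/sys/vm/drop_caches   # then re-run the first EXPLAIN ANALYZE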

Furthermore, the dd results are strange:

dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 803,738 s, 171 MB/s

I think 171 MB/s is a poor value for 12 SAS disks in RAID 10... And when I run iostat during the dd execution I get results like:
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdc            1514,62         0,01       108,58         11     117765
sdc            3705,50         0,01       316,62          0        633
sdc               2,00         0,00         0,05          0          0
sdc             920,00         0,00        63,49          0        126
sdc            8322,50         0,03       712,00          0       1424
sdc            6662,50         0,02       568,53          0       1137
sdc               0,00         0,00         0,00          0          0
sdc               1,50         0,00         0,04          0          0
sdc            6413,00         0,01       412,28          0        824
sdc           13107,50         0,03       867,94          0       1735
sdc               0,00         0,00         0,00          0          0
sdc               1,50         0,00         0,03          0          0
sdc            9719,00         0,03       815,49          0       1630
sdc            2817,50         0,01       272,51          0        545
sdc               1,50         0,00         0,05          0          0
sdc            1181,00         0,00        71,49          0        142
sdc            7225,00         0,01       362,56          0        725
sdc            2973,50         0,01       269,97          0        539

I don't understand why MB_wrtn/s keeps jumping between 0 and nearly 800 MB/s during the run.
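
If it helps to diagnose this, here is a minimal sketch of what I can run next to watch the page cache being flushed while dd is writing (assuming a standard Linux /proc/meminfo and the sysstat iostat):

  # watch dirty pages build up and get written back, once per second
  while true; do grep -E 'Dirty|Writeback' /proc/meminfo; sleep 1; done
  # in another terminal, per-second stats for the device, in MB
  iostat -m sdc 1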

Read results:

dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 257,626 s, 533 MB/s

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdc            3157,00       392,69         0,00        785          0
sdc            3481,00       432,75         0,00        865          0
sdc            2669,50       331,50         0,00        663          0
sdc            3725,50       463,75         0,00        927          0
sdc            2998,50       372,38         0,00        744          0
sdc            3600,50       448,00         0,00        896          0
sdc            3588,00       446,50         0,00        893          0
sdc            3494,00       434,50         0,00        869          0
sdc            3141,50       390,62         0,00        781          0
sdc            3667,50       456,62         0,00        913          0
sdc            3429,35       426,18         0,00        856          0
sdc            3043,50       378,06         0,00        756          0
sdc            3366,00       417,94         0,00        835          0
sdc            3480,50       432,62         0,00        865          0
sdc            3523,50       438,06         0,00        876          0
sdc            3554,50       441,88         0,00        883          0
sdc            3635,00       452,19         0,00        904          0
sdc            3107,00       386,20         0,00        772          0
sdc            3695,00       460,00         0,00        920          0
sdc            3475,50       432,11         0,00        864          0
sdc            3487,50       433,50         0,00        867          0
sdc            3232,50       402,39         0,00        804          0
sdc            3698,00       460,67         0,00        921          0
sdc            5059,50       632,00         0,00       1264          0
sdc            3934,00       489,56         0,00        979          0
sdc            4536,50       566,75         0,00       1133          0
sdc            5298,00       662,12         0,00       1324          0

These results look more logical to me. Read speed is sustained throughout the whole test...

About the "conv=fdatasync" parameter that Tomas mentioned: I saw it at http://romanrm.ru/en/dd-benchmark and started using it, but it may well be wrong. Before that I used time sh -c "dd if=/dev/zero of=ddfile bs=X count=Y && sync".
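
For clarity, these are the two variants I have been mixing (sizes and paths as in the tests above):

  # variant from the article: dd syncs the file data itself before exiting
  dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384 conv=fdatasync
  # variant I used before: time the write plus an explicit sync at the end
  time sh -c "dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384 && sync"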

What is your opinion of these results?

I have also noticed that since I changed vm.zone_reclaim_mode = 0, swap is completely full. Would you recommend disabling swap?
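
In the meantime this is what I am checking, just a sketch with standard Linux tools (the swappiness value is only an example, not something suggested in this thread):

  free -m                        # how much swap is actually in use
  cat /proc/sys/vm/swappiness    # default is usually 60
  sysctl -w vm.swappiness=10     # make the kernel less eager to swap, instead of disabling swap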

Thanks!!

On April 3, 2012 at 20:01, Tomas Vondra <tv@fuzzy.cz> wrote:
On 3.4.2012 17:42, Cesar Martin wrote:
> Yes, setting is the same in both machines.
>
> The results of bonnie++ running without arguments are:
>
> Version      1.96   ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> cltbbdd01      126G    94  99 202873  99 208327  95  1639  91 819392  88  2131 139
> Latency             88144us     228ms     338ms     171ms     147ms   20325us
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> cltbbdd01        16  8063  26 +++++ +++ 27361  96 31437  96 +++++ +++ +++++ +++
> Latency              7850us    2290us    2310us     530us      11us     522us
>
> With DD, one CPU core goes to 100% and the results are about 100-170
> MBps, which I think is a bad result for this HW:
>
> dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=100
> 100+0 records in
> 100+0 records out
> 838860800 bytes (839 MB) copied, 8,1822 s, 103 MB/s
>
> dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=1000 conv=fdatasync
> 1000+0 records in
> 1000+0 records out
> 8388608000 bytes (8,4 GB) copied, 50,8388 s, 165 MB/s
>
> dd if=/dev/zero of=/vol02/bonnie/DD bs=1M count=1024 conv=fdatasync
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1,1 GB) copied, 7,39628 s, 145 MB/s
>
> When monitoring I/O activity with iostat during dd, I have noticed that,
> if the test takes 10 seconds, the disk has activity only during the last 3
> or 4 seconds, and iostat reports about 250-350 MBps. Is that normal?

Well, you're testing writing, and the default behavior is to write the
data into the page cache. And you do have 64GB of RAM, so the write cache may
take a large portion of the RAM - even gigabytes. To really test the I/O
you need to (a) write about 2x the amount of RAM or (b) tune
dirty_ratio/dirty_background_ratio accordingly.
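
For example, something along these lines (just a sketch, the exact values depend on your setup):

  # shrink the write cache so writes hit the disks sooner (example values only)
  sysctl -w vm.dirty_background_ratio=5
  sysctl -w vm.dirty_ratio=10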

BTW, what are you trying to achieve with "conv=fdatasync" at the end? My
dd man page does not mention 'fdatasync', and IMHO it's a mistake on your
side. If you want to sync the data at the end, then you need to do
something like

  time sh -c "dd ... && sync"

> I set read ahead to different values, but the results don't differ
> substantially...

Because read-ahead is for reading (which is what a SELECT does most of
the time), but the tests above are writing to the device. And writing is
not influenced by read-ahead.
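
(As an aside, read-ahead on the block device itself is usually inspected and
changed roughly like this - the value is in 512-byte sectors and is only an
example:)

  blockdev --getra /dev/sdc
  blockdev --setra 16384 /dev/sdc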

To test reading, do this:

  dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=1024

Tomas




--
César Martín Pérez
cmartinp@gmail.com
