Re: H800 + md1200 Performance problem - Mailing list pgsql-performance

From Cesar Martin
Subject Re: H800 + md1200 Performance problem
Date
Msg-id CAMAsR=7onjeWr--PtgHgfZv=yYSB8FVxf1BsYSwu2752YY0Q8w@mail.gmail.com
In response to Re: H800 + md1200 Performance problem  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: H800 + md1200 Performance problem  (Scott Marlowe <scott.marlowe@gmail.com>)
Re: H800 + md1200 Performance problem  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-performance
Hello,

Yesterday I changed the kernel setting that Scott suggested, vm.zone_reclaim_mode = 0. I have run new benchmarks and I have noticed changes, at least in Postgres:
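
In case it is useful, this is roughly how I applied it (a minimal sketch, assuming the usual Linux sysctl mechanism; the /etc/sysctl.conf entry is only there to make it persistent):

  sysctl -w vm.zone_reclaim_mode=0
  # make the setting survive reboots
  echo "vm.zone_reclaim_mode = 0" >> /etc/sysctl.conf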

First exec:
EXPLAIN ANALYZE SELECT * from company_news_internet_201111;
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on company_news_internet_201111  (cost=0.00..369577.79 rows=6765779 width=323) (actual time=0.020..7984.707 rows=6765779 loops=1)
 Total runtime: 12699.008 ms
(2 rows)

Second:
EXPLAIN ANALYZE SELECT * from company_news_internet_201111;
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on company_news_internet_201111  (cost=0.00..369577.79 rows=6765779 width=323) (actual time=0.023..1767.440 rows=6765779 loops=1)
 Total runtime: 2696.901 ms

It seems that the data is now being cached correctly...

The large query takes 80 seconds on the first execution and around 23 seconds on the second. This is not spectacular, but it is better than yesterday.
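
To double-check that it really is the OS cache, I can drop the page cache between runs (a quick sketch; it needs root and empties the whole cache):

  sync
  echo 3 > /proc/sys/vm/drop_caches   # then re-run the first EXPLAIN ANALYZE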

Furthermore, the dd results are strange:

dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 803,738 s, 171 MB/s

I think 171 MB/s is a poor value for 12 SAS disks in RAID 10... And when I run iostat during the dd execution I get results like:
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdc            1514,62         0,01       108,58         11     117765
sdc            3705,50         0,01       316,62          0        633
sdc               2,00         0,00         0,05          0          0
sdc             920,00         0,00        63,49          0        126
sdc            8322,50         0,03       712,00          0       1424
sdc            6662,50         0,02       568,53          0       1137
sdc               0,00         0,00         0,00          0          0
sdc               1,50         0,00         0,04          0          0
sdc            6413,00         0,01       412,28          0        824
sdc           13107,50         0,03       867,94          0       1735
sdc               0,00         0,00         0,00          0          0
sdc               1,50         0,00         0,03          0          0
sdc            9719,00         0,03       815,49          0       1630
sdc            2817,50         0,01       272,51          0        545
sdc               1,50         0,00         0,05          0          0
sdc            1181,00         0,00        71,49          0        142
sdc            7225,00         0,01       362,56          0        725
sdc            2973,50         0,01       269,97          0        539

I don't understand why MB_wrtn/s keeps jumping between 0 and nearly 800 MB/s during the run.
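
If it helps to diagnose this, here is a minimal sketch of what I can run next to watch the page cache being flushed while dd is writing (assuming a standard Linux /proc/meminfo and the sysstat iostat):

  # watch dirty pages build up and get written back, once per second
  while true; do grep -E 'Dirty|Writeback' /proc/meminfo; sleep 1; done
  # in another terminal, per-second stats for the device, in MB
  iostat -m sdc 1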

Read results:

dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 257,626 s, 533 MB/s

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdc            3157,00       392,69         0,00        785          0
sdc            3481,00       432,75         0,00        865          0
sdc            2669,50       331,50         0,00        663          0
sdc            3725,50       463,75         0,00        927          0
sdc            2998,50       372,38         0,00        744          0
sdc            3600,50       448,00         0,00        896          0
sdc            3588,00       446,50         0,00        893          0
sdc            3494,00       434,50         0,00        869          0
sdc            3141,50       390,62         0,00        781          0
sdc            3667,50       456,62         0,00        913          0
sdc            3429,35       426,18         0,00        856          0
sdc            3043,50       378,06         0,00        756          0
sdc            3366,00       417,94         0,00        835          0
sdc            3480,50       432,62         0,00        865          0
sdc            3523,50       438,06         0,00        876          0
sdc            3554,50       441,88         0,00        883          0
sdc            3635,00       452,19         0,00        904          0
sdc            3107,00       386,20         0,00        772          0
sdc            3695,00       460,00         0,00        920          0
sdc            3475,50       432,11         0,00        864          0
sdc            3487,50       433,50         0,00        867          0
sdc            3232,50       402,39         0,00        804          0
sdc            3698,00       460,67         0,00        921          0
sdc            5059,50       632,00         0,00       1264          0
sdc            3934,00       489,56         0,00        979          0
sdc            4536,50       566,75         0,00       1133          0
sdc            5298,00       662,12         0,00       1324          0

These results look more logical to me. Read speed is sustained throughout the whole test...

About the "conv=fdatasync" parameter that Tomas mentioned: I saw it at http://romanrm.ru/en/dd-benchmark and started using it, but it may well be wrong. Before that I used time sh -c "dd if=/dev/zero of=ddfile bs=X count=Y && sync".
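
For clarity, these are the two variants I have been mixing (sizes and paths as in the tests above):

  # variant from the article: dd syncs the file data itself before exiting
  dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384 conv=fdatasync
  # variant I used before: time the write plus an explicit sync at the end
  time sh -c "dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384 && sync"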

What is your opinion of these results?

I have also noticed that since I changed vm.zone_reclaim_mode = 0, swap is completely full. Would you recommend disabling swap?
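
In the meantime this is what I am checking, just a sketch with standard Linux tools (the swappiness value is only an example, not something suggested in this thread):

  free -m                        # how much swap is actually in use
  cat /proc/sys/vm/swappiness    # default is usually 60
  sysctl -w vm.swappiness=10     # make the kernel less eager to swap, instead of disabling swap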

Thanks!!

On April 3, 2012 at 20:01, Tomas Vondra <tv@fuzzy.cz> wrote:
On 3.4.2012 17:42, Cesar Martin wrote:
> Yes, setting is the same in both machines.
>
> The results of bonnie++ running without arguments are:
>
> Version      1.96   ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> cltbbdd01      126G    94  99 202873  99 208327  95  1639  91 819392  88  2131 139
> Latency             88144us     228ms     338ms     171ms     147ms   20325us
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> cltbbdd01        16  8063  26 +++++ +++ 27361  96 31437  96 +++++ +++ +++++ +++
> Latency              7850us    2290us    2310us     530us      11us     522us
>
> With DD, one CPU core goes to 100% and the results are about 100-170
> MBps, which I think is a bad result for this HW:
>
> dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=100
> 100+0 records in
> 100+0 records out
> 838860800 bytes (839 MB) copied, 8,1822 s, 103 MB/s
>
> dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=1000 conv=fdatasync
> 1000+0 records in
> 1000+0 records out
> 8388608000 bytes (8,4 GB) copied, 50,8388 s, 165 MB/s
>
> dd if=/dev/zero of=/vol02/bonnie/DD bs=1M count=1024 conv=fdatasync
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1,1 GB) copied, 7,39628 s, 145 MB/s
>
> When monitoring I/O activity with iostat during dd, I have noticed that,
> if the test takes 10 seconds, the disk has activity only during the last 3
> or 4 seconds, and iostat reports about 250-350 MBps. Is that normal?

Well, you're testing writing, and the default behavior is to write the
data into the page cache. And you do have 64GB of RAM, so the write cache may
take a large portion of the RAM - even gigabytes. To really test the I/O
you need to (a) write about 2x the amount of RAM or (b) tune
dirty_ratio/dirty_background_ratio accordingly.
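
For example, something along these lines (just a sketch, the exact values depend on your setup):

  # shrink the write cache so writes hit the disks sooner (example values only)
  sysctl -w vm.dirty_background_ratio=5
  sysctl -w vm.dirty_ratio=10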

BTW, what are you trying to achieve with "conv=fdatasync" at the end? My
dd man page does not mention 'fdatasync', and IMHO it's a mistake on your
side. If you want to sync the data at the end, then you need to do
something like

  time sh -c "dd ... && sync"

> I set read ahead to different values, but the results don't differ
> substantially...

Because read-ahead is for reading (which is what a SELECT does most of
the time), but the tests above are writing to the device. And writing is
not influenced by read-ahead.
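
(As an aside, read-ahead on the block device itself is usually inspected and
changed roughly like this - the value is in 512-byte sectors and is only an
example:)

  blockdev --getra /dev/sdc
  blockdev --setra 16384 /dev/sdc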

To test reading, do this:

  dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=1024

Tomas




--
César Martín Pérez
cmartinp@gmail.com
