Re: Performances issues with SSD volume ? - Mailing list pgsql-admin

From Thomas SIMON
Subject Re: Performances issues with SSD volume ?
Date
Msg-id 556486BE.8060005@neteven.com
Whole thread Raw
In response to Re: Performances issues with SSD volume ?  (Glyn Astill <glynastill@yahoo.co.uk>)
Responses Re: Performances issues with SSD volume ?
List pgsql-admin
Thomas

Le 22/05/2015 18:21, Glyn Astill a écrit :
> ----- Original Message -----
>
>> From: Thomas SIMON<tsimon@neteven.com>
>> To: Glyn Astill<glynastill@yahoo.co.uk>
>> Cc:"pgsql-admin@postgresql.org"  <pgsql-admin@postgresql.org>
>> Sent: Friday, 22 May 2015, 16:29
>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>
>>
>> Thomas
>>
>> Le 22/05/2015 11:37, Glyn Astill a écrit :
>>>>   From: Thomas SIMON<tsimon@neteven.com>
>>>>   To: Glyn Astill<glynastill@yahoo.co.uk>
>>>>   Cc:"pgsql-admin@postgresql.org"
>> <pgsql-admin@postgresql.org>
>>>>   Sent: Thursday, 21 May 2015, 17:56
>>>>   Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>>>
>>>>   Le 21/05/2015 16:30, Glyn Astill a écrit :
>>>>>>>       I think at this point you could do with going back and
>> trying to
>>>>   reproduce
>>>>>>>     the issue, then trace back up to pg_stat_activity to see
>> what
>>>>   activity could be
>>>>>>>     causing the disk i/o.  I assume you've tried to
>> reproduce the
>>>>   disk issues
>>>>>>>     with a simple disk benchmark like bonnie++?
>>>>>>     Yes, I think the same thing. Probably I will doing this
>> tomorrow early
>>>>>>     in the morning.
>>>>>>     I tried to reproduce disk issues with different stress tests
>> like
>>>>>>     bonnie, fio, tsung, and I use a more realistic scenario with
>> pgreplay
>>>>   to
>>>>>>     reproduce my production trafic from postgresql logfile.
>>>>>>     However, I'm note sure how to diagnostic performance
>> issues.
>>>>>>     I mean, if I see ssd are 100% full, how can I figure out why
>> their
>>>>>>     behavior changes ?
>>>>>>
>>>>>     Well the disk benchmarks are purely to see what your disks are
>> capable of,
>>>>   and help with your initial tuning.
>>>>>     You need to trace back which processes are causing most of the IO
>>>>   you're seeing, as well as the postgresql logs something like iotop,
>> or dstat
>>>>   with the --top-bio option might help you there.
>>>>>     You could also look at the pg_statio_user_tables view to narrow
>> down which
>>>>   tables are being hit the hardest, which might give you some clues.
>>>>   Is there something to activate for seeing something in this table ?
>>>>   Because its empty on my production server
>>>>
>>>>   postgres=# select * from pg_statio_user_tables;
>>>>      relid | schemaname | relname | heap_blks_read | heap_blks_hit |
>>>>   idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit |
>>>>   tidx_blks_read | tidx_blks_hit
>>>>
>>
-------+------------+---------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
>>>>   (0 rows)
>>>>
>>>   Looks like you need to set track_counts=on then.  Infact if you've got
>> track_counts off then you're also not running autovacuum, that's a
>> warning flag unless it's intentional.
>> Arf, no it's just me who does dummy things ...
>> I was in postgres database, not in my app...
>>
>>
>> Here is result of my top tables.
>>
>> relid   | schemaname | relname                        | heap_blks_read |
>> heap_blks_hit | idx_blks_read | idx_blks_hit  | toast_blks_read |
>> toast_blks_hit | tidx_blks_read | tidx_blks_hit
>>
----------+------------+------------------------------------------------------+----------------+---------------+---------------+---------------+-----------------+----------------+----------------+---------------
>>     2056411 | public     | table 1
>> |   326588572770 | 432317896616 |   12839634494 |   33230958888 |
>> 52298815 |       20020628 |        9406665 |     153557966
>>     2059324 | public     | table 2                               |
>> 223323685944 |   72194867978 |   10319078550 |  103672304590 |
>> 189135 |         262100 |          27815 |        207212
>>     2056368 | public     | table 3
>>                                                  |   198706578066 |
>> 1259514828344 |   21083719219 | 1456776226532 |           90446
>> |         331680 |          30850 |        367441
>>     2056347 | public     | table 4 |   124086575669 |  554775474730 |
>> 5236764894 |  614913194098 |        14978587 |       67085822 |
>> 751951 |     134187396
>>     2057296 | public     | table 5
>> |    68120702371 | 18134568265 |    1243107028 |   10537765277 |
>> 350646003 |       87888306 |      119113944 |     629790819
>>     2058460 | public     | table 6 |    35119995405 |  158177258403 |
>> 1687014495 | 1395923096049 |          738542 |         214574 |
>> 325765 |       1318187
>>     2058900 | public     | table 7
>> |    34205345184 | 271135546976 |   21050227933 |  252099340046
>> |            3924 |           6567 |            422 |         16333
>>     2059195 | public     | table 8
>>                                                |    29169256266 |
>> 152937546457 |    2263998316 |  127782237565 |               0
>> |              0 |              0 |             0
>>
>> how you interpret these values?
>> You check ratio between  heap_read & heap_hit ?
>> i've made ratio, it's about 89%
> It shows blocks fetched from disk, or possibly from OS the buffer cache as "read", and blocks already in shared
buffersas "hit". 
>
>
> See here:http://www.postgresql.org/docs/9.2/static/monitoring-stats.html#PG-STATIO-ALL-TABLES-VIEW
>
>
> Also this thread appears to now be split over diagnosing the issue from two different standpoints; 1) Is the i/o
performance(or some other aspect) of your new SSD server worse than that of your old live spinning disk system. 2) Are
thereimprovements to be had with your database and application setup. 
>
> For 1) Maybe if you could post the output of bonnie++ results from the data volumes on both servers when nothing else
isgoing off on them.  If you can't do that on your production server perhaps just the output from sar -d during a busy
periodmight help.  Hopefully that would help to clarify the specific disk performance differences for us.  Other
interestingtests would be the output of Greg Smiths stream scaling script
(https://github.com/gregs1104/stream-scaling). We're still missing the exact specs of your old system and the CPUs in
thenew one. 
I'll stay in this 1st point for now, because there is origin of my
problem and most urgent right now.

I can't do bonnie++ now, beacause I've already prepared server as a
slave, and I have no sufficient disk space available (program says File
size should be double RAM for good results output)

My current sar output on productions (HDD) server is:
14:25:01          DEV                    tps  rd_sec/s  wr_sec/s
avgrq-sz  avgqu-sz     await     svctm     %util

11:05:01    vg_data-lv_data    954.37   2417.35  18941.77 22.38
31.10     32.46      0.34     32.70
11:15:01    vg_data-lv_data   1155.79   8716.91  21995.77 26.57
25.15     21.74      0.40     46.70
11:25:01    vg_data-lv_data   1250.62   6478.67  23450.07 23.93
39.77     31.78      0.41     51.34
11:35:01    vg_data-lv_data    842.48   2051.11  17120.92 22.76
15.63     18.53      0.29     24.04
11:45:01    vg_data-lv_data    666.21   1403.32  14174.47 23.38
10.11     15.12      0.24     15.79
11:55:01    vg_data-lv_data    923.51   6763.36  15337.58 23.93
13.07     14.14      0.35     32.63
12:05:01    vg_data-lv_data    989.86   9148.71  16252.59 25.66
19.42     19.56      0.45     44.21
12:15:01    vg_data-lv_data   1369.24   8631.93  24737.60 24.37
35.04     25.54      0.45     61.33
12:25:01    vg_data-lv_data   1776.12   7070.01  39851.34 26.42
74.81     42.05      0.44     77.29
12:35:01    vg_data-lv_data   1529.15   6635.80  85865.14 60.49
54.11     35.34      0.48     72.89
12:45:01    vg_data-lv_data   1187.43   4528.74  40366.95 37.81
36.07     30.36      0.39     45.81
12:55:01    vg_data-lv_data    984.48   3520.06  21539.36 25.45
17.91     18.17      0.31     30.20
13:05:01    vg_data-lv_data    926.54   6304.44  16688.94 24.82
17.36     18.69      0.41     38.05
13:15:01    vg_data-lv_data   1232.46   7199.65  29852.49 30.06
40.17     32.53      0.42     51.60
13:25:01    vg_data-lv_data   1223.46   3945.05  27448.15 25.66
31.07     25.31      0.35     42.65
13:35:01    vg_data-lv_data   1126.91   2811.70  22067.19 22.08
24.33     21.55      0.32     36.00
13:45:01    vg_data-lv_data    833.33   1805.26  17274.43 22.90
24.40     29.25      0.30     25.41
13:55:02    vg_data-lv_data   1085.88   7616.75  19140.67 24.64
17.48     16.06      0.39     42.15
14:05:01    vg_data-lv_data    691.52   3852.50  13125.53 24.55
7.75     11.15      0.30     20.74
14:15:01    vg_data-lv_data   1288.88   5390.41  24171.07 22.94
33.31     25.76      0.36     46.31
14:25:01    vg_data-lv_data   1592.88   3637.77  29836.89 21.02
76.45     47.94      0.40     63.28
14:35:01    vg_data-lv_data   1652.78   9502.87  31587.68 24.86
58.97     35.58      0.44     72.46
14:45:01    vg_data-lv_data   1623.82   6249.52  34148.46 24.88
53.47     32.83      0.40     65.19
14:55:01    vg_data-lv_data   1330.44   6516.11  26828.59 25.06
55.66     41.81      0.42     55.46
Average:    vg_data-lv_data   1176.55   5508.02  26324.37 27.06
33.86     28.72      0.39     45.59

Output of stream-scaling is
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu0 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu0 Level 3 Cache: 25600K (Unified)
CPU /sys/devices/system/cpu/cpu1 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu1 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu1 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu1 Level 3 Cache: 25600K (Unified)
... (for 40 cpus)


Total CPU system cache: 1060372480 bytes
Suggested minimum array elements needed: 481987490
WARNING: Array size may not fit into a 32 bit structure.
If stream files to compile, you may need to uncomment the
line in the script labeled and described by the "Size
clamp code" comments in the stream-scaling script.
Array elements used: 481987490

=== CPU Core Summary ===
processor    : 39
model name    : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
cpu MHz        : 1273.925
siblings    : 20

=== Check and build stream ===
/tmp/ccuVZwf5.o: In function `tuned_STREAM_Triad._omp_fn.0':
stream.c:(.text+0x5a): relocation truncated to fit: R_X86_64_32S against
`.bss'
stream.c:(.text+0x8b): relocation truncated to fit: R_X86_64_32S against
`.bss'
stream.c:(.text+0x98): relocation truncated to fit: R_X86_64_32S against
`.bss'
stream.c:(.text+0xe0): relocation truncated to fit: R_X86_64_32S against
`.bss'
stream.c:(.text+0xe7): relocation truncated to fit: R_X86_64_32S against
`.bss'
stream.c:(.text+0x14b): relocation truncated to fit: R_X86_64_32S
against `.bss'
stream.c:(.text+0x15b): relocation truncated to fit: R_X86_64_32S
against `.bss'
/tmp/ccuVZwf5.o: In function `tuned_STREAM_Add._omp_fn.1':
stream.c:(.text+0x1f4): relocation truncated to fit: R_X86_64_32S
against `.bss'
stream.c:(.text+0x1fd): relocation truncated to fit: R_X86_64_32S
against `.bss'
stream.c:(.text+0x23b): relocation truncated to fit: R_X86_64_32S
against `.bss'
stream.c:(.text+0x242): additional relocation overflows omitted from the
output
collect2: error: ld returned 1 exit status
=== Trying large memory model ===
(this can take a while to compile)

=== Testing up to 40 cores ===

-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 481987490, Offset = 0
Total memory required = 11031.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 1
-------------------------------------------------------------
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 520522 microseconds.
    (= 520522 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        9552.4291       0.8205       0.8073       0.8319
Scale:       9818.6724       0.8128       0.7854       0.8333
Add:        10600.6443       1.1113       1.0912       1.1557
Triad:      10598.8774       1.1073       1.0914       1.1301
-------------------------------------------------------------
Failed Validation on array c[]
         Expected  : 148233746338593734656.000000
         Observed  : 148233747862815801344.000000
-------------------------------------------------------------

Number of Threads requested = 2
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      26541.4003       0.4667       0.4358       0.5130
Failed Validation on array c[]
         Expected  : 148233746338593734656.000000
         Observed  : 148233747862815801344.000000

Number of Threads requested = 3
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      30890.3475       0.3798       0.3745       0.3842
Failed Validation on array c[]
         Expected  : 148233746338593734656.000000
         Observed  : 148233747862815801344.000000

...

Number of Threads requested = 40
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      53770.7791       0.2186       0.2151       0.2258
Failed Validation on array c[]
         Expected  : 148233746338593734656.000000
         Observed  : 148233747862815801344.000000



For hardware specs, both servers have same CPU & RAM ; this is 40 *
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz ; and 256 GB RAM.

I'll do tomorrow morning switch between master & slave ; so I probably
will have more information at this moment on SSD server. (sar, dstat
reports, ..)
>
> For 2) I think we're still missing a lot of information about what's happening in your database to be able to say
much,i.e. what's the size of the database, tables 1 - 8, typical read / write paterns, and transaction rate?  Perhaps
youcould give a simple description of what your apps(s) do on a daily basis, and then post a diff of the outputs of the
pg_stat_database,pg_stat_user_tables and pg_staio_user_tables views over a specific period of time. 
>
> You really need to dig into each of these yourself to try and see where your new system might be lacking.
Quick answer on principal values on db
size ~ 150 GB.
Standard comportement is 55% update, 40% select & 10 % insert ; around
170 TPS.

I've set up recording of pg_stat_database for getting stats and will try
to analyze other tables.






pgsql-admin by date:

Previous
From: Albe Laurenz
Date:
Subject: Re: PostgreSQL 9.1 "database system was interrupted; last known up at"
Next
From: Glyn Astill
Date:
Subject: Re: Performances issues with SSD volume ?