Re: Performances issues with SSD volume ? - Mailing list pgsql-admin

From Glyn Astill
Subject Re: Performances issues with SSD volume ?
Date
Msg-id 1693885845.798995.1432311684859.JavaMail.yahoo@mail.yahoo.com
Whole thread Raw
In response to Re: Performances issues with SSD volume ?  (Thomas SIMON <tsimon@neteven.com>)
Responses Re: Performances issues with SSD volume ?
List pgsql-admin
----- Original Message -----

> From: Thomas SIMON <tsimon@neteven.com>
> To: Glyn Astill <glynastill@yahoo.co.uk>
> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
> Sent: Friday, 22 May 2015, 16:29
> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>
>
> Thomas
>
> Le 22/05/2015 11:37, Glyn Astill a écrit :
>>>  From: Thomas SIMON <tsimon@neteven.com>
>>>  To: Glyn Astill <glynastill@yahoo.co.uk>
>>>  Cc: "pgsql-admin@postgresql.org"
> <pgsql-admin@postgresql.org>
>>>  Sent: Thursday, 21 May 2015, 17:56
>>>  Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>>
>>>  Le 21/05/2015 16:30, Glyn Astill a écrit :
>>>>>>      I think at this point you could do with going back and
> trying to
>>>  reproduce
>>>>>>    the issue, then trace back up to pg_stat_activity to see
> what
>>>  activity could be
>>>>>>    causing the disk i/o.  I assume you've tried to
> reproduce the
>>>  disk issues
>>>>>>    with a simple disk benchmark like bonnie++?
>>>>>    Yes, I think the same thing. Probably I will doing this
> tomorrow early
>>>>>    in the morning.
>>>>>    I tried to reproduce disk issues with different stress tests
> like
>>>>>    bonnie, fio, tsung, and I use a more realistic scenario with
> pgreplay
>>>  to
>>>>>    reproduce my production trafic from postgresql logfile.
>>>>>    However, I'm note sure how to diagnostic performance
> issues.
>>>>>    I mean, if I see ssd are 100% full, how can I figure out why
> their
>>>>>    behavior changes ?
>>>>>
>>>>    Well the disk benchmarks are purely to see what your disks are
> capable of,
>>>  and help with your initial tuning.
>>>>
>>>>    You need to trace back which processes are causing most of the IO
>>>  you're seeing, as well as the postgresql logs something like iotop,
> or dstat
>>>  with the --top-bio option might help you there.
>>>>
>>>>    You could also look at the pg_statio_user_tables view to narrow
> down which
>>>  tables are being hit the hardest, which might give you some clues.
>>>  Is there something to activate for seeing something in this table ?
>>>  Because its empty on my production server
>>>
>>>  postgres=# select * from pg_statio_user_tables;
>>>     relid | schemaname | relname | heap_blks_read | heap_blks_hit |
>>>  idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit |
>>>  tidx_blks_read | tidx_blks_hit
>>>
>
-------+------------+---------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
>>>  (0 rows)
>>>
>>
>>  Looks like you need to set track_counts=on then.  Infact if you've got
> track_counts off then you're also not running autovacuum, that's a
> warning flag unless it's intentional.
> Arf, no it's just me who does dummy things ...
> I was in postgres database, not in my app...
>
>
> Here is result of my top tables.
>
> relid   | schemaname | relname                        | heap_blks_read |
> heap_blks_hit | idx_blks_read | idx_blks_hit  | toast_blks_read |
> toast_blks_hit | tidx_blks_read | tidx_blks_hit
>
----------+------------+------------------------------------------------------+----------------+---------------+---------------+---------------+-----------------+----------------+----------------+---------------
>    2056411 | public     | table 1
> |   326588572770 | 432317896616 |   12839634494 |   33230958888 |
> 52298815 |       20020628 |        9406665 |     153557966
>    2059324 | public     | table 2                               |
> 223323685944 |   72194867978 |   10319078550 |  103672304590 |
> 189135 |         262100 |          27815 |        207212
>    2056368 | public     | table 3
>                                                 |   198706578066 |
> 1259514828344 |   21083719219 | 1456776226532 |           90446
> |         331680 |          30850 |        367441
>    2056347 | public     | table 4 |   124086575669 |  554775474730 |
> 5236764894 |  614913194098 |        14978587 |       67085822 |
> 751951 |     134187396
>    2057296 | public     | table 5
> |    68120702371 | 18134568265 |    1243107028 |   10537765277 |
> 350646003 |       87888306 |      119113944 |     629790819
>    2058460 | public     | table 6 |    35119995405 |  158177258403 |
> 1687014495 | 1395923096049 |          738542 |         214574 |
> 325765 |       1318187
>    2058900 | public     | table 7
> |    34205345184 | 271135546976 |   21050227933 |  252099340046
> |            3924 |           6567 |            422 |         16333
>    2059195 | public     | table 8
>                                               |    29169256266 |
> 152937546457 |    2263998316 |  127782237565 |               0
> |              0 |              0 |             0
>
> how you interpret these values?
> You check ratio between  heap_read & heap_hit ?
> i've made ratio, it's about 89%

>


It shows blocks fetched from disk, or possibly from OS the buffer cache as "read", and blocks already in shared buffers
as"hit".   


See here: http://www.postgresql.org/docs/9.2/static/monitoring-stats.html#PG-STATIO-ALL-TABLES-VIEW


Also this thread appears to now be split over diagnosing the issue from two different standpoints; 1) Is the i/o
performance(or some other aspect) of your new SSD server worse than that of your old live spinning disk system. 2) Are
thereimprovements to be had with your database and application setup. 

For 1) Maybe if you could post the output of bonnie++ results from the data volumes on both servers when nothing else
isgoing off on them.  If you can't do that on your production server perhaps just the output from sar -d during a busy
periodmight help.  Hopefully that would help to clarify the specific disk performance differences for us.  Other
interestingtests would be the output of Greg Smiths stream scaling script
(https://github.com/gregs1104/stream-scaling). We're still missing the exact specs of your old system and the CPUs in
thenew one. 


For 2) I think we're still missing a lot of information about what's happening in your database to be able to say much,
i.e.what's the size of the database, tables 1 - 8, typical read / write paterns, and transaction rate?  Perhaps you
couldgive a simple description of what your apps(s) do on a daily basis, and then post a diff of the outputs of the
pg_stat_database,pg_stat_user_tables and pg_staio_user_tables views over a specific period of time. 

You really need to dig into each of these yourself to try and see where your new system might be lacking.


pgsql-admin by date:

Previous
From: danny
Date:
Subject: PSQL
Next
From: Jan Lentfer
Date:
Subject: Re: PSQL