Re: Patch: add timing of buffer I/O requests - Mailing list pgsql-hackers
From: Greg Smith
Subject: Re: Patch: add timing of buffer I/O requests
Msg-id: 4F44FF1C.1030409@2ndQuadrant.com
In response to: Patch: add timing of buffer I/O requests (Ants Aasma <ants.aasma@eesti.ee>)
List: pgsql-hackers
Attached are updated versions of this feature without the pg_test_timing tool part, since I broke that out into another discussion thread. I've also split the part that updates pg_stat_statements out from the main feature; that separate patch is attached here too (but I'm not reviewing it yet). There's been a lot of bitrot since this was submitted, and yes, I noticed that by splitting off the parts that were developed later, I've almost recreated earlier versions of this patch.

Earlier discussion of this got sidetracked on a few things, partly my fault. It's worth taking a look at what this provides before judging it too harshly; it demos well. The stated purpose is helping figure out which relations are gobbling up the most access time, presumably so you can optimize them and/or the storage they're on. "What do I put onto SSD?" is surely a popular question nowadays. To check suitability for that, I decided to run the standard pgbench test and see where it would show lots of time being consumed. Any answer other than "pgbench_accounts, and to a lesser extent its index" is a failing grade. I started with a clean database and OS cache so I'd get real read timings:

$ psql -d pgbench -x -c "select relname, heap_blks_read, heap_blks_hit, heap_blks_time, idx_blks_read, idx_blks_hit, idx_blks_time from pg_statio_user_tables where idx_blks_read > 0 order by heap_blks_time desc"

relname        | pgbench_accounts
heap_blks_read | 7879
heap_blks_hit  | 43837
heap_blks_time | 151770
idx_blks_read  | 7503
idx_blks_hit   | 60484
idx_blks_time  | 70968

relname        | pgbench_tellers
heap_blks_read | 19
heap_blks_hit  | 15856
heap_blks_time | 105
idx_blks_read  | 11
idx_blks_hit   | 15745
idx_blks_time  | 62

relname        | pgbench_branches
heap_blks_read | 11
heap_blks_hit  | 32333
heap_blks_time | 77
idx_blks_read  | 2
idx_blks_hit   | 0
idx_blks_time  | 9

Now, the first critical question to ask is "what additional information is this providing above the existing counters?"
After all, it's possible to tell pgbench_accounts is the hotspot just from comparing heap_blks_read, right? To really be useful, this would need to make it obvious that reads from pgbench_accounts are slower than from the other two, because it's bigger and requires more seeking around to populate. That should show up if we compute time-per-read numbers:

$ psql -d pgbench -x -c "select relname, 1.0 * heap_blks_time / heap_blks_read as time_per_read, 1.0 * idx_blks_time / idx_blks_read as time_per_idx_read from pg_statio_user_tables where idx_blks_read > 0 order by heap_blks_time desc"

relname           | pgbench_accounts
time_per_read     | 19.2625967762406397
time_per_idx_read | 9.4586165533786485

relname           | pgbench_tellers
time_per_read     | 5.5263157894736842
time_per_idx_read | 5.6363636363636364

relname           | pgbench_branches
time_per_read     | 7.0000000000000000
time_per_idx_read | 4.5000000000000000

This run looks useful at providing the data wished for: read times are slower per capita from the accounts table. The first time I tried this I got a bizarrely high number for pgbench_branches.heap_blks_time; I'm not sure how reliable this is yet. One problem that might be easy to fix is that the write timing info doesn't show up in any of these system views, only in EXPLAIN and the statement-level ones.

I still think a full wait timing interface is the right long-term direction here. It's hard to reject this idea when it seems to be working right now, though, while more comprehensive wait storage is still at least a release off. Opinions welcome; I'm still juggling this around now that I have it working again.

Some implementation notes. This currently fails regression test create_function_3; I haven't looked into why yet. I've confirmed that on a system where timing is cheap, this is too. On something touching real data, not just a synthetic thing moving memory around, Ants couldn't measure it on a server similar to mine; I can't either. Yes, this is going to gobble up more room for statistics.
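The time-per-read figures above are just the raw counters from the first query divided out. As a sanity check, a few lines of Python reproduce them from those numbers (the dict and helper function are mine, for illustration; the counter values are copied from the output above, and the units are whatever the view reports, since only the ratios between relations matter here):

```python
# Recompute time-per-read from the raw pg_statio_user_tables counters
# shown in the first query's output.
counters = {
    "pgbench_accounts": {"heap_blks_read": 7879, "heap_blks_time": 151770,
                         "idx_blks_read": 7503, "idx_blks_time": 70968},
    "pgbench_tellers":  {"heap_blks_read": 19, "heap_blks_time": 105,
                         "idx_blks_read": 11, "idx_blks_time": 62},
    "pgbench_branches": {"heap_blks_read": 11, "heap_blks_time": 77,
                         "idx_blks_read": 2, "idx_blks_time": 9},
}

def time_per_read(c):
    """Average time per block actually read (heap, index)."""
    return (c["heap_blks_time"] / c["heap_blks_read"],
            c["idx_blks_time"] / c["idx_blks_read"])

for rel, c in counters.items():
    heap, idx = time_per_read(c)
    print(f"{rel}: heap {heap:.2f}, idx {idx:.2f}")
```

The accounts table comes out around 19 per heap read versus 5-7 for the small tables, which is the "bigger relation, more seeking" signal the feature is supposed to surface.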
The track_iotiming GUC seems to work as expected. It's off by default, you can turn it on in a session and see that session's work get timed, and it toggles on a config reload. That's everything needed to only turn it on selectively; the main penalty you pay all the time is the stats bloat.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com