Re: Improvement of pg_stat_statement usage about buffer hit ratio - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: Improvement of pg_stat_statement usage about buffer hit ratio
Date:
Msg-id: CAM3SWZQsZ7hH6H38+SEDr207EJ1ua04F93V4bd7Hn6khJpfLsw@mail.gmail.com
In response to: Re: Improvement of pg_stat_statement usage about buffer hit ratio (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
List: pgsql-hackers
On Mon, Nov 18, 2013 at 10:08 PM, KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp> wrote:
> I regret the past decision on Itagaki-san's patch, and improvement might not
> be possible. However, we can change it if we have a logical reason to
> change it.

I suppose so.

>> I don't think that the tool needs to be expensive. If selecting from
>> the pg_stat_statements view every 1-3 seconds is too expensive for
>> such a tool, we can have a discussion about being smarter, because
>> there certainly are ways to optimize it.
>
> I can understand why you say my patch is heavy now! Your monitoring methods
> are radically heavy. In general, we query the pg_stat_statements view every
> 1-5 minutes, because monitoring SQL must not be heavier than the main
> workload's SQL.

Certainly, not everyone would want to take snapshots so frequently. Most users would not. However, if you're worried about the overhead of parsing and executing the pg_stat_statements call itself, you're worried about the wrong thing. What about the overhead of fingerprinting every single statement executed against the server?

Now, as I've already acknowledged, I might actually have regressed things for the case where pg_stat_statements is selected from very frequently. That increases the need for the tool to support external aggregation tools well.

> If we measure real performance, we don't want the monitoring SQL's cost
> included in the measurement. And I still cannot understand why you'd want
> to collect such drastically detailed per-statement performance data. I'd
> like to know only the max, avg and stddev of each statement. They are
> enough, because we can improve the statements using that information.

It's not just the performance cost of this one addition. It's also a question of "where do we draw the line?", particularly given the current lack of portable atomic addition. Most users of pg_stat_statements don't care about the min/max.
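As an aside, the per-statement counters KONDO is asking for (min, max, avg, stddev) can be maintained incrementally, without storing individual timings. A minimal sketch in plain Python, using Welford's online algorithm — this is illustrative only, not the proposed patch code; in the real extension these fields would live in the shared hash-table entry and be updated under the per-entry lock:

```python
import math

class StatementStats:
    """Running min/max/mean/stddev for one statement entry (sketch only)."""

    def __init__(self):
        self.calls = 0
        self.mean = 0.0
        self.m2 = 0.0              # sum of squared deviations (Welford)
        self.min_time = math.inf
        self.max_time = -math.inf

    def record(self, elapsed_ms):
        # One update per statement execution; O(1) time and space.
        self.calls += 1
        delta = elapsed_ms - self.mean
        self.mean += delta / self.calls
        self.m2 += delta * (elapsed_ms - self.mean)
        self.min_time = min(self.min_time, elapsed_ms)
        self.max_time = max(self.max_time, elapsed_ms)

    def stddev(self):
        # Population standard deviation of the recorded timings.
        return math.sqrt(self.m2 / self.calls) if self.calls else 0.0

stats = StatementStats()
for t in [10.0, 20.0, 30.0]:
    stats.record(t)
print(stats.mean, stats.min_time, stats.max_time)   # 20.0 10.0 30.0
```

The point of the online form is exactly the cost question raised above: each execution pays a handful of additions and comparisons, but those still have to happen under a lock in shared memory, which is where "where do we draw the line?" comes in.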
Furthermore, I'm not convinced that you've adequately addressed the question of local minima and maxima, even in your most recent revision.

> In summary of your comments, your patch is lower cost than my proposed
> patch, because my patch has a long-lock problem while yours does not. Is
> that right? If it is true, I can understand your reasoning in theory, but
> I'm not sure without real practice or a benchmark whether it holds. So we
> will need benchmark tests of my patch and yours. I will try it.

I don't know why you decided to compare our two patches at all - what's the point? How are you going to benchmark my patch? If the answer is pgbench, let me save you some time: there is no added overhead there, because any additional cost that may exist is paid only once per entry (at a time when an exclusive lock already needs to be acquired on the entire table to create a new entry anyway). Furthermore, my patch might actually be cost negative (i.e. faster), because we can create a query text in the external file with only a shared lock held (just using a spinlock to read-and-advance the offset into the external file), rather than, with an *exclusive* lock held, performing memcmp() on the 1KiB query text shared memory buffer.

> I attended db tech showcase 2013, which was held in Japan last week. An
> Oracle speaker introduced the performance schema and related features in
> MySQL 5.6. This is his slide deck. It's mostly in Japanese, but please see
> it from page 31 on. It is written in SQL.
> http://www.slideshare.net/yoyamasaki/20131110-tuning-onmysql56
>
> In MySQL 5.6, it has information such as sum_time, min_time, avg_time,
> max_time and sum_lock_time. I think it is useful for understanding our
> executing statements.

Based on a reading of https://dev.mysql.com/doc/refman/5.6/en/events-statements-current-table.html, I don't think that is comparable to pg_stat_statements.
It looks like this doesn't do any normalization - if it did, why is there "one row per thread"?

>> This is incomprehensible. As with the cumulative aggregate costs, how
>> is a consumer of pg_stat_statements possibly going to get the full
>> query text from anywhere else? It *has* to come from
>> pg_stat_statements directly.
>
> I think that it is necessary to make it clear with a real test.

I'd have started with a benchmark of your changes to pg_stat_statements for an unsympathetic case. Even though there is surely some demand for what you're asking for, you haven't adequately demonstrated, or even argued, that what you've proposed is the right way to do it; nor have you tried to quantify the costs, or given any indication that you've considered them.

--
Peter Geoghegan
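For readers unfamiliar with the read-and-advance scheme Peter describes earlier (reserve space in the external query-text file under a brief spinlock, then write the text without holding any heavyweight lock), here is a rough sketch. This is plain Python with threading.Lock standing in for the spinlock and a bytearray standing in for the file; the class and names are hypothetical, not the actual patch code:

```python
import threading

class QueryTextStore:
    """Append-only store for query texts, modeled on the pattern above:
    a brief lock is held only to reserve an offset; the write itself
    happens outside the lock, since the reserved region is private."""

    def __init__(self, size=1 << 20):
        self._spin = threading.Lock()   # stands in for the spinlock
        self._offset = 0                # next free position
        self._buf = bytearray(size)     # stands in for the external file

    def append(self, text):
        data = text.encode() + b"\0"    # NUL-terminated, as a C string would be
        with self._spin:                # read-and-advance under the lock
            start = self._offset
            self._offset += len(data)
        # Write outside the lock: [start, start + len(data)) belongs to us.
        self._buf[start:start + len(data)] = data
        return start                    # callers keep the offset, not the text

store = QueryTextStore()
off = store.append("SELECT 1")
```

The design point is that contention is limited to the few instructions that advance the offset, so concurrent backends never serialize on the (comparatively slow) copy of the query text itself.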