Re: [PATCH] optional cleaning queries stored in pg_stat_statements - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [PATCH] optional cleaning queries stored in pg_stat_statements |
Date | |
Msg-id | 4EB6021C.2080700@fuzzy.cz Whole thread Raw |
In response to | Re: [PATCH] optional cleaning queries stored in pg_stat_statements (Peter Geoghegan <peter@2ndquadrant.com>) |
Responses |
Re: [PATCH] optional cleaning queries stored in pg_stat_statements
Re: [PATCH] optional cleaning queries stored in pg_stat_statements |
List | pgsql-hackers |
Dne 6.11.2011 03:16, Peter Geoghegan napsal(a): > 2011/11/6 Tomas Vondra <tv@fuzzy.cz>: >> Hi everyone, > >> The patch implements a simple "cleaning" that replaces the parameter >> values with generic strings - e.g. numbers are turned to ":n", so the >> queries mentioned above are turned to >> >> SELECT abalance FROM pgbench_accounts WHERE aid = :n >> >> and thus tracked as a single query in pg_stat_statements. > > I'm a couple of days away from posting a much better principled > implementation of pg_stat_statements normalisation. To normalise, we > perform a selective serialisation of the query tree, which is hashed. OK, my patch definitely is not the only possible and if there's a way to get more info from the planner, the results surely might be better. My goal was to provide a simple patch that solves the problem better then I'll be more than happy to remove mine. > This has the additional benefit of considering queries equivalent even > when, for example, there is a different amount of whitespace, or if > queries differ only in their use of aliases, or if queries differ only > in that one uses a noise word the other doesn't. It also does things Yup, that's nice. And achieving the same behavior (aliases, noise) would require a much more complex parser than the one I wrote. > like intelligently distinguishing between queries with different > limit/offset constant values, as these constants are deemed to be > differentiators of queries for our purposes. A guiding principal that > I've followed is that anything that could result in a different plan > is a differentiator of queries. Not sure if I understand it correctly - do you want to keep multiple records for a query, for each differentiator (different limit/offset values, etc.)? That does not make much sense to me - even a different parameter value may cause change of the plan, so I don't see much difference between a query parameter, limit/offset constant values etc. If it was possible to compare the actual plan (or at least a hash of the plan), and keeping one record for each plan, that'd be extremely nice. You'd see that the query was executed with 3 different plans, number of calls, average duration etc. > I intend to maintain a backwards compatible version, as this can be > expected to work with earlier versions of Postgres. > > There will be additional infrastructure added to the parser to support > normalisation of query strings for the patch I'll be submitting (that > obviously won't be supported in the version that builds against > existing Postgres versions that I'll make available). Essentially, > I'll be adding a length field to certain nodes, to go alongside the > existing location field (currently just used to highlight an offending > token in an error message outside the parser). pg_stat_statements will > then use the location and length field of const nodes to swap out > constants in the query string. > > It's unfortunate that there was a duplicate effort here. With respect, > the approach that you've taken probably would have turned out to have > been a bit of a tar pit - I'm reasonably sure that had you pursued it, > you'd have found yourself duplicating quite a bit of the parser as new > problems came to light. The duplicate effort is not a problem. I needed to learn something more about the internals and how to use the executor hooks, and the pg_stat_statements is a neat place to learn this. The patch is rather a side effect of this effort, so no waste on my side. Anyway you're right that the approach I've taken is not much extensible without building a parser parallel to the actual one. It was meant as a simple solution not solving the problem perfectly, just sufficiently for what I need. Tomas
pgsql-hackers by date: