2011/11/6 Tomas Vondra <tv@fuzzy.cz>:
> Hi everyone,
> The patch implements a simple "cleaning" that replaces the parameter
> values with generic strings - e.g. numbers are turned to ":n", so the
> queries mentioned above are turned to
>
> SELECT abalance FROM pgbench_accounts WHERE aid = :n
>
> and thus tracked as a single query in pg_stat_statements.
I'm a couple of days away from posting a much better principled
implementation of pg_stat_statements normalisation. To normalise, we
perform a selective serialisation of the query tree, which is hashed.
This has the additional benefit of considering queries equivalent even
when, for example, there is a different amount of whitespace, or if
queries differ only in their use of aliases, or if queries differ only
in that one uses a noise word the other doesn't. It also does things
like intelligently distinguishing between queries with different
limit/offset constant values, as these constants are deemed to be
differentiators of queries for our purposes. A guiding principal that
I've followed is that anything that could result in a different plan
is a differentiator of queries.
I intend to maintain a backwards compatible version, as this can be
expected to work with earlier versions of Postgres.
There will be additional infrastructure added to the parser to support
normalisation of query strings for the patch I'll be submitting (that
obviously won't be supported in the version that builds against
existing Postgres versions that I'll make available). Essentially,
I'll be adding a length field to certain nodes, to go alongside the
existing location field (currently just used to highlight an offending
token in an error message outside the parser). pg_stat_statements will
then use the location and length field of const nodes to swap out
constants in the query string.
It's unfortunate that there was a duplicate effort here. With respect,
the approach that you've taken probably would have turned out to have
been a bit of a tar pit - I'm reasonably sure that had you pursued it,
you'd have found yourself duplicating quite a bit of the parser as new
problems came to light.
--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services