Re: [PATCH] optional cleaning queries stored in pg_stat_statements - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PATCH] optional cleaning queries stored in pg_stat_statements
Date
Msg-id 4EB6021C.2080700@fuzzy.cz
Whole thread Raw
In response to Re: [PATCH] optional cleaning queries stored in pg_stat_statements  (Peter Geoghegan <peter@2ndquadrant.com>)
Responses Re: [PATCH] optional cleaning queries stored in pg_stat_statements
Re: [PATCH] optional cleaning queries stored in pg_stat_statements
List pgsql-hackers
Dne 6.11.2011 03:16, Peter Geoghegan napsal(a):
> 2011/11/6 Tomas Vondra <tv@fuzzy.cz>:
>> Hi everyone,
> 
>> The patch implements a simple "cleaning" that replaces the parameter
>> values with generic strings - e.g. numbers are turned to ":n", so the
>> queries mentioned above are turned to
>>
>>   SELECT abalance FROM pgbench_accounts WHERE aid = :n
>>
>> and thus tracked as a single query in pg_stat_statements.
> 
> I'm a couple of days away from posting a much better principled
> implementation of pg_stat_statements normalisation. To normalise, we
> perform a selective serialisation of the query tree, which is hashed.

OK, my patch definitely is not the only possible and if there's a way to
get more info from the planner, the results surely might be better. My
goal was to provide a simple patch that solves the problem better then
I'll be more than happy to remove mine.

> This has the additional benefit of considering queries equivalent even
> when, for example, there is a different amount of whitespace, or if
> queries differ only in their use of aliases, or if queries differ only
> in that one uses a noise word the other doesn't. It also does things

Yup, that's nice. And achieving the same behavior (aliases, noise) would
require a much more complex parser than the one I wrote.

> like intelligently distinguishing between queries with different
> limit/offset constant values, as these constants are deemed to be
> differentiators of queries for our purposes. A guiding principal that
> I've followed is that anything that could result in a different plan
> is a differentiator of queries.

Not sure if I understand it correctly - do you want to keep multiple
records for a query, for each differentiator (different limit/offset
values, etc.)? That does not make much sense to me - even a different
parameter value may cause change of the plan, so I don't see much
difference between a query parameter, limit/offset constant values etc.

If it was possible to compare the actual plan (or at least a hash of the
plan), and keeping one record for each plan, that'd be extremely nice.
You'd see that the query was executed with 3 different plans, number of
calls, average duration etc.

> I intend to maintain a backwards compatible version, as this can be
> expected to work with earlier versions of Postgres.
>
> There will be additional infrastructure added to the parser to support
> normalisation of query strings for the patch I'll be submitting (that
> obviously won't be supported in the version that builds against
> existing Postgres versions that I'll make available). Essentially,
> I'll be adding a length field to certain nodes, to go alongside the
> existing location field (currently just used to highlight an offending
> token in an error message outside the parser). pg_stat_statements will
> then use the location and length field of const nodes to swap out
> constants in the query string.
> 
> It's unfortunate that there was a duplicate effort here. With respect,
> the approach that you've taken probably would have turned out to have
> been a bit of a tar pit - I'm reasonably sure that had you pursued it,
> you'd have found yourself duplicating quite a bit of the parser as new
> problems came to light.

The duplicate effort is not a problem. I needed to learn something more
about the internals and how to use the executor hooks, and the
pg_stat_statements is a neat place to learn this. The patch is rather a
side effect of this effort, so no waste on my side.

Anyway you're right that the approach I've taken is not much extensible
without building a parser parallel to the actual one. It was meant as a
simple solution not solving the problem perfectly, just sufficiently for
what I need.

Tomas


pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: [PATCH] optional cleaning queries stored in pg_stat_statements
Next
From: Greg Smith
Date:
Subject: Measuring relation free space