Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Date
Msg-id 10079.1332436744@sss.pgh.pa.us
Whole thread Raw
In response to Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)  (Peter Geoghegan <peter@2ndquadrant.com>)
Responses Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
List pgsql-hackers
Peter Geoghegan <peter@2ndquadrant.com> writes:
> [ pg_stat_statements_norm_2012_02_29.patch ]

I started to look at this patch (just the core-code modifications so
far).  There are some things that seem not terribly well thought out:

* It doesn't look to me like it will behave very sanely with rules.
The patch doesn't store queryId in a stored rule tree, so a Query
retrieved from a stored rule will have a zero queryId, and that's
what will get pushed through to the resulting plan tree as well.
So basically all DO ALSO or DO INSTEAD operations are going to get
lumped together by pg_stat_statements, and separated from the queries
that triggered them, which seems pretty darn unhelpful.

I don't know that storing queryId would be better, since after a restart
that'd mean there are query IDs running around in the system that the
current instance of pg_stat_statements has never heard of.  Permanently
stored query IDs would also be a headache if you needed to change the
fingerprint algorithm, or if there were more than one add-on trying to
use the query ID support.

I'm inclined to think that the most useful behavior is to teach the
rewriter to copy queryId from the original query into all the Queries
generated by rewrite.  Then, all rules fired by a source query would
be lumped into that query for tracking purposes.  This might not be
the ideal behavior either, but I don't see a better solution.

* The patch injects the query ID calculation code by redefining
parse_analyze and parse_analyze_varparams as hookable functions and
then getting into those hooks.  I don't find this terribly sane either.
pg_stat_statements has no interest in the distinction between those two
methods of getting into parse analysis.  Perhaps more to the point,
those are not the only two ways of getting into parse analysis: some
places call transformTopLevelStmt directly, for instance
pg_analyze_and_rewrite_params.  While it might be that the code paths
that do that are not of interest for fingerprinting queries, it's far
from obvious that these two are the correct and only places to do such
fingerprinting.

I think that if we are going to take the attitude that we only care
about fingerprinting queries that come in from the client, then we
ought to call the fingerprinting code in the client-message-processing
routines in postgres.c.  But in that case we need to be a little clearer
about what we are doing with unfingerprinted queries.  Alternatively,
we might take the position that we want to fingerprint every Query
struct, but in that case the existing hooks are clearly insufficient.
This seems to boil down to what you want to have happen with queries
created/executed inside functions, which is something I don't recall
being discussed.

Either way, I think we'd be a lot better advised to define a single
hook "post_parse_analysis_hook" and make the core code responsible
for calling it at the appropriate places, rather than supposing that
the contrib module knows exactly which core functions ought to be
the places to do it.

Thoughts?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Christopher Browne
Date:
Subject: Re: Gsoc2012 Idea --- Social Network database schema
Next
From: Dimitri Fontaine
Date:
Subject: Re: Finer Extension dependencies