Re: [PATCH] optional cleaning queries stored in pg_stat_statements - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [PATCH] optional cleaning queries stored in pg_stat_statements
Date
Msg-id 22410.1320620437@sss.pgh.pa.us
Whole thread Raw
In response to Re: [PATCH] optional cleaning queries stored in pg_stat_statements  (Peter Geoghegan <peter@2ndquadrant.com>)
Responses Re: [PATCH] optional cleaning queries stored in pg_stat_statements
List pgsql-hackers
Peter Geoghegan <peter@2ndquadrant.com> writes:
> On 6 November 2011 15:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Are you intending to hash the raw
>> grammar output tree, the parse analysis result, the rewriter result,
>> or the actual plan tree?

> I'm hashing the Query tree from a planner plugin (function assigned to
> planner_hook), immediately prior to it being passed to
> standard_planner.

IOW, the rewriter result.  Why that choice?  That seems like about the
least useful of the four possibilities, since it provides no certainty
about the plan while also being as far from the original text as you
can get (thus making the problem of what to display pretty hard).

> A major consideration was backwards compatibility;

This is not a consideration that the community is likely to weigh
heavily, or indeed at all.  We aren't going to back-port this feature
into prior release branches, and we aren't going to want to contort its
definition to make that easier.  If 2ndquadrant wants to ship a
back-branch version with the feature, you can certainly also back-port
a patch that adds a hook where you need it, if you need a new hook.

But frankly I'm a bit surprised that you're not hashing the query plan,
since you could get that in the ExecutorStart hook that
pg_stat_statements already uses.  With a hook placed somewhere else, you
add additional implementation problems of matching up the calls to that
hook with later calls to the executor hook.

>> I'm not real sure whether it's better to classify on the basis of
>> similar plans or similar original queries, anyway.  This seems like
>> something that could be open for debate about use-cases.

> Indeed - it's generally difficult to reason about what behaviour is
> optimal when attempting to anticipate the needs of every Postgres DBA.
> In this case, I myself lean strongly towards similar original queries,
> because classifying on the basis of similar plans isn't likely to be
> nearly as actionable, and there are really no obvious precedents to
> draw on - how can it be turned into a useful tool?

Hm, well, if that's what you think, why so late in the sequence?
Hashing the raw grammar output would be the best way to identify similar
original queries, ISTM.  You'd also have a better chance of getting
people to hold still for the extra overhead fields, if they didn't
need to propagate further than that representation.

Basically, I think there are fairly clear arguments for using a hash of
the raw grammar output, if you lean towards the "classify by original
query" viewpoint; or a hash of the plan tree, if you lean towards the
"classify by plan" viewpoint.  I don't see what use-cases would make it
optimal to hash one of the intermediate stages.  I was hoping you had an
argument stronger than "it'll be easy to back-port" for that.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: [PATCH] optional cleaning queries stored in pg_stat_statements
Next
From: J Smith
Date:
Subject: Re: unaccent extension missing some accents