Re: [PATCH] optional cleaning queries stored in pg_stat_statements - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [PATCH] optional cleaning queries stored in pg_stat_statements |
Date | |
Msg-id | 22410.1320620437@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [PATCH] optional cleaning queries stored in pg_stat_statements (Peter Geoghegan <peter@2ndquadrant.com>) |
Responses |
Re: [PATCH] optional cleaning queries stored in pg_stat_statements
|
List | pgsql-hackers |
Peter Geoghegan <peter@2ndquadrant.com> writes: > On 6 November 2011 15:08, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Are you intending to hash the raw >> grammar output tree, the parse analysis result, the rewriter result, >> or the actual plan tree? > I'm hashing the Query tree from a planner plugin (function assigned to > planner_hook), immediately prior to it being passed to > standard_planner. IOW, the rewriter result. Why that choice? That seems like about the least useful of the four possibilities, since it provides no certainty about the plan while also being as far from the original text as you can get (thus making the problem of what to display pretty hard). > A major consideration was backwards compatibility; This is not a consideration that the community is likely to weigh heavily, or indeed at all. We aren't going to back-port this feature into prior release branches, and we aren't going to want to contort its definition to make that easier. If 2ndquadrant wants to ship a back-branch version with the feature, you can certainly also back-port a patch that adds a hook where you need it, if you need a new hook. But frankly I'm a bit surprised that you're not hashing the query plan, since you could get that in the ExecutorStart hook that pg_stat_statements already uses. With a hook placed somewhere else, you add additional implementation problems of matching up the calls to that hook with later calls to the executor hook. >> I'm not real sure whether it's better to classify on the basis of >> similar plans or similar original queries, anyway. This seems like >> something that could be open for debate about use-cases. > Indeed - it's generally difficult to reason about what behaviour is > optimal when attempting to anticipate the needs of every Postgres DBA. > In this case, I myself lean strongly towards similar original queries, > because classifying on the basis of similar plans isn't likely to be > nearly as actionable, and there are really no obvious precedents to > draw on - how can it be turned into a useful tool? Hm, well, if that's what you think, why so late in the sequence? Hashing the raw grammar output would be the best way to identify similar original queries, ISTM. You'd also have a better chance of getting people to hold still for the extra overhead fields, if they didn't need to propagate further than that representation. Basically, I think there are fairly clear arguments for using a hash of the raw grammar output, if you lean towards the "classify by original query" viewpoint; or a hash of the plan tree, if you lean towards the "classify by plan" viewpoint. I don't see what use-cases would make it optimal to hash one of the intermediate stages. I was hoping you had an argument stronger than "it'll be easy to back-port" for that. regards, tom lane
pgsql-hackers by date: