Thread: [PATCH] optional cleaning of queries stored in pg_stat_statements
Hi everyone,

I propose a patch that would allow optional "cleaning" of queries tracked in pg_stat_statements, compressing the result and making it more readable. The default behavior is that when the same query is run with different parameter values, it's actually stored as two separate queries (the strings do differ).

A small example - when you run "pgbench -S" you'll get queries like this

  SELECT abalance FROM pgbench_accounts WHERE aid = 12433
  SELECT abalance FROM pgbench_accounts WHERE aid = 2322
  SELECT abalance FROM pgbench_accounts WHERE aid = 52492

and so on, and each one is listed separately in pg_stat_statements. This often pollutes pg_stat_statements with near-duplicate entries.

The patch implements a simple "cleaning" that replaces the parameter values with generic strings - e.g. numbers are turned to ":n", so the queries mentioned above are turned to

  SELECT abalance FROM pgbench_accounts WHERE aid = :n

and thus tracked as a single query in pg_stat_statements.

The patch provides an enum GUC (pg_stat_statements.clean) with three options - none, basic and aggressive. The default option is "none", "basic" performs the basic value replacement (described above), and "aggressive" performs some additional cleaning - for example replacing multiple spaces with a single one, etc.

The parsing is intentionally very simple and cleans the query in a single pass. It currently handles three literal types:

  a) string (basic, C-style escaped, Unicode-escaped, $-escaped)
  b) numeric (although 1.925e-3 is replaced by :n-:n)
  c) boolean (true/false)

There is probably room for improvement (e.g. handling UESCAPE).

Tomas
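For readers who want to see the shape of it, a minimal sketch of how such an enum GUC could be registered follows. Only the GUC name and its three option values come from the description above; every other identifier, and the choice of PGC_SUSET, is an assumption made for illustration, not code taken from the attached patch.

#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

/* Hypothetical enum for the proposed pg_stat_statements.clean GUC */
typedef enum
{
	PGSS_CLEAN_NONE,		/* keep query strings as-is (default) */
	PGSS_CLEAN_BASIC,		/* replace literal values with placeholders */
	PGSS_CLEAN_AGGRESSIVE	/* also collapse whitespace, etc. */
} PGSSCleanLevel;

static const struct config_enum_entry clean_options[] = {
	{"none", PGSS_CLEAN_NONE, false},
	{"basic", PGSS_CLEAN_BASIC, false},
	{"aggressive", PGSS_CLEAN_AGGRESSIVE, false},
	{NULL, 0, false}
};

static int	pgss_clean = PGSS_CLEAN_NONE;

void		_PG_init(void);

void
_PG_init(void)
{
	DefineCustomEnumVariable("pg_stat_statements.clean",
							 "Level of cleaning applied to tracked query strings.",
							 NULL,
							 &pgss_clean,
							 PGSS_CLEAN_NONE,
							 clean_options,
							 PGC_SUSET,
							 0,
							 NULL,	/* check_hook */
							 NULL,	/* assign_hook */
							 NULL);	/* show_hook */
}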
2011/11/6 Tomas Vondra <tv@fuzzy.cz>:
> Hi everyone,
>
> The patch implements a simple "cleaning" that replaces the parameter
> values with generic strings - e.g. numbers are turned to ":n", so the
> queries mentioned above are turned to
>
> SELECT abalance FROM pgbench_accounts WHERE aid = :n
>
> and thus tracked as a single query in pg_stat_statements.

I'm a couple of days away from posting a much better principled implementation of pg_stat_statements normalisation. To normalise, we perform a selective serialisation of the query tree, which is hashed.

This has the additional benefit of considering queries equivalent even when, for example, there is a different amount of whitespace, or if queries differ only in their use of aliases, or if queries differ only in that one uses a noise word the other doesn't. It also does things like intelligently distinguishing between queries with different limit/offset constant values, as these constants are deemed to be differentiators of queries for our purposes. A guiding principle that I've followed is that anything that could result in a different plan is a differentiator of queries.

I intend to maintain a backwards compatible version, as this can be expected to work with earlier versions of Postgres.

There will be additional infrastructure added to the parser to support normalisation of query strings for the patch I'll be submitting (that obviously won't be supported in the version that builds against existing Postgres versions that I'll make available). Essentially, I'll be adding a length field to certain nodes, to go alongside the existing location field (currently just used to highlight an offending token in an error message outside the parser). pg_stat_statements will then use the location and length fields of Const nodes to swap out constants in the query string.

It's unfortunate that there was a duplicate effort here. With respect, the approach that you've taken probably would have turned out to have been a bit of a tar pit - I'm reasonably sure that had you pursued it, you'd have found yourself duplicating quite a bit of the parser as new problems came to light.

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
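To illustrate the general idea (this is not Peter's code), hashing a serialisation of the query tree might look roughly like the sketch below. nodeToString() is used purely for brevity; it serialises everything, including constant values and token locations, whereas the approach described above serialises selectively so that such fields do not influence the hash.

#include "postgres.h"
#include "access/hash.h"
#include "nodes/nodes.h"
#include "nodes/parsenodes.h"

/*
 * Illustrative only: hash a serialised form of the Query tree.  A real
 * implementation would walk the tree itself and skip constant values,
 * location fields, aliases and similar "non-essential" information.
 */
static uint32
hash_query_tree(Query *query)
{
	char	   *serialized = nodeToString(query);
	uint32		result;

	result = DatumGetUInt32(hash_any((const unsigned char *) serialized,
									 strlen(serialized)));
	pfree(serialized);
	return result;
}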
Dne 6.11.2011 03:16, Peter Geoghegan napsal(a):
> 2011/11/6 Tomas Vondra <tv@fuzzy.cz>:
>> Hi everyone,
>
>> The patch implements a simple "cleaning" that replaces the parameter
>> values with generic strings - e.g. numbers are turned to ":n", so the
>> queries mentioned above are turned to
>>
>> SELECT abalance FROM pgbench_accounts WHERE aid = :n
>>
>> and thus tracked as a single query in pg_stat_statements.
>
> I'm a couple of days away from posting a much better principled
> implementation of pg_stat_statements normalisation. To normalise, we
> perform a selective serialisation of the query tree, which is hashed.

OK, my patch definitely is not the only possible approach, and if there's a way to get more info from the planner, the results surely might be better. My goal was to provide a simple patch that solves the problem; if yours solves it better, then I'll be more than happy to remove mine.

> This has the additional benefit of considering queries equivalent even
> when, for example, there is a different amount of whitespace, or if
> queries differ only in their use of aliases, or if queries differ only
> in that one uses a noise word the other doesn't. It also does things

Yup, that's nice. And achieving the same behavior (aliases, noise) would require a much more complex parser than the one I wrote.

> like intelligently distinguishing between queries with different
> limit/offset constant values, as these constants are deemed to be
> differentiators of queries for our purposes. A guiding principle that
> I've followed is that anything that could result in a different plan
> is a differentiator of queries.

Not sure if I understand it correctly - do you want to keep multiple records for a query, one for each differentiator (different limit/offset values, etc.)? That does not make much sense to me - even a different parameter value may cause a change of plan, so I don't see much difference between a query parameter, limit/offset constant values, etc.

If it was possible to compare the actual plan (or at least a hash of the plan), and keep one record for each plan, that'd be extremely nice. You'd see that the query was executed with 3 different plans, number of calls, average duration, etc.

> I intend to maintain a backwards compatible version, as this can be
> expected to work with earlier versions of Postgres.
>
> There will be additional infrastructure added to the parser to support
> normalisation of query strings for the patch I'll be submitting (that
> obviously won't be supported in the version that builds against
> existing Postgres versions that I'll make available). Essentially,
> I'll be adding a length field to certain nodes, to go alongside the
> existing location field (currently just used to highlight an offending
> token in an error message outside the parser). pg_stat_statements will
> then use the location and length fields of Const nodes to swap out
> constants in the query string.
>
> It's unfortunate that there was a duplicate effort here. With respect,
> the approach that you've taken probably would have turned out to have
> been a bit of a tar pit - I'm reasonably sure that had you pursued it,
> you'd have found yourself duplicating quite a bit of the parser as new
> problems came to light.

The duplicate effort is not a problem. I needed to learn something more about the internals and how to use the executor hooks, and pg_stat_statements is a neat place to learn this. The patch is rather a side effect of this effort, so no waste on my side.
Anyway, you're right that the approach I've taken is not very extensible without building a parser parallel to the actual one. It was meant as a simple solution that doesn't solve the problem perfectly, just sufficiently for what I need.

Tomas
Peter Geoghegan <peter@2ndquadrant.com> writes:
> I'm a couple of days away from posting a much better principled
> implementation of pg_stat_statements normalisation. To normalise, we
> perform a selective serialisation of the query tree, which is hashed.

That seems like an interesting approach, and definitely a lot less of a kluge than what Tomas suggested. Are you intending to hash the raw grammar output tree, the parse analysis result, the rewriter result, or the actual plan tree? I don't necessarily have a preconceived notion about which is best (I can think of pluses and minuses for each), but we should hear your explanation of which one you chose and what your reasoning was.

> ... It also does things
> like intelligently distinguishing between queries with different
> limit/offset constant values, as these constants are deemed to be
> differentiators of queries for our purposes. A guiding principle that
> I've followed is that anything that could result in a different plan
> is a differentiator of queries.

This claim seems like bunk, unless you're hashing the plan tree, in which case it's tautological. As an example, switching from "where x < 1" to "where x < 2" could make the planner change plans, if there's a sufficiently large difference in the selectivity of the two cases (or even if there's a very small difference, but we're right at the tipping point where the estimated costs are equal). There's no way to know this in advance unless you care to duplicate all the planner cost estimation logic. And I don't think you actually want to classify "where x < 1" and "where x < 2" differently just because they *might* give different plans in corner cases.

I'm not real sure whether it's better to classify on the basis of similar plans or similar original queries, anyway. This seems like something that could be open for debate about use-cases.

> There will be additional infrastructure added to the parser to support
> normalisation of query strings for the patch I'll be submitting (that
> obviously won't be supported in the version that builds against
> existing Postgres versions that I'll make available). Essentially,
> I'll be adding a length field to certain nodes,

This seems like a good way to get your patch rejected: adding overhead to the core system for the benefit of a feature that not everyone cares about is problematic. Why do you need it anyway? Surely it's possible to determine the length of a literal token after the fact.

More generally, if you're hashing anything later than the raw grammar tree, I think that generating a suitable representation of the queries represented by a single hash entry is going to be problematic anyway. There could be significant differences --- much more than just the values of constants --- between query strings that end up being semantically the same. Or for that matter we could have identical query strings that wind up being considered different because of the action of search_path or other context.

It might be that the path of least resistance is to document that we select one of the actual query strings "at random" to represent all the queries that went into the same hash entry, and not even bother with trying to strip out constants. The effort required to do that seems well out of proportion to the reward, if we can't do a perfect job of representing the common aspects of the queries.

			regards, tom lane
Tomas Vondra <tv@fuzzy.cz> writes:
> If it was possible to compare the actual plan (or at least a hash of the
> plan), and keep one record for each plan, that'd be extremely nice.
> You'd see that the query was executed with 3 different plans, number of
> calls, average duration, etc.

I like that idea. How does that integrate with the current efforts?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
On 6 Listopad 2011, 16:08, Tom Lane wrote:
> Peter Geoghegan <peter@2ndquadrant.com> writes:
>> I'm a couple of days away from posting a much better principled
>> implementation of pg_stat_statements normalisation. To normalise, we
>> perform a selective serialisation of the query tree, which is hashed.
>
> That seems like an interesting approach, and definitely a lot less of
> a kluge than what Tomas suggested. Are you intending to hash the raw
> grammar output tree, the parse analysis result, the rewriter result,
> or the actual plan tree? I don't necessarily have a preconceived notion
> about which is best (I can think of pluses and minuses for each), but
> we should hear your explanation of which one you chose and what your
> reasoning was.

Could you describe the pluses and minuses? My understanding is that the later the hash is computed, the more it reflects how the queries were actually executed. Would it make sense to turn this into a GUC and leave the decision up to the user, something like

  pg_stat_statements.hash_phase = {grammar|analysis|rewriter|plan}

so that the user could decide how coarse the output should be?

> I'm not real sure whether it's better to classify on the basis of
> similar plans or similar original queries, anyway. This seems like
> something that could be open for debate about use-cases.

Well, that's a really tricky question - there can be different queries with the same plan. I think that grouping queries solely by the plan is not very useful, so the original query should be involved somehow.

What about using two hashes - a hash of the grammar tree (grammar_hash) and a hash of the rewriter output (rewriter_hash)? pg_stat_statements would then group the queries by the (grammar_hash, rewriter_hash) pair and include those two columns in the output. So I could select the rows with the same grammar_hash to see the observed plans for a given query, or select rows with the same rewriter_hash to see queries leading to that particular plan.

To make this actually usable it's important to provide access to the plans, so that the user can take a rewriter_hash and get the plan somehow. This is not needed for grammar_hash, because the query string will be there, but the actual plan might change (due to autoanalyze etc.).

But maybe this is severe over-engineering and it's far too complicated.

> It might be that the path of least resistance is to document that we
> select one of the actual query strings "at random" to represent all the
> queries that went into the same hash entry, and not even bother with
> trying to strip out constants. The effort required to do that seems
> well out of proportion to the reward, if we can't do a perfect job of
> representing the common aspects of the queries.

Yes, once we have the hashes we can surely use a random query string with the values included. But it'd be nice to have the actual plans stored somewhere, so that it's possible to see them later.

Tomas
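A sketch of what such a composite hash-table key might look like (everything here is illustrative - the field names are made up, and the real pg_stat_statements key carries additional attributes):

/*
 * Hypothetical key grouping statistics by both the grammar-level hash and
 * the rewriter-level hash, so one query can be listed once per distinct
 * post-rewrite form.
 */
typedef struct pgssDualHashKey
{
	Oid			userid;			/* user OID */
	Oid			dbid;			/* database OID */
	uint32		grammar_hash;	/* hash of the raw grammar output tree */
	uint32		rewriter_hash;	/* hash of the post-rewrite Query tree */
} pgssDualHashKey;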
On 11/06/2011 01:08 PM, Tom Lane wrote:
> Peter Geoghegan <peter@2ndquadrant.com> writes:
>
>> ... It also does things
>> like intelligently distinguishing between queries with different
>> limit/offset constant values, as these constants are deemed to be
>> differentiators of queries for our purposes. A guiding principle that
>> I've followed is that anything that could result in a different plan
>> is a differentiator of queries.
>>
> This claim seems like bunk, unless you're hashing the plan tree,
> in which case it's tautological.

Peter's patch adds a planner hook and hashes at that level. Since this cat is rapidly escaping its bag and impacting other contributors, we might as well share the work in progress early if anyone has a burning desire to look at the code. A diff against the version I've been internally reviewing in prep for community submission is at

https://github.com/greg2ndQuadrant/postgres/compare/master...pg_stat_statements_norm

Easier to browse than to ask probing questions about it, I think. Apologies to Peter for sharing his work before he was completely ready; there is a list of known problems with it. I also regret the thread hijack here.

The first chunk of code is a Python based regression test program, and an open item here is the best way to turn that into a standard regression test set.

>> There will be additional infrastructure added to the parser to support
>> normalisation of query strings for the patch I'll be submitting (that
>> obviously won't be supported in the version that builds against
>> existing Postgres versions that I'll make available). Essentially,
>> I'll be adding a length field to certain nodes,
>>
> This seems like a good way to get your patch rejected: adding overhead
> to the core system for the benefit of a feature that not everyone cares
> about is problematic.

Struggling around this area is the main reason this code hasn't been submitted yet. To step back for a moment, there are basically three options here that any code like this can take, in regards to storing the processed query text used as the key:

1) Use the first instance of that query seen as the "reference" version
2) Use the last instance seen
3) Transform the text of the query in a way that's stable across all duplicates of that statement, and output that

Downstream tools operating on this data, things that will sample pg_stat_statements multiple times, need some sort of stable query text they can operate on. It really needs to survive server restart too. Neither (1) nor (2) seems like a usable solution. Both have been implemented already in Peter's patch, with (2) being what's in the current patch. How best to do (3) instead is not obvious though.

Doing the query matching at the planner level seems effective at side-stepping the problem of needing to parse the SQL, which is where most implementations of this idea get stuck doing fragile transformations. My own first try at this back in September and Tomas's patch both fall into that category. That part of Peter's work seems to work as expected.

That still leaves the problem of "parsed query -> stable normalized string". I think that is an easier one to solve than directly taking on the entirety of "query text -> stable normalized string" though. Peter has an idea he's exploring for how to implement that, and any amount of overhead it adds to people who don't use this feature is an obvious concern. There are certainly use cases that don't care about this problem, ones that would happily take (1) or (2).
I'd rather not ship yet another not-quite-right-for-everyone version of pg_stat_statements again though; only solving too limited a use case is the big problem with the one that's already there.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
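For anyone who hasn't used it, a minimal sketch of the "planner hook" mechanism being described follows. The function and variable names are illustrative, not taken from the branch linked above, and hash_query_tree() stands in for whatever selective tree hashing the real patch performs.

#include "postgres.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
#include "optimizer/planner.h"

static planner_hook_type prev_planner_hook = NULL;

void		_PG_init(void);

static PlannedStmt *
pgss_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
	/* Hash the post-rewrite Query tree before it is planned. */
	uint32		queryId = hash_query_tree(parse);	/* hypothetical helper */
	PlannedStmt *result;

	if (prev_planner_hook)
		result = prev_planner_hook(parse, cursorOptions, boundParams);
	else
		result = standard_planner(parse, cursorOptions, boundParams);

	/* ... stash queryId where the executor hooks can later find it ... */
	(void) queryId;
	return result;
}

void
_PG_init(void)
{
	prev_planner_hook = planner_hook;
	planner_hook = pgss_planner;
}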
On 6 November 2011 15:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Are you intending to hash the raw
> grammar output tree, the parse analysis result, the rewriter result,
> or the actual plan tree?

I'm hashing the Query tree from a planner plugin (function assigned to planner_hook), immediately prior to it being passed to standard_planner. A major consideration was backwards compatibility; at one point, it looked like this all could be done without adding any new infrastructure, with perhaps an officially sanctioned 9.2 version of pg_stat_statements that could be built against earlier pg versions. Other than that, it seems intuitively obvious that this should happen as late as possible in the parsing stage - any transformation performed prior to planning was considered essential to the query, even if that potentially meant that successive calls of the same query string were considered different due to external factors. These things probably don't come up too often in practice, but I think that they're nice to have, as they prevent the DBA from looking at statistics that aren't actually representative.

>> A guiding principle that
>> I've followed is that anything that could result in a different plan
>> is a differentiator of queries.
>
> This claim seems like bunk, unless you're hashing the plan tree,
> in which case it's tautological.

I think I may have been unclear, for which I apologise. Certainly, if the principle I (mis)described was followed to the letter, I'd end up with something that was exactly the same as what we already have. I merely meant to suggest, with an awareness of the fact that I was saying something slightly controversial, that because the planner is /invariably/ sensitive to changes in limit/offset constants, particularly large changes, it made sense to have those constants as differentiators, and it certainly made sense to have a limit n differentiate the same query with and without a limit n. I do of course appreciate that the use of a different constant in quals could result in a different plan being generated due to their selectivity estimates varying.

> I'm not real sure whether it's better to classify on the basis of
> similar plans or similar original queries, anyway. This seems like
> something that could be open for debate about use-cases.

Indeed - it's generally difficult to reason about what behaviour is optimal when attempting to anticipate the needs of every Postgres DBA. In this case, I myself lean strongly towards similar original queries, because classifying on the basis of similar plans isn't likely to be nearly as actionable, and there are really no obvious precedents to draw on - how can it be turned into a useful tool?

>> There will be additional infrastructure added to the parser to support
>> normalisation of query strings for the patch I'll be submitting (that
>> obviously won't be supported in the version that builds against
>> existing Postgres versions that I'll make available). Essentially,
>> I'll be adding a length field to certain nodes,
>
> This seems like a good way to get your patch rejected: adding overhead
> to the core system for the benefit of a feature that not everyone cares
> about is problematic.

It's problematic, but I believe that it can be justified. Without being glib, exactly no one cares about the location of tokens that correspond to Const nodes until they have an error that occurs outside the parser that is attributable to a node/corresponding token, which, in a production environment, could take a long time.
All I'm doing is moving slightly more towards the default Bison representation of YYLTYPE, where the column and row of both the beginning and end of each token are stored. I'm storing location and length rather than just location, which is a modest change.

Not everyone cares about this feature, but plenty do. It will be particularly useful to have the query representation stable, even to the point where it is stable between pg_stat_statements_reset() calls - third party tools can rely on the stability of the query string.

> Why do you need it anyway? Surely it's possible
> to determine the length of a literal token after the fact.

Probably, but not in a manner that I deem to be well-principled. I want to push the onus of keeping the bookkeeping for the replacement of tokens onto the parser, which is authoritative - otherwise, I'll end up with a kludge that is liable to fall out of sync. The community will have the opportunity to consider whether that trade-off makes sense.

> More generally, if you're hashing anything later than the raw grammar
> tree, I think that generating a suitable representation of the queries
> represented by a single hash entry is going to be problematic anyway.
> There could be significant differences --- much more than just the
> values of constants --- between query strings that end up being
> semantically the same. Or for that matter we could have identical query
> strings that wind up being considered different because of the action of
> search_path or other context.

I don't see that that has to be problematic. I consider that to be a feature, and it can be documented as such. I think that any scenario anyone might care to describe where this is a real problem will be mostly contrived.

> It might be that the path of least resistance is to document that we
> select one of the actual query strings "at random" to represent all the
> queries that went into the same hash entry, and not even bother with
> trying to strip out constants. The effort required to do that seems
> well out of proportion to the reward, if we can't do a perfect job of
> representing the common aspects of the queries.

Up until quite recently, this patch actually updated the query string, so that a new set of constants was seen after each query call - that's what Greg posted a link to. The part of the patch that normalises the query string and the related backend infrastructure are fairly neat adjuncts to what you see there. So while I'll be asking someone to commit the patch with that infrastructure, because the adjunct adds quite a lot of value to the patch, there is no need to have that be a blocker.

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
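To make the mechanism concrete, a rough sketch of the kind of swap-out being described follows. Note that Const nodes in stock PostgreSQL carry only a location field; the length used here is the field Peter is proposing to add, and all other names are illustrative rather than taken from his patch.

#include "postgres.h"
#include "lib/stringinfo.h"

/* One literal to replace: where it starts in the query text and how long it is. */
typedef struct ConstSpan
{
	int			location;	/* byte offset of the literal token */
	int			length;		/* proposed new field: byte length of the token */
} ConstSpan;

/*
 * Produce a normalised copy of the query text with each literal replaced by
 * a generic placeholder.  Spans must be sorted by location and non-overlapping.
 */
static char *
normalize_query_text(const char *query, const ConstSpan *spans, int nspans)
{
	StringInfoData buf;
	int			copied = 0;
	int			i;

	initStringInfo(&buf);
	for (i = 0; i < nspans; i++)
	{
		appendBinaryStringInfo(&buf, query + copied,
							   spans[i].location - copied);
		appendStringInfoChar(&buf, '?');	/* generic placeholder */
		copied = spans[i].location + spans[i].length;
	}
	appendStringInfoString(&buf, query + copied);
	return buf.data;
}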
Peter Geoghegan <peter@2ndquadrant.com> writes:
> On 6 November 2011 15:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Are you intending to hash the raw
>> grammar output tree, the parse analysis result, the rewriter result,
>> or the actual plan tree?

> I'm hashing the Query tree from a planner plugin (function assigned to
> planner_hook), immediately prior to it being passed to
> standard_planner.

IOW, the rewriter result. Why that choice? That seems like about the least useful of the four possibilities, since it provides no certainty about the plan while also being as far from the original text as you can get (thus making the problem of what to display pretty hard).

> A major consideration was backwards compatibility;

This is not a consideration that the community is likely to weigh heavily, or indeed at all. We aren't going to back-port this feature into prior release branches, and we aren't going to want to contort its definition to make that easier. If 2ndquadrant wants to ship a back-branch version with the feature, you can certainly also back-port a patch that adds a hook where you need it, if you need a new hook.

But frankly I'm a bit surprised that you're not hashing the query plan, since you could get that in the ExecutorStart hook that pg_stat_statements already uses. With a hook placed somewhere else, you add additional implementation problems of matching up the calls to that hook with later calls to the executor hook.

>> I'm not real sure whether it's better to classify on the basis of
>> similar plans or similar original queries, anyway. This seems like
>> something that could be open for debate about use-cases.

> Indeed - it's generally difficult to reason about what behaviour is
> optimal when attempting to anticipate the needs of every Postgres DBA.
> In this case, I myself lean strongly towards similar original queries,
> because classifying on the basis of similar plans isn't likely to be
> nearly as actionable, and there are really no obvious precedents to
> draw on - how can it be turned into a useful tool?

Hm, well, if that's what you think, why so late in the sequence? Hashing the raw grammar output would be the best way to identify similar original queries, ISTM. You'd also have a better chance of getting people to hold still for the extra overhead fields, if they didn't need to propagate further than that representation.

Basically, I think there are fairly clear arguments for using a hash of the raw grammar output, if you lean towards the "classify by original query" viewpoint; or a hash of the plan tree, if you lean towards the "classify by plan" viewpoint. I don't see what use-cases would make it optimal to hash one of the intermediate stages. I was hoping you had an argument stronger than "it'll be easy to back-port" for that.

			regards, tom lane
On Sun, Nov 6, 2011 at 2:56 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Peter's patch adds a planner hook and hashes at that level. Since this cat
> is rapidly escaping its bag and impacting other contributors, we might as
> well share the work in progress early if anyone has a burning desire to look
> at the code. A diff against the version I've been internally reviewing in
> prep for community submission is at
> https://github.com/greg2ndQuadrant/postgres/compare/master...pg_stat_statements_norm
> Easier to browse than to ask probing questions about it, I think. Apologies to
> Peter for sharing his work before he was completely ready; there is a list
> of known problems with it. I also regret the thread hijack here.

I think it's an established principle that the design for features like this should, for best results, be discussed on -hackers before writing a lot of code. That not only avoids duplication of effort (which is nice), but also allows design decisions like "what exactly should we hash and where should the hooks be?" to be ironed out early, before they can force a major rewrite.

Of course, there's no hard requirement, but the archives are littered with patches that crashed and burned because the author wrote them in isolation and then, upon receiving feedback from the community that necessitated a full rewrite, gave up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 11/07/2011 09:03 AM, Robert Haas wrote:
> I think it's an established principle that the design for features
> like this should, for best results, be discussed on -hackers before
> writing a lot of code.

You can see from the commit history this idea is less than a month old. Do we need to get community approval before writing a proof of concept now? Everyone would still be arguing about how to get started had that path been taken. If feedback says this needs a full rewrite, fine. We are familiar with how submitting patches works here.
On 11/06/2011 06:00 PM, Tom Lane wrote:
> Peter Geoghegan <peter@2ndquadrant.com> writes:
>
>> A major consideration was backwards compatibility;
>
> This is not a consideration that the community is likely to weigh
> heavily, or indeed at all. We aren't going to back-port this feature
> into prior release branches, and we aren't going to want to contort its
> definition to make that easier.

Being able to ship a better pg_stat_statements that can run against earlier versions as an extension has significant value to the community. Needing to parse log files to do statement profiling is a major advocacy issue for PostgreSQL. If we can get something useful that's possible to test as an extension earlier than the 9.2 release--and make it available to more people running older versions too--that has some real value to users, and for early production testing of what will go into 9.2.

The point where this crosses over to requiring server-side code to operate at all is obviously a deal breaker on that idea. So far that line hasn't been crossed, and we're trying to stage testing against older versions on real-world queries. As long as it's possible to keep that goal without making the code more difficult to deal with, I wouldn't dismiss that as a useless distinction.
On Mon, Nov 7, 2011 at 10:27 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> On 11/07/2011 09:03 AM, Robert Haas wrote:
>>
>> I think it's an established principle that the design for features
>> like this should, for best results, be discussed on -hackers before
>> writing a lot of code.
>
> You can see from the commit history this idea is less than a month old. Do
> we need to get community approval before writing a proof of concept now?
> Everyone would still be arguing about how to get started had that path been
> taken. If feedback says this needs a full rewrite, fine. We are familiar
> with how submitting patches works here.

Eh, obviously that didn't come off the way I meant it, since I already got one other off-list reply suggesting that my tone there may not have been the best. Sorry.

I suppose in a way I am taking myself to task as much as anyone: I have been spending a lot of time thinking about things lately, and I haven't been as good about converting those ideas into a form suitable for posting on -hackers, and actually doing it, as I historically have been, and I'm feeling like I need to get back to doing that more.

But I honestly don't care one way or the other how you or Peter develop your patches, beyond the fact that they contain a lot of good ideas which I would like to see work their way into the tree; and of course I do know that you are already familiar with the process.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Nov 5, 2011 at 8:42 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> Dne 6.11.2011 03:16, Peter Geoghegan napsal(a):
>> 2011/11/6 Tomas Vondra <tv@fuzzy.cz>:
>>> Hi everyone,
>>
>>> The patch implements a simple "cleaning" that replaces the parameter
>>> values with generic strings - e.g. numbers are turned to ":n", so the
>>> queries mentioned above are turned to
>>>
>>> SELECT abalance FROM pgbench_accounts WHERE aid = :n
>>>
>>> and thus tracked as a single query in pg_stat_statements.
>>
>> I'm a couple of days away from posting a much better principled
>> implementation of pg_stat_statements normalisation. To normalise, we
>> perform a selective serialisation of the query tree, which is hashed.
>
> OK, my patch definitely is not the only possible approach, and if there's
> a way to get more info from the planner, the results surely might be
> better. My goal was to provide a simple patch that solves the problem;
> if yours solves it better, then I'll be more than happy to remove mine.

Hi Tomas,

Given Peter's patch on the same subject, should we now mark this one as rejected in the commitfest app?

https://commitfest.postgresql.org/action/patch_view?id=681

Thanks,

Jeff
On 11/25/2011 10:52 AM, Jeff Janes wrote:
> Given Peter's patch on the same subject, should we now mark this one
> as rejected in the commitfest app?
> https://commitfest.postgresql.org/action/patch_view?id=681

That may be premature. While the testing I've done so far suggests Peter's idea is the better route forward, the much higher complexity level does mean it's surely possible for it to be rejected. We have a reviewer signed up for Peter's submission now; I think what to do with Tomas's entry will be an easier call to make once that report is in.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us