Re: [HACKERS] Preserving param location - Mailing list pgsql-hackers
From | Julien Rouhaud
---|---
Subject | Re: [HACKERS] Preserving param location
Date | |
Msg-id | 20210714094645.moaxeqokvnemr2zj@nol
In response to | Re: [HACKERS] Preserving param location (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: [HACKERS] Preserving param location
List | pgsql-hackers
On Tue, Jul 13, 2021 at 07:01:24PM -0400, Tom Lane wrote:
>
> I thought for a bit about also having evaluate_expr() inject the
> exprLocation(expr) into its result Const, but in this example
> that would likely result in having the 43 labeled with the offset
> of the $1 (given exprLocation's penchant for preferring the leftmost
> token location).  That could be more confusing not less, especially
> when you consider that the variant case "1 + $1" would not be
> traceable to the Param at all.

Thanks for looking at this patch.  I agree that it probably doesn't solve all
problems related to token location, but arguably there can't be a perfect
solution for all cases with our infrastructure anyway.  Going a bit further
down the road:

=# PREPARE foo2(int, int) AS select $1 + $2;
=# EXPLAIN (VERBOSE) EXECUTE foo2(42, 42);
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=4)
   Output: 84
(2 rows)

As long as we store a single location for each token, there's no way we can
give a correct answer there, and I'm definitely not pushing for having an
array of locations.

> I also fear that this is still just the tip of the iceberg, in terms
> of when rewriting and planning preserve useful info about the source
> location of modified Nodes.  There's probably other places to touch
> if we want to have consistent results.
>
> So I'm not really convinced that there's a fully-baked use case
> here, and would like more detail about how you hope to use the
> location value.

As I said, I have no doubt that there are other cases which are currently not
handled correctly, but that's not what I'm trying to fix.  I just want to
treat parameterized queries the same way as regular queries, thus fixing one
obvious and frequent case (for which some users did complain), hoping that on
its own it will already help many users.

As mentioned in my first email, the use case is deparsing a normalized query.
I'm working on the pg_qualstats extension, which tracks quals and a lot of
associated information: for example the constant value and the constant
location, but also whether the qual was part of an index scan, the effective
selectivity, the selectivity estimation error and so on.

So here is one over-simplified example of how to use that, relying on the
discussed patch:

=# CREATE TABLE IF NOT EXISTS pgqs(id integer, val text);
CREATE TABLE
=# PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= $1 AND val = $2;
PREPARE
=# EXECUTE foo1(42, 'test');
 id
----
(0 rows)

=# SELECT query,
          CASE WHEN constant_position IS NOT NULL
              THEN substr(query, constant_position, 3)
              ELSE NULL
          END AS param,
          constvalue,
          constant_position
     FROM pg_qualstats() pgqs
     LEFT JOIN pg_stat_statements pgss USING (userid, dbid, queryid)
    ORDER BY 4;
-[ RECORD 1 ]-----+---------------------------------------------------------------------------
query             | PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= $1 AND val = $2
param             | $1
constvalue        | 42::integer
constant_position | 58
-[ RECORD 2 ]-----+---------------------------------------------------------------------------
query             | PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= $1 AND val = $2
param             | $2
constvalue        | 'test'::text
constant_position | 71

This is enough to reliably regenerate the original statement, which is simply
impossible with the current code.  The final goal is to help DBAs find
problematic queries, and to give as much information as possible
automatically.
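To make the "reliably regenerate" part concrete, here is a minimal sketch of
the splicing step, hardcoding the two rows above.  It assumes that
constant_position is the parser's 0-based token location while overlay()
counts from 1 (hence the "+ 1"; drop it if the reported offsets turn out to
be 1-based), that each "$n" reference is two characters wide, and it splices
the rightmost parameter first so that the other offset is not shifted by a
replacement of a different length:

=# SELECT overlay(
            overlay(query placing '''test''' from 71 + 1 for 2)  -- $2 first
            placing '42' from 58 + 1 for 2)                      -- then $1
     FROM (VALUES
       ('PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= $1 AND val = $2')
     ) AS t(query);
                                    overlay
--------------------------------------------------------------------------------
 PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= 42 AND val = 'test'
(1 row)

In a real tool the positions and constants would of course be driven by the
stored pg_qualstats rows rather than by literals.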
One naive usage is to suggest indexes using the quals that are not part of an
index scan, if their selectivity is high enough.  To help validate the
resulting index suggestions, I can then create hypothetical indexes and test
them against various sets of stored quals (the most filtering, the least
filtering, the most often used and so on) to see whether the indexes would be
used, and in which cases (see the sketch at the end of this mail).

For other use cases, just having the execution plans for the various sets of
quals can help to track down other problems, for instance detecting whether a
bad selectivity estimate for some qual leads to an inferior plan compared to
the same qual with a better selectivity estimate.

There are of course a lot of gotchas, but if this can automatically provide
helpful information for 50% of the queries, that's time saved that can be
spent actually investigating the performance issues rather than manually
copy/pasting queries and qual constants around.  In practice it doesn't work
so badly, except when you're using the extended protocol or prepared
statements: in those cases you drop from, let's say, 50% (a wildly
pessimistic guess) of the queries that can be processed automatically to a
straight 0%.
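For reference, here is roughly what the hypothetical index validation
mentioned above could look like, using for instance the hypopg extension
(the index definition and the constants are only illustrative; in practice
they would come from the stored quals):

=# CREATE EXTENSION IF NOT EXISTS hypopg;
-- hypothetical index on the qual that wasn't part of an index scan
=# SELECT * FROM hypopg_create_index('CREATE INDEX ON pgqs (id)');
-- replay the deparsed statement with e.g. the most filtering constants;
-- plain EXPLAIN (hypothetical indexes can't be used under EXPLAIN ANALYZE)
-- shows whether the suggested index would be picked
=# EXPLAIN SELECT * FROM pgqs WHERE id >= 42 AND val = 'test';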