Thread: [HACKERS] Preserving param location
Hello, When a query contains parameters, the original param node contains the token location. However, this information is lost when the Const node is generated, this one will only contain position -1 (unknown). FWIW, we do have a use case for this (custom extension that tracks quals statistics, which among other thing is used to regenerate query string from a pgss normalized query, thus needing the original param location). Is this something we want to get fixed? If yes, attached is a simple patch for that. Regards. -- Julien Rouhaud http://dalibo.com - http://dalibo.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Sat, Mar 11, 2017 at 11:09:32PM +0100, Julien Rouhaud wrote: > > When a query contains parameters, the original param node contains the token > location. However, this information is lost when the Const node is generated, > this one will only contain position -1 (unknown). > > FWIW, we do have a use case for this (custom extension that tracks quals > statistics, which among other thing is used to regenerate query string from a > pgss normalized query, thus needing the original param location). rebased v2.
On Sun, Jun 27, 2021 at 11:31:53AM +0800, Julien Rouhaud wrote: > On Sat, Mar 11, 2017 at 11:09:32PM +0100, Julien Rouhaud wrote: > > > > When a query contains parameters, the original param node contains the token > > location. However, this information is lost when the Const node is generated, > > this one will only contain position -1 (unknown). > > > > FWIW, we do have a use case for this (custom extension that tracks quals > > statistics, which among other thing is used to regenerate query string from a > > pgss normalized query, thus needing the original param location). > > rebased v2. This time with the patch.
Attachment
Julien Rouhaud <rjuju123@gmail.com> writes: > On Sun, Jun 27, 2021 at 11:31:53AM +0800, Julien Rouhaud wrote: >> On Sat, Mar 11, 2017 at 11:09:32PM +0100, Julien Rouhaud wrote: >>> When a query contains parameters, the original param node contains the token >>> location. However, this information is lost when the Const node is generated, >>> this one will only contain position -1 (unknown). >>> >>> FWIW, we do have a use case for this (custom extension that tracks quals >>> statistics, which among other thing is used to regenerate query string from a >>> pgss normalized query, thus needing the original param location). I looked at this for a bit. It's certainly a simple and harmless-looking patch, but I'm having a hard time buying that it's actually useful in isolation. In particular, even if we keep the Param's location in the immediately-resulting Const, what happens when that node is further mashed by planner transformations? As an example, given regression=# prepare foo(int) as select $1 + 1; PREPARE regression=# explain verbose execute foo(42); QUERY PLAN ------------------------------------------ Result (cost=0.00..0.01 rows=1 width=4) Output: 43 (2 rows) this patch isn't gonna do anything to help you figure out where the 42 went. I thought for a bit about also having evaluate_expr() inject the exprLocation(expr) into its result Const, but in this example that would likely result in having the 43 labeled with the offset of the $1 (given exprLocation's penchant for preferring the leftmost token location). That could be more confusing not less, especially when you consider that the variant case "1 + $1" would not be traceable to the Param at all. I also fear that this is still just the tip of the iceberg, in terms of when rewriting and planning preserve useful info about the source location of modified Nodes. There's probably other places to touch if we want to have consistent results. So I'm not really convinced that there's a fully-baked use case here, and would like more detail about how you hope to use the location value. regards, tom lane
On Tue, Jul 13, 2021 at 07:01:24PM -0400, Tom Lane wrote: > > I thought for a bit about also having evaluate_expr() inject the > exprLocation(expr) into its result Const, but in this example > that would likely result in having the 43 labeled with the offset > of the $1 (given exprLocation's penchant for preferring the leftmost > token location). That could be more confusing not less, especially > when you consider that the variant case "1 + $1" would not be > traceable to the Param at all. Thanks for looking at this patch. I agree that it probably doesn't solve all problems related to token location, but arguably there can't be a perfect solution for all cases with our infrastructure anyway. Going a bit further the road: =# PREPARE foo2(int, int) AS select $1 + $2; =# EXPLAIN (VERBOSE) EXECUTE foo2(42, 42); QUERY PLAN ------------------------------------------ Result (cost=0.00..0.01 rows=1 width=4) Output: 84 (2 rows) As long as we store a single location for each token, there's no way we can give a correct answer there, and I'm definitely not pushing for having an array of locations. > I also fear that this is still just the tip of the iceberg, in terms > of when rewriting and planning preserve useful info about the source > location of modified Nodes. There's probably other places to touch > if we want to have consistent results. > > So I'm not really convinced that there's a fully-baked use case > here, and would like more detail about how you hope to use the > location value. As I said I have no doubt that there are other cases which are currently not correctly handled, but that's not what I'm trying to fix. I just want to treat parameterized queries the same way as regular queries, thus fixing one obvious and frequent case (for which some user did complain), hoping that it will on its own already help many users. As mentioned in my first email, the use case is trying to deparse a normalized query. I'm working on pg_qualstats extension, which tracks quals and a lot of associated information. For example, the constant value, the constant location but also whether the qual was part of an index scan or not, the effective selectivity, the selectivity estimation error and so on. So here is one over-simplified example on how to use that, relying on the discussed patch: =# CREATE TABLE IF NOT EXISTS pgqs(id integer, val text); CREATE TABLE =# PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= $1 AND val = $2; PREPARE =# EXECUTE foo1(42, 'test'); id ---- (0 rows) =# SELECT query, CASE WHEN constant_position IS NOT NULL THEN substr(query, constant_position, 3) ELSE NULL END AS param, constvalue, constant_position FROM pg_qualstats() pgqs LEFT JOIN pg_stat_statements pgss USING (userid, dbid, queryid) ORDER BY 4; -[ RECORD 1 ]-----+-------------------------------------------------------------------------- query | PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= $1 AND val = $2 param | $1 constvalue | 42::integer constant_position | 58 -[ RECORD 2 ]-----+-------------------------------------------------------------------------- query | PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= $1 AND val = $2 param | $2 constvalue | 'test'::text constant_position | 71 which is enough to reliably regenerate the original statement. With the current code this is simply impossible. The final goal is to try to help DBA to find problematic queries, and give as much information as possible automatically. One naive usage is to suggest indexes using the quals that are not part of an index scan, if their selectivity is high enough. In order to help validating the resulting index suggestion, I can try to create hypothetical indexes and test them using various sets of quals that are stored (the most filtering, the least filtering, the most often used and such) to see if the indexes would be used, and in which case. For other use cases, just having the execution plans for the various sets of quals can help to track down other problems. For instance detecting if some bad selectivity estimation for some qual leads to an inferior plan compared to the same qual wihen it has a better selectivity estimate. There are of course a lot of gotchas but if it can automatically provide helpful information for 50% of the queries, that as much time saved that can be spent actually investigating the performance issues rather than manually copy/pasting queries and qual constants around. In practice it doesn't work so bad, except when you're using extended protocol or prepared statements. In such cases, you drop from let's say 50% (wild pessimistic guess) of the queries that can be preprocessed to a straight 0%.
Julien Rouhaud <rjuju123@gmail.com> writes: > On Tue, Jul 13, 2021 at 07:01:24PM -0400, Tom Lane wrote: >> So I'm not really convinced that there's a fully-baked use case >> here, and would like more detail about how you hope to use the >> location value. > As I said I have no doubt that there are other cases which are currently not > correctly handled, but that's not what I'm trying to fix. I just want to treat > parameterized queries the same way as regular queries, thus fixing one obvious > and frequent case (for which some user did complain), hoping that it will on > its own already help many users. Hm. I guess this point (i.e. that the Param substitution should result in the identical plan tree as writing a literal constant would have) has some force. I still feel like your application is pretty shaky, but I don't really have any ideas about feasible ways to make it better. Will push the patch. regards, tom lane
On Wed, Jul 14, 2021 at 02:02:18PM -0400, Tom Lane wrote: > > Hm. I guess this point (i.e. that the Param substitution should result > in the identical plan tree as writing a literal constant would have) > has some force. I still feel like your application is pretty shaky, > but I don't really have any ideas about feasible ways to make it better. I totally agree that it's quite shaky, but I also don't have a better idea on how to improve it. In the meantime it already saved my former colleagues and I (and I'm assuming other people) many hours of boring and tedious work. > Will push the patch. Thanks a lot!