Thread: [HACKERS] Preserving param location

[HACKERS] Preserving param location

From
Julien Rouhaud
Date:
Hello,

When a query contains parameters, the original param node contains the token
location.  However, this information is lost when the Const node is generated,
this one will only contain position -1 (unknown).

FWIW, we do have a use case for this (custom extension that tracks quals
statistics, which among other thing is used to regenerate query string from a
pgss normalized query, thus needing the original param location).

Is this something we want to get fixed? If yes, attached is a simple patch for
that.

Regards.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Preserving param location

From
Julien Rouhaud
Date:
On Sat, Mar 11, 2017 at 11:09:32PM +0100, Julien Rouhaud wrote:
> 
> When a query contains parameters, the original param node contains the token
> location.  However, this information is lost when the Const node is generated,
> this one will only contain position -1 (unknown).
>
> FWIW, we do have a use case for this (custom extension that tracks quals
> statistics, which among other thing is used to regenerate query string from a
> pgss normalized query, thus needing the original param location).

rebased v2.



Re: [HACKERS] Preserving param location

From
Julien Rouhaud
Date:
On Sun, Jun 27, 2021 at 11:31:53AM +0800, Julien Rouhaud wrote:
> On Sat, Mar 11, 2017 at 11:09:32PM +0100, Julien Rouhaud wrote:
> > 
> > When a query contains parameters, the original param node contains the token
> > location.  However, this information is lost when the Const node is generated,
> > this one will only contain position -1 (unknown).
> >
> > FWIW, we do have a use case for this (custom extension that tracks quals
> > statistics, which among other thing is used to regenerate query string from a
> > pgss normalized query, thus needing the original param location).
> 
> rebased v2.

This time with the patch.

Attachment

Re: [HACKERS] Preserving param location

From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Sun, Jun 27, 2021 at 11:31:53AM +0800, Julien Rouhaud wrote:
>> On Sat, Mar 11, 2017 at 11:09:32PM +0100, Julien Rouhaud wrote:
>>> When a query contains parameters, the original param node contains the token
>>> location.  However, this information is lost when the Const node is generated,
>>> this one will only contain position -1 (unknown).
>>>
>>> FWIW, we do have a use case for this (custom extension that tracks quals
>>> statistics, which among other thing is used to regenerate query string from a
>>> pgss normalized query, thus needing the original param location).

I looked at this for a bit.  It's certainly a simple and
harmless-looking patch, but I'm having a hard time buying that it's
actually useful in isolation.  In particular, even if we keep the
Param's location in the immediately-resulting Const, what happens
when that node is further mashed by planner transformations?
As an example, given

regression=# prepare foo(int) as select $1 + 1;
PREPARE
regression=# explain verbose execute foo(42);
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=4)
   Output: 43
(2 rows)

this patch isn't gonna do anything to help you figure out where the
42 went.

I thought for a bit about also having evaluate_expr() inject the
exprLocation(expr) into its result Const, but in this example
that would likely result in having the 43 labeled with the offset
of the $1 (given exprLocation's penchant for preferring the leftmost
token location).  That could be more confusing not less, especially
when you consider that the variant case "1 + $1" would not be
traceable to the Param at all.

I also fear that this is still just the tip of the iceberg, in terms
of when rewriting and planning preserve useful info about the source
location of modified Nodes.  There's probably other places to touch
if we want to have consistent results.

So I'm not really convinced that there's a fully-baked use case
here, and would like more detail about how you hope to use the
location value.

            regards, tom lane



Re: [HACKERS] Preserving param location

From
Julien Rouhaud
Date:
On Tue, Jul 13, 2021 at 07:01:24PM -0400, Tom Lane wrote:
>
> I thought for a bit about also having evaluate_expr() inject the
> exprLocation(expr) into its result Const, but in this example
> that would likely result in having the 43 labeled with the offset
> of the $1 (given exprLocation's penchant for preferring the leftmost
> token location).  That could be more confusing not less, especially
> when you consider that the variant case "1 + $1" would not be
> traceable to the Param at all.

Thanks for looking at this patch.

I agree that it probably doesn't solve all problems related to token location,
but arguably there can't be a perfect solution for all cases with our
infrastructure anyway.  Going a bit further the road:

=# PREPARE foo2(int, int) AS select $1 + $2;
=# EXPLAIN (VERBOSE) EXECUTE foo2(42, 42);
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=4)
   Output: 84
(2 rows)

As long as we store a single location for each token, there's no way we can
give a correct answer there, and I'm definitely not pushing for having an
array of locations.

> I also fear that this is still just the tip of the iceberg, in terms
> of when rewriting and planning preserve useful info about the source
> location of modified Nodes.  There's probably other places to touch
> if we want to have consistent results.
> 
> So I'm not really convinced that there's a fully-baked use case
> here, and would like more detail about how you hope to use the
> location value.

As I said I have no doubt that there are other cases which are currently not
correctly handled, but that's not what I'm trying to fix.  I just want to treat
parameterized queries the same way as regular queries, thus fixing one obvious
and frequent case (for which some user did complain), hoping that it will on
its own already help many users.

As mentioned in my first email, the use case is trying to deparse a
normalized query.  I'm working on pg_qualstats extension, which tracks quals
and a lot of associated information.  For example, the constant value,
the constant location but also whether the qual was part of an index scan or
not, the effective selectivity, the selectivity estimation error and so on.

So here is one over-simplified example on how to use that, relying on the
discussed patch:

=# CREATE TABLE IF NOT EXISTS pgqs(id integer, val text);
CREATE TABLE

=# PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= $1 AND val = $2;
PREPARE

=# EXECUTE foo1(42, 'test');
 id
----
(0 rows)

=# SELECT query,
CASE WHEN constant_position IS NOT NULL THEN
    substr(query, constant_position, 3)
ELSE NULL
END AS param,
constvalue, constant_position FROM pg_qualstats() pgqs
LEFT JOIN pg_stat_statements pgss USING (userid, dbid, queryid)
ORDER BY 4;
-[ RECORD 1 ]-----+--------------------------------------------------------------------------
query             | PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= $1 AND val = $2
param             |  $1
constvalue        | 42::integer
constant_position | 58
-[ RECORD 2 ]-----+--------------------------------------------------------------------------
query             | PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= $1 AND val = $2
param             |  $2
constvalue        | 'test'::text
constant_position | 71

which is enough to reliably regenerate the original statement.  With the
current code this is simply impossible.

The final goal is to try to help DBA to find problematic queries, and give
as much information as possible automatically.

One naive usage is to suggest indexes using the quals that are not part of an
index scan, if their selectivity is high enough.  In order to help validating
the resulting index suggestion, I can try to create hypothetical indexes and
test them using various sets of quals that are stored (the most filtering, the
least filtering, the most often used and such) to see if the indexes would be
used, and in which case.

For other use cases, just having the execution plans for the various sets of
quals can help to track down other problems.  For instance detecting if some
bad selectivity estimation for some qual leads to an inferior plan compared to
the same qual wihen it has a better selectivity estimate.

There are of course a lot of gotchas but if it can automatically provide
helpful information for 50% of the queries, that as much time saved that can be
spent actually investigating the performance issues rather than manually
copy/pasting queries and qual constants around.

In practice it doesn't work so bad, except when you're using extended protocol
or prepared statements.  In such cases, you drop from let's say 50% (wild
pessimistic guess) of the queries that can be preprocessed to a straight 0%.



Re: [HACKERS] Preserving param location

From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Tue, Jul 13, 2021 at 07:01:24PM -0400, Tom Lane wrote:
>> So I'm not really convinced that there's a fully-baked use case
>> here, and would like more detail about how you hope to use the
>> location value.

> As I said I have no doubt that there are other cases which are currently not
> correctly handled, but that's not what I'm trying to fix.  I just want to treat
> parameterized queries the same way as regular queries, thus fixing one obvious
> and frequent case (for which some user did complain), hoping that it will on
> its own already help many users.

Hm.  I guess this point (i.e. that the Param substitution should result
in the identical plan tree as writing a literal constant would have)
has some force.  I still feel like your application is pretty shaky,
but I don't really have any ideas about feasible ways to make it better.

Will push the patch.

            regards, tom lane



Re: [HACKERS] Preserving param location

From
Julien Rouhaud
Date:
On Wed, Jul 14, 2021 at 02:02:18PM -0400, Tom Lane wrote:
> 
> Hm.  I guess this point (i.e. that the Param substitution should result
> in the identical plan tree as writing a literal constant would have)
> has some force.  I still feel like your application is pretty shaky,
> but I don't really have any ideas about feasible ways to make it better.

I totally agree that it's quite shaky, but I also don't have a better idea on
how to improve it.  In the meantime it already saved my former colleagues and I
(and I'm assuming other people) many hours of boring and tedious work.

> Will push the patch.

Thanks a lot!