Re: [HACKERS] Preserving param location - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: [HACKERS] Preserving param location
Date
Msg-id 20210714094645.moaxeqokvnemr2zj@nol
In response to Re: [HACKERS] Preserving param location  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Preserving param location  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Jul 13, 2021 at 07:01:24PM -0400, Tom Lane wrote:
>
> I thought for a bit about also having evaluate_expr() inject the
> exprLocation(expr) into its result Const, but in this example
> that would likely result in having the 43 labeled with the offset
> of the $1 (given exprLocation's penchant for preferring the leftmost
> token location).  That could be more confusing not less, especially
> when you consider that the variant case "1 + $1" would not be
> traceable to the Param at all.

Thanks for looking at this patch.

I agree that it probably doesn't solve all problems related to token location,
but arguably there can't be a perfect solution for all cases with our
infrastructure anyway.  Going a bit further down the road:

=# PREPARE foo2(int, int) AS select $1 + $2;
=# EXPLAIN (VERBOSE) EXECUTE foo2(42, 42);
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=4)
   Output: 84
(2 rows)

As long as we store a single location for each token, there's no way we can
give a correct answer there, and I'm definitely not pushing for having an
array of locations.
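
The same goes for the "1 + $1" variant you mention: it's folded to a single
Const too, and with exprLocation()'s preference for the leftmost token the
best single location we could attach to it would point at the "1", not at the
Param:

=# PREPARE foo3(int) AS SELECT 1 + $1;
=# EXPLAIN (VERBOSE) EXECUTE foo3(42);
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=4)
   Output: 43
(2 rows)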

> I also fear that this is still just the tip of the iceberg, in terms
> of when rewriting and planning preserve useful info about the source
> location of modified Nodes.  There's probably other places to touch
> if we want to have consistent results.
> 
> So I'm not really convinced that there's a fully-baked use case
> here, and would like more detail about how you hope to use the
> location value.

As I said, I have no doubt that there are other cases which are currently not
handled correctly, but that's not what I'm trying to fix.  I just want to treat
parameterized queries the same way as regular queries, thus fixing one obvious
and frequent case (which some users did complain about), hoping that this alone
will already help many users.

As mentioned in my first email, the use case is trying to deparse a
normalized query.  I'm working on the pg_qualstats extension, which tracks
quals and a lot of associated information: for example the constant value and
the constant location, but also whether the qual was part of an index scan or
not, the effective selectivity, the selectivity estimation error and so on.

So here is one over-simplified example of how to use that, relying on the
discussed patch:

=# CREATE TABLE IF NOT EXISTS pgqs(id integer, val text);
CREATE TABLE

=# PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= $1 AND val = $2;
PREPARE

=# EXECUTE foo1(42, 'test');
 id | val
----+-----
(0 rows)

=# SELECT query,
       CASE WHEN constant_position IS NOT NULL THEN
           substr(query, constant_position, 3)
       ELSE NULL
       END AS param,
       constvalue, constant_position
   FROM pg_qualstats() pgqs
   LEFT JOIN pg_stat_statements pgss USING (userid, dbid, queryid)
   ORDER BY 4;
-[ RECORD 1 ]-----+--------------------------------------------------------------------------
query             | PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= $1 AND val = $2
param             |  $1
constvalue        | 42::integer
constant_position | 58
-[ RECORD 2 ]-----+--------------------------------------------------------------------------
query             | PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= $1 AND val = $2
param             |  $2
constvalue        | 'test'::text
constant_position | 71

which is enough to reliably regenerate the original statement.  With the
current code this is simply impossible.
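
For the record, here is a naive way of doing the substitution based on that
output.  The offsets are hardcoded from the example (constant_position is a
0-based parse location, hence the +1 when used with 1-based string functions),
and the rightmost constant is replaced first so that earlier offsets stay
valid:

=# SELECT overlay(
       overlay(query placing '''test''::text' from 72 for 2)
       placing '42::integer' from 59 for 2) AS stmt
   FROM pg_stat_statements WHERE query LIKE 'PREPARE foo1%';
                                     stmt
-------------------------------------------------------------------------------
 PREPARE foo1(int, text) AS SELECT * FROM pgqs WHERE id >= 42::integer AND val = 'test'::text
(1 row)

A real tool would of course iterate over the stored (constant_position,
constvalue) pairs instead of hardcoding anything.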

The final goal is to try to help DBAs find problematic queries, and to give
as much information as possible automatically.

One naive usage is to suggest indexes using the quals that are not part of an
index scan, if their selectivity is high enough.  In order to help validate
the resulting index suggestion, I can try to create hypothetical indexes and
test them using the various sets of stored quals (the most filtering, the
least filtering, the most often used and such) to see whether the indexes
would be used, and in which cases.
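
For instance, combined with the hypopg extension, such a check could look like
this (hypothetical index on the example table above, just to show the idea):

=# CREATE EXTENSION hypopg;
=# SELECT indexname FROM hypopg_create_index('CREATE INDEX ON pgqs (id)');
=# EXPLAIN SELECT * FROM pgqs WHERE id >= 42 AND val = 'test';

If the plan switches to an Index Scan on the hypothetical index, the
suggestion is validated for that set of quals without having to actually
build anything.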

For other use cases, just having the execution plans for the various sets of
quals can help to track down other problems, for instance detecting whether a
bad selectivity estimate for some qual leads to an inferior plan compared to
the same qual when it has a better selectivity estimate.
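
That kind of misestimation is easy to spot in the plans, e.g. by comparing
the estimated and actual row counts (made-up numbers here):

=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM pgqs WHERE val = 'test';
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Seq Scan on pgqs  (cost=0.00..835.00 rows=6 width=36) (actual rows=50000 loops=1)
   Filter: (val = 'test'::text)

A rows=6 estimate against 50000 actual rows is exactly the kind of per-qual
estimation error that pg_qualstats records.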

There are of course a lot of gotchas, but if it can automatically provide
helpful information for 50% of the queries, that's as much time saved that can
be spent actually investigating the performance issues rather than manually
copy/pasting queries and qual constants around.

In practice it doesn't work so badly, except when you're using the extended
protocol or prepared statements.  In those cases, you drop from, let's say,
50% (a wildly pessimistic guess) of queries that can be processed to a
straight 0%.


