Re: queryId constant squashing does not support prepared statements - Mailing list pgsql-hackers

From Dmitry Dolgov
Subject Re: queryId constant squashing does not support prepared statements
Date
Msg-id tkor7ouodb4i6popbxx6bvb5kxfwbued3gdknxroqjnnlaot5f@55yynedl6ah6
Whole thread Raw
In response to Re: queryId constant squashing does not support prepared statements  (Dmitry Dolgov <9erthalion6@gmail.com>)
List pgsql-hackers
> On Thu, May 01, 2025 at 09:55:47PM GMT, Dmitry Dolgov wrote:
> > On Thu, May 01, 2025 at 09:29:13AM GMT, Michael Paquier wrote:
> >
> > I agree that the current solution we have in the tree feels incomplete
> > because we are not taking into account the most common cases that
> > users would care about.  Now, allowing PARAM_EXTERN means that we
> > allow any expression to be detected as a squashable thing, and this
> > kinds of breaks the assumption of IsSquashableConst() where we want
> > only constants to be allowed because EXECUTE parameters can be any
> > kind of Expr nodes.  At least that's the intention of the code on
> > HEAD.
> >
> > Now, I am not actually objecting about PARAM_EXTERN included or not if
> > there's a consensus behind it and my arguments are considered as not
> > relevant.  The patch is written so as it claims that a PARAM_EXTERN
> > implies the expression to be a Const, but it may not be so depending
> > on what the execution path is given for the parameter.  Or at least
> > the patch could be clearer and rename the parts about the "Const"
> > squashable APIs around queryjumblefuncs.c.
> >
> > [...]
> >
> > The PARAM_EXTERN part has been mentioned a couple of weeks ago here,
> > btw:
> > https://www.postgresql.org/message-id/CAA5RZ0tu6_KRiYJCFptf4_--wjFSu9cZMj1XNmOCqTNxu=VpEA@mail.gmail.com
>
> In fact, this has been discussed much earlier in the thread above, as
> essentially the same implementation with T_Params, which is submitted
> here, was part of the original patch. The concern was always whether or
> not it will break any assumption about query identification, because
> this way much broader scope of expressions will be considered equivalent
> for query id computation purposes.
>
> At the same time after thinking about this concern more, I presume it
> already happens at a smaller scale -- when two queries happen to have
> the same number of parameters, they will be indistinguishable even if
> parameters are different in some way.

Returning to the topic of whether to squash list of Params.

Originally squashing of Params wasn't included into the squashing patch due to
concerns from reviewers about treating quite different queries as the same for
the purposes of query identification. E.g. there is some assumption somewhere,
which will be broken if we treat query with a list of integer parameters same
as a query with a list of float parameters. For the sake of making progress
I've decided to postpone answering this question and concentrate on more simple
scenario. Now, as the patch was applied, I think it's a good moment to reflect
on those concerns. It's not enough to say that we don't see any problems with
squashing of Param, some more sound argumentation is needed. So, what will
happen if parameters are squashed as constants?

1. One obvious impact is that more queries, that were considered distinct
before, will have the same normalized query and hence the entry in
pg_stat_statements. Since a Param could be pretty much anything, this can lead
to a situation when two queries with quiet different performance profiles (e.g.
one contains a parameter value, which is a heavy function, another one doesn't)
are matched to one entry, making it less useful.

But at the same time this already can happen if those two queries have the same
number of parameters, since query parametrizing is intrinsically lossy in this
sense. The only thing we do by squashing such queries is we loose information
about the number of parameters, not properties of the parameters themselves.

2. Another tricky scenario is when queryId is used by some extension, which in
turn makes assumption about it that are going to be rendered incorrect by
squashing. The only type of assumptions I can imagine falling into this
category is anything about equivalence of queries. For example, an extension
can capture two queries, which have the same normalized entry in pgss, and
assume all properties of those queries are the same.

It's worth noting that normalized query is not transitive, i.e. if a query1 has
the normalized version query_norm, and a query2 has the same normalized version
query_norm, it doesn't mean query1 is equivalent query2 in all senses (e.g.
they could have list of parameter values with different type and the same
size). That means that such assumptions are already faulty, and could work most
of the time only because it takes queries with a list of the same size to break
the assumption. Squashing such queries will make them wrong more often.

One can argue that we might want to be friendly to such extensions, and do not
"break" them even further. But I don't think it's worth it, as number of such
extensions is most likely low, if any. One more extreme case would be when an
extension assumes that queries with the same entry in pgss have the same number
of parameters, but I don't see how such assumption could be useful.

3. More annoying is the consequence that parameters are going to be treated as
constants in pg_stat_statements. While mostly harmless, that would mean they're
going to be replaced in the same way as constants. This means that the
parameter order is going to be lost, e.g.:

SELECT * FROM test_squash WHERE data IN ($4, $3, $2, $1) \bind 1 2 3 4
-- output
SELECT * FROM test_squash WHERE data IN ($1 /*, ... */)

SELECT * FROM test_squash WHERE data IN ($1, $2, $3, $4)
    AND id IN ($5, $6, $7, $8) \bind 1 2 3 4 5 6 7 8
-- output
SELECT * FROM test_squash WHERE data IN ($1 /*, ... */)
    AND id IN ($2 /*, ... */)

This representation could be confusing of course. It could be either explained
in the documentation, or LocationLen has to be extended to carry information
about whether it's a constant or a parameter, and do not replace the latter. In
any case, anything more than the first parameter number will be lost, but it's
probably not so dramatic.

At the end of the day, I think the value of squashing for parameters outweighs
the problems described above. As long as there is an agreement about that, it's
fine by me. I've attached the more complete version of the patch (but without
modifying LocationLen to not replace Param yet) in case if such agreemeng will
be achieved.

Attachment

pgsql-hackers by date:

Previous
From: Jelte Fennema-Nio
Date:
Subject: Re: PG 18 release notes draft committed
Next
From: Álvaro Herrera
Date:
Subject: Re: PG 18 release notes draft committed