Re: queryId constant squashing does not support prepared statements - Mailing list pgsql-hackers
From | Dmitry Dolgov |
---|---|
Subject | Re: queryId constant squashing does not support prepared statements |
Date | |
Msg-id | tkor7ouodb4i6popbxx6bvb5kxfwbued3gdknxroqjnnlaot5f@55yynedl6ah6 Whole thread Raw |
In response to | Re: queryId constant squashing does not support prepared statements (Dmitry Dolgov <9erthalion6@gmail.com>) |
List | pgsql-hackers |
> On Thu, May 01, 2025 at 09:55:47PM GMT, Dmitry Dolgov wrote: > > On Thu, May 01, 2025 at 09:29:13AM GMT, Michael Paquier wrote: > > > > I agree that the current solution we have in the tree feels incomplete > > because we are not taking into account the most common cases that > > users would care about. Now, allowing PARAM_EXTERN means that we > > allow any expression to be detected as a squashable thing, and this > > kinds of breaks the assumption of IsSquashableConst() where we want > > only constants to be allowed because EXECUTE parameters can be any > > kind of Expr nodes. At least that's the intention of the code on > > HEAD. > > > > Now, I am not actually objecting about PARAM_EXTERN included or not if > > there's a consensus behind it and my arguments are considered as not > > relevant. The patch is written so as it claims that a PARAM_EXTERN > > implies the expression to be a Const, but it may not be so depending > > on what the execution path is given for the parameter. Or at least > > the patch could be clearer and rename the parts about the "Const" > > squashable APIs around queryjumblefuncs.c. > > > > [...] > > > > The PARAM_EXTERN part has been mentioned a couple of weeks ago here, > > btw: > > https://www.postgresql.org/message-id/CAA5RZ0tu6_KRiYJCFptf4_--wjFSu9cZMj1XNmOCqTNxu=VpEA@mail.gmail.com > > In fact, this has been discussed much earlier in the thread above, as > essentially the same implementation with T_Params, which is submitted > here, was part of the original patch. The concern was always whether or > not it will break any assumption about query identification, because > this way much broader scope of expressions will be considered equivalent > for query id computation purposes. > > At the same time after thinking about this concern more, I presume it > already happens at a smaller scale -- when two queries happen to have > the same number of parameters, they will be indistinguishable even if > parameters are different in some way. Returning to the topic of whether to squash list of Params. Originally squashing of Params wasn't included into the squashing patch due to concerns from reviewers about treating quite different queries as the same for the purposes of query identification. E.g. there is some assumption somewhere, which will be broken if we treat query with a list of integer parameters same as a query with a list of float parameters. For the sake of making progress I've decided to postpone answering this question and concentrate on more simple scenario. Now, as the patch was applied, I think it's a good moment to reflect on those concerns. It's not enough to say that we don't see any problems with squashing of Param, some more sound argumentation is needed. So, what will happen if parameters are squashed as constants? 1. One obvious impact is that more queries, that were considered distinct before, will have the same normalized query and hence the entry in pg_stat_statements. Since a Param could be pretty much anything, this can lead to a situation when two queries with quiet different performance profiles (e.g. one contains a parameter value, which is a heavy function, another one doesn't) are matched to one entry, making it less useful. But at the same time this already can happen if those two queries have the same number of parameters, since query parametrizing is intrinsically lossy in this sense. The only thing we do by squashing such queries is we loose information about the number of parameters, not properties of the parameters themselves. 2. Another tricky scenario is when queryId is used by some extension, which in turn makes assumption about it that are going to be rendered incorrect by squashing. The only type of assumptions I can imagine falling into this category is anything about equivalence of queries. For example, an extension can capture two queries, which have the same normalized entry in pgss, and assume all properties of those queries are the same. It's worth noting that normalized query is not transitive, i.e. if a query1 has the normalized version query_norm, and a query2 has the same normalized version query_norm, it doesn't mean query1 is equivalent query2 in all senses (e.g. they could have list of parameter values with different type and the same size). That means that such assumptions are already faulty, and could work most of the time only because it takes queries with a list of the same size to break the assumption. Squashing such queries will make them wrong more often. One can argue that we might want to be friendly to such extensions, and do not "break" them even further. But I don't think it's worth it, as number of such extensions is most likely low, if any. One more extreme case would be when an extension assumes that queries with the same entry in pgss have the same number of parameters, but I don't see how such assumption could be useful. 3. More annoying is the consequence that parameters are going to be treated as constants in pg_stat_statements. While mostly harmless, that would mean they're going to be replaced in the same way as constants. This means that the parameter order is going to be lost, e.g.: SELECT * FROM test_squash WHERE data IN ($4, $3, $2, $1) \bind 1 2 3 4 -- output SELECT * FROM test_squash WHERE data IN ($1 /*, ... */) SELECT * FROM test_squash WHERE data IN ($1, $2, $3, $4) AND id IN ($5, $6, $7, $8) \bind 1 2 3 4 5 6 7 8 -- output SELECT * FROM test_squash WHERE data IN ($1 /*, ... */) AND id IN ($2 /*, ... */) This representation could be confusing of course. It could be either explained in the documentation, or LocationLen has to be extended to carry information about whether it's a constant or a parameter, and do not replace the latter. In any case, anything more than the first parameter number will be lost, but it's probably not so dramatic. At the end of the day, I think the value of squashing for parameters outweighs the problems described above. As long as there is an agreement about that, it's fine by me. I've attached the more complete version of the patch (but without modifying LocationLen to not replace Param yet) in case if such agreemeng will be achieved.
Attachment
pgsql-hackers by date: