Re: Multiple Query IDs for a rewritten parse tree - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: Multiple Query IDs for a rewritten parse tree |
Date | |
Msg-id | Ydu7a08KPZKz4sUC@jrouhaud |
In response to | Re: Multiple Query IDs for a rewritten parse tree ("Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>) |
Responses | Re: Multiple Query IDs for a rewritten parse tree |
List | pgsql-hackers |
On Mon, Jan 10, 2022 at 09:10:59AM +0500, Andrey V. Lepikhov wrote:
> On 1/9/22 5:13 PM, Julien Rouhaud wrote:
> > For now the queryid mixes two different things: fingerprinting and query text
> > normalization. Should each calculation method be allowed to do a different
> > normalization too, and if yes where should be stored the state data needed for
> > that? If not, we would need some kind of primary hash for that purpose.
> >
> Do You mean JumbleState?

Yes, or some other abstraction. For instance, with the proposed patch to handle IN clauses differently, you need to change JumbleState.

> I think, registering queryId generator we should store also a pointer (void
> **args) to an additional data entry, as usual.

Well, yes, but only if you need to produce different versions of the normalized query text, and I'm not convinced that's really necessary.

> > Looking at Andrey's use case for wanting multiple hashes, I don't think that
> > adaptive optimization needs a normalized query string. The only use would be
> > to output some statistics, but this could be achieved by storing a list of
> > "primary queryid" for each adaptive entry. That's probably also true for
> > anything that's not monitoring intended. Also, all monitoring consumers should
> > probably agree on the same queryid, both fingerprint and normalized string, as
> > otherwise it's impossible to cross-reference metric data.
> >
> I can add one more use case.
> Our extension for freezing query plan uses query tree comparison technique
> to prove, that the plan can be applied (and we don't need to execute
> planning procedure at all).
> The procedure of a tree equality checking is expensive and we use cheaper
> queryId comparison to identify possible candidates. So here, for the better
> performance and queries coverage, we need to use query tree normalization -
> queryId should be stable to some modifications in a query text which do not
> change semantics.
> As an example, query plan with external parameters can be used to execute
> constant query if these constants correspond by place and type to the
> parameters. So, queryId calculation technique returns also pointers to all
> constants and parameters found during the calculation.

I'm also working on a similar extension, and yes, you can't accept any fingerprinting approach for that. I don't know what the exact heuristics of your cheaper queryid calculation are, but is it reasonable to use it with something like pg_stat_statements? If yes, you don't really need two queryid approaches for the sake of this single extension, and therefore don't need to store multiple jumble states or similar per statement. Especially since requiring another one would mean a performance drop as soon as you want to use something as common as pg_stat_statements.
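The two-stage lookup discussed in the quoted use case (a cheap queryId comparison to find candidates, then an expensive tree equality check to confirm) could be sketched roughly as follows. This is only an illustration under stated assumptions, not PostgreSQL code: the nested-tuple query tree, `fingerprint`, `trees_match`, and `FrozenPlans` are all hypothetical names, and the hashing scheme is not the one used by the core jumbling code.

```python
import hashlib

# Hypothetical toy query tree: nested tuples. ("const", type, value) is a
# literal and ("param", type, index) an external parameter; any other tuple
# is an operator node whose elements are its children.

def _is_slot(node):
    return isinstance(node, tuple) and len(node) == 3 and node[0] in ("const", "param")

def fingerprint(tree):
    """Return (query_id, slots). The id ignores constant values and parameter
    numbers, keeping only their type and position; slots lists the constants
    and parameters found, in traversal order."""
    h = hashlib.sha256()
    slots = []

    def walk(node):
        if _is_slot(node):
            h.update(f"<slot:{node[1]}>".encode())
            slots.append(node)
        elif isinstance(node, tuple):
            h.update(b"(")
            for child in node:
                walk(child)
            h.update(b")")
        else:
            h.update(repr(node).encode() + b",")

    walk(tree)
    return h.hexdigest()[:16], slots

def trees_match(a, b):
    """Stand-in for the expensive check: same shape, and every constant or
    parameter lines up by place and type with its counterpart."""
    if _is_slot(a) or _is_slot(b):
        return _is_slot(a) and _is_slot(b) and a[1] == b[1]
    if isinstance(a, tuple) and isinstance(b, tuple):
        return len(a) == len(b) and all(trees_match(x, y) for x, y in zip(a, b))
    return a == b

class FrozenPlans:
    def __init__(self):
        self._by_id = {}  # query_id -> list of (tree, plan)

    def freeze(self, tree, plan):
        query_id, _ = fingerprint(tree)
        self._by_id.setdefault(query_id, []).append((tree, plan))

    def lookup(self, tree):
        """Filter candidates by the cheap id, then confirm with the full check.
        Returns (plan, slots) or None."""
        query_id, slots = fingerprint(tree)
        for frozen_tree, plan in self._by_id.get(query_id, []):
            if trees_match(frozen_tree, tree):
                return plan, slots
        return None

# A plan frozen for "b = $1" is found for the constant query "b = 42".
parameterized = ("select", ("col", "a"), ("=", ("col", "b"), ("param", "int4", 1)))
constant = ("select", ("col", "a"), ("=", ("col", "b"), ("const", "int4", 42)))

plans = FrozenPlans()
plans.freeze(parameterized, "plan-1")
print(plans.lookup(constant))
```

In this sketch the id is deliberately coarse, so the full comparison still decides whether a frozen plan applies; whether the same id would also suit a monitoring consumer such as pg_stat_statements is the open question raised in the reply.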