Re: Multiple Query IDs for a rewritten parse tree - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Multiple Query IDs for a rewritten parse tree
Msg-id Ydu7a08KPZKz4sUC@jrouhaud
In response to Re: Multiple Query IDs for a rewritten parse tree  ("Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>)
Responses Re: Multiple Query IDs for a rewritten parse tree
List pgsql-hackers
On Mon, Jan 10, 2022 at 09:10:59AM +0500, Andrey V. Lepikhov wrote:
> On 1/9/22 5:13 PM, Julien Rouhaud wrote:
> > For now the queryid mixes two different things: fingerprinting and query text
> > normalization.  Should each calculation method be allowed to do a different
> > normalization too, and if yes, where should the state data needed for that be
> > stored?  If not, we would need some kind of primary hash for that purpose.
> > 
> Do you mean JumbleState?

Yes, or some other abstraction.  For instance, with the proposed patch to handle
IN clauses differently, you need to change JumbleState.
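
To illustrate why fingerprinting and normalization are tied together today,
here is a toy model in Python.  It is not PostgreSQL code: ToyJumbleState,
toy_tokenize, toy_jumble and toy_normalize are hypothetical names, and the
regex tokenizer stands in for a real parse tree.  The point is only that a
single pass yields both the hash input and the constant locations that
normalization needs afterwards, so changing how constants are treated means
changing the shared state.

```python
import hashlib
import re
from dataclasses import dataclass, field

# Hypothetical tokenizer: integer and quoted literals are "constants",
# everything else is a case-insensitive word or a punctuation character.
TOKEN_RE = re.compile(r"\s*(?:(\d+|'[^']*')|(\w+|[^\s\w]))")


@dataclass
class ToyJumbleState:
    """Toy stand-in for a jumble state: hash input plus constant locations."""
    jumble: bytearray = field(default_factory=bytearray)
    const_locations: list = field(default_factory=list)  # (offset, length)


def toy_tokenize(query):
    tokens = []
    for m in TOKEN_RE.finditer(query):
        if m.group(1) is not None:
            tokens.append(("const", m.group(1), m.start(1)))
        else:
            tokens.append(("word", m.group(2).lower(), m.start(2)))
    return tokens


def toy_jumble(query):
    """Return (query_id, state); constant values do not affect the id."""
    state = ToyJumbleState()
    for kind, text, offset in toy_tokenize(query):
        if kind == "const":
            state.jumble += b"const;"
            state.const_locations.append((offset, len(text)))
        else:
            state.jumble += f"word:{text};".encode()
    digest = hashlib.sha256(bytes(state.jumble)).digest()
    return int.from_bytes(digest[:8], "big"), state


def toy_normalize(query, state):
    """Rebuild the query text with each recorded constant replaced by $n."""
    out, last = [], 0
    for n, (offset, length) in enumerate(state.const_locations, start=1):
        out.append(query[last:offset])
        out.append(f"${n}")
        last = offset + length
    out.append(query[last:])
    return "".join(out)


qid, state = toy_jumble("SELECT * FROM t WHERE id = 42 AND name = 'bob'")
print(toy_normalize("SELECT * FROM t WHERE id = 42 AND name = 'bob'", state))
# prints: SELECT * FROM t WHERE id = $1 AND name = $2
```

In this model, a second fingerprinting method that, say, collapsed a whole IN
list into one placeholder would have to record different locations, which is
exactly the "different normalization" question raised above.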

> I think that when registering a queryId generator we should also store a
> pointer (void **args) to an additional data entry, as usual.

Well, yes, but only if you need to produce different versions of the normalized
query text, and I'm not convinced that's really necessary.

> > Looking at Andrey's use case for wanting multiple hashes, I don't think that
> > adaptive optimization needs a normalized query string.  The only use would be
> > to output some statistics, but this could be achieved by storing a list of
> > "primary queryid" for each adaptive entry.  That's probably also true for
> > anything that's not intended for monitoring.  Also, all monitoring consumers should
> > probably agree on the same queryid, both fingerprint and normalized string, as
> > otherwise it's impossible to cross-reference metric data.
> > 
> I can add one more use case.
> Our extension for freezing query plans uses a query tree comparison technique
> to prove that the plan can be applied (and that we don't need to run the
> planning procedure at all).
> Checking tree equality is expensive, so we use a cheaper queryId comparison
> to identify possible candidates. So here, for better performance and query
> coverage, we need query tree normalization: the queryId should be stable
> under modifications of the query text which do not change semantics.
> As an example, a query plan with external parameters can be used to execute
> a constant query if these constants correspond by place and type to the
> parameters. So the queryId calculation technique also returns pointers to all
> constants and parameters found during the calculation.

I'm also working on a similar extension, and yes, you can't accept just any
fingerprinting approach for that.  I don't know what the exact heuristics of
your cheaper queryid calculation are, but is it reasonable to use it with
something like pg_stat_statements?  If yes, you don't really need two queryid
approaches for the sake of this single extension, and therefore don't need to
store multiple jumble states or similar per statement.  Especially since
requiring another one would mean a performance drop as soon as you want to use
something as common as pg_stat_statements.
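
For reference, the "cheap queryid first, expensive tree comparison second"
pattern discussed above can be sketched as follows.  This is a Python toy under
stated assumptions, not either extension's code: query trees are nested tuples,
("const", type, value) and ("param", type, index) are hypothetical node shapes,
and cheap_query_id, bind_constants and FrozenPlans are made-up names.  The
fingerprint ignores whether a slot is a constant or a parameter, so a constant
query and a parameterized one land in the same bucket; only the full comparison
decides whether the frozen plan really applies.

```python
import hashlib


def _shape(node):
    """Reduce constants and parameters to a typed placeholder."""
    if isinstance(node, tuple):
        if node and node[0] in ("const", "param"):
            return ("slot", node[1])
        return tuple(_shape(child) for child in node)
    return node


def cheap_query_id(tree):
    """Cheap fingerprint: stable across constant values and const/param swaps."""
    digest = hashlib.sha256(repr(_shape(tree)).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def bind_constants(frozen, incoming):
    """Expensive check. Return {param index: value}, or None on mismatch."""
    bindings = {}

    def walk(f, i):
        if isinstance(f, tuple) and f and f[0] == "param":
            # A parameter accepts a constant of the same type in the same place.
            if (isinstance(i, tuple) and len(i) == 3
                    and i[0] == "const" and i[1] == f[1]):
                # The same parameter used twice must see the same value.
                return bindings.setdefault(f[2], i[2]) == i[2]
            return False
        if isinstance(f, tuple) and isinstance(i, tuple):
            return len(f) == len(i) and all(walk(a, b) for a, b in zip(f, i))
        # Leaves (including constants baked into the frozen plan) must be equal.
        return f == i

    return bindings if walk(frozen, incoming) else None


class FrozenPlans:
    """Frozen plans bucketed by cheap id; full comparison only within a bucket."""

    def __init__(self):
        self._by_id = {}

    def freeze(self, tree, plan):
        self._by_id.setdefault(cheap_query_id(tree), []).append((tree, plan))

    def lookup(self, tree):
        for frozen, plan in self._by_id.get(cheap_query_id(tree), []):
            bindings = bind_constants(frozen, tree)
            if bindings is not None:
                return plan, bindings
        return None
```

If the same fingerprint is also acceptable for pg_stat_statements-style
monitoring, a single id per statement serves both purposes, which is the
argument against keeping a second jumble state around.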


