Re: pg_stat_statements and "IN" conditions - Mailing list pgsql-hackers
From | Dmitry Dolgov |
---|---|
Subject | Re: pg_stat_statements and "IN" conditions |
Date | |
Msg-id | 20231027150244.jmdvn23l4rpx6hha@erthalion.local Whole thread Raw |
In response to | Re: pg_stat_statements and "IN" conditions (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: pg_stat_statements and "IN" conditions
|
List | pgsql-hackers |
> On Thu, Oct 26, 2023 at 09:08:42AM +0900, Michael Paquier wrote: > typedef struct ArrayExpr > { > + pg_node_attr(custom_query_jumble) > + > > Hmm. I am not sure that this is the best approach > implementation-wise. Wouldn't it be better to invent a new > pg_node_attr (these can include parameters as well!), say > query_jumble_merge or query_jumble_agg_location that aggregates all > the parameters of a list to be considered as a single element. To put > it short, we could also apply the same property to other parts of a > parsed tree, and not only an ArrayExpr's list. Sounds like an interesting idea, something like: typedef struct ArrayExpr { ... List *elements pg_node_attr(query_jumble_merge); to replace simple JUMBLE_NODE(elements) with more elaborated logic. > /* GUC parameters */ > extern PGDLLIMPORT int compute_query_id; > - > +extern PGDLLIMPORT bool query_id_const_merge; > > Not much a fan of this addition as well for an in-core GUC. I would > suggest pushing the GUC layer to pg_stat_statements, maintaining the > computation method to use as a field of JumbleState as I suspect that > this is something we should not enforce system-wide, but at > extension-level instead. I also do not particularly like an extra GUC here, but as far as I can tell to make it pg_stat_statements GUC only it has to be something similar to EnableQueryId (e.g. EnableQueryConstMerging), that will be called from pgss. Does this sound better? > + /* > + * Indicates the constant represents the beginning or the end of a merged > + * constants interval. > + */ > + bool merged; > > Not sure that this is the best thing to do either. Instead of this > extra boolean flag, could it be simpler if we switch LocationLen so as > we track the start position and the end position of a constant in a > query string, so as we'd use one LocationLen for a whole set of Const > nodes in an ArrayExpr? Perhaps this could just be a refactoring piece > of its own? Sounds interesting as well, but it seems to me there is a catch. I'll try to elaborate, bear with me: * if the start and the end positions of a constant means the first and the last character representing it, we need the textual length of the constant in the query to be able to construct such a LocationLen. The lengths are calculated in pg_stat_statements later, not in JumbleQuery, and it uses parser for that. Doing all of this in JumbleQuery doesn't sound reasonable to me. * if instead we talk about the start and the end positions in a set of constants, that would mean locations of the first and the last constants in the set, and everything seems fine. But for such LocationLen to represent a single constant (not a set of constants), it means that only the start position would be meaningful, the end position will not be used. The second approach is somewhat close to be simpler than the merge flag, but assumes the ugliness for a single constant. What do you think about this? > + /* > + * If the first expression is a constant, verify if the following elements > + * are constants as well. If yes, the list is eligible for merging, and the > + * order of magnitude need to be calculated. > + */ > + if (IsA(firstExpr, Const)) > + { > + foreach(temp, elements) > + if (!IsA(lfirst(temp), Const)) > + return false; > > This path should be benchmarked, IMO. I can do some benchmarking here, but of course it's going to be slower than the baseline. The main idea behind the patch is to trade this overhead for the benefits in the future while processing pgss records, hoping that it's going to be worth it (and in those extreme cases I'm trying to address it's definitely worth it).
pgsql-hackers by date: