Re: Generating code for query jumbling through gen_node_support.pl - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Generating code for query jumbling through gen_node_support.pl |
Date | |
Msg-id | Y8EOUuViEL8Ewvf/@paquier.xyz Whole thread Raw |
In response to | Re: Generating code for query jumbling through gen_node_support.pl (Peter Eisentraut <peter.eisentraut@enterprisedb.com>) |
Responses |
Re: Generating code for query jumbling through gen_node_support.pl
|
List | pgsql-hackers |
On Sat, Jan 07, 2023 at 07:37:49AM +0100, Peter Eisentraut wrote: > On 07.12.22 08:56, Michael Paquier wrote: >> The location of the Nodes is quite invasive because we only care about >> that for T_Const now in the query jumbling, and this could be >> compensated with a third pg_node_attr() that tracks for the "int >> location" of a Node whether it should participate in the jumbling or >> not. > > The generation script already has a way to identify location fields, by ($t > eq 'int' && $f =~ 'location$'), so you could use that as well. I did not recall exactly everything here, but there are two parts to the logic: - gen_node_support.pl uses exactly this condition when scanning the nodes to put the correct macro to mark a location to track, calling down RecordConstLocation(). - Marking a bunch of nodes as jumble_ignore is actually necessary, or we may finish by silencing parts of queries that should be semantically unrelevant to the queries jumbled (ColumnRef is one). Using a "jumble_ignore" flag is equally invasive to using an "jumble_include" flag for each field, I am afraid, as the number of fields in the nodes included in jumbles is pretty equivalent to the number of fields ignored. I tend to prefer the approach of ignoring things though, which is more consistent with the practive for node read, write and copy. Anyway, when it comes to the location, another thing that can be considered here would be to require a node-level flag for the nodes on which we want to track the location. This overlaps a bit with the fields satisfying "($t eq 'int' && $f =~ 'location$')", but it removes most of the code changes like this one as at the end we only care about the location for Const nodes: - int location; /* token location, or -1 if unknown */ + int location pg_node_attr(jumble_ignore); /* token location, or -1 + * if unknown */ I have taken this approach in v2 of the patch, shaving ~100 lines of more code as there is no need to mark all these location fields with "jumble_ignore" anymore, except for Const, of course. >> There is also an argument where we would want to not include by >> default new fields added to a Node, but that would require added more >> pg_node_attr() than what's done here. > > I'm concerned about the large number of additional field annotations this > adds. We have been careful so far to document the use of each attribute, > e.g., *why* does a field not need to be copied etc. This patch adds dozens > and dozens of annotations without any explanation at all. Now, the code > this replaces also has no documentation, but maybe this is the time to add > some. That's fair, though it is not doing to buy us much to update all the nodes with similar small comments, as well. As far as I know, there are basiscally three things here: typmods, collation information, and internal data of the nodes stored during parse-analyze. I have added more documentation to track what looks like the most relevant areas. I have begun running some performance tests with this stuff and HEAD to see if this leads to any difference in the query ID compilation (compute_query_id = on, on scissors roughly) with a simple set of short commands (like BEGIN/COMMIT) or longer ones, and I am seeing a speedup trend actually (?). I still need to think more about a set of tests here, but I think that micro-benchmarking of JumbleQuery() is the most adapted approach to minimize the noise, with a few nodes of various sizes (Const, Query, ColumnRef, anything..). Thoughts? -- Michael
Attachment
pgsql-hackers by date: