Re: Generating code for query jumbling through gen_node_support.pl - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Generating code for query jumbling through gen_node_support.pl
Date
Msg-id Y8EOUuViEL8Ewvf/@paquier.xyz
Whole thread Raw
In response to Re: Generating code for query jumbling through gen_node_support.pl  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Responses Re: Generating code for query jumbling through gen_node_support.pl
List pgsql-hackers
On Sat, Jan 07, 2023 at 07:37:49AM +0100, Peter Eisentraut wrote:
> On 07.12.22 08:56, Michael Paquier wrote:
>> The location of the Nodes is quite invasive because we only care about
>> that for T_Const now in the query jumbling, and this could be
>> compensated with a third pg_node_attr() that tracks for the "int
>> location" of a Node whether it should participate in the jumbling or
>> not.
>
> The generation script already has a way to identify location fields, by ($t
> eq 'int' && $f =~ 'location$'), so you could use that as well.

I did not recall exactly everything here, but there are two parts to
the logic:
- gen_node_support.pl uses exactly this condition when scanning the
nodes to put the correct macro to mark a location to track, calling
down RecordConstLocation().
- Marking a bunch of nodes as jumble_ignore is actually necessary, or
we may finish by silencing parts of queries that should be
semantically unrelevant to the queries jumbled (ColumnRef is one).
Using a "jumble_ignore" flag is equally invasive to using an
"jumble_include" flag for each field, I am afraid, as the number of
fields in the nodes included in jumbles is pretty equivalent to the
number of fields ignored.  I tend to prefer the approach of ignoring
things though, which is more consistent with the practive for node
read, write and copy.

Anyway, when it comes to the location, another thing that can be
considered here would be to require a node-level flag for the nodes on
which we want to track the location.  This overlaps a bit with the
fields satisfying "($t eq 'int' && $f =~ 'location$')", but it removes
most of the code changes like this one as at the end we only care
about the location for Const nodes:
-   int         location;       /* token location, or -1 if unknown */
+   int         location pg_node_attr(jumble_ignore);   /* token location, or -1
+                                                        * if unknown */

I have taken this approach in v2 of the patch, shaving ~100 lines of
more code as there is no need to mark all these location fields with
"jumble_ignore" anymore, except for Const, of course.

>> There is also an argument where we would want to not include by
>> default new fields added to a Node, but that would require added more
>> pg_node_attr() than what's done here.
>
> I'm concerned about the large number of additional field annotations this
> adds.  We have been careful so far to document the use of each attribute,
> e.g., *why* does a field not need to be copied etc.  This patch adds dozens
> and dozens of annotations without any explanation at all.  Now, the code
> this replaces also has no documentation, but maybe this is the time to add
> some.

That's fair, though it is not doing to buy us much to update all the
nodes with similar small comments, as well.  As far as I know, there
are basiscally three things here: typmods, collation information, and
internal data of the nodes stored during parse-analyze.  I have added
more documentation to track what looks like the most relevant areas.

I have begun running some performance tests with this stuff and HEAD
to see if this leads to any difference in the query ID compilation
(compute_query_id = on, on scissors roughly) with a simple set of
short commands (like BEGIN/COMMIT) or longer ones, and I am seeing a
speedup trend actually (?).  I still need to think more about a set of
tests here, but I think that micro-benchmarking of JumbleQuery() is
the most adapted approach to minimize the noise, with a few nodes of
various sizes (Const, Query, ColumnRef, anything..).

Thoughts?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Blocking execution of SECURITY INVOKER
Next
From: David Geier
Date:
Subject: Re: Sampling-based timing for EXPLAIN ANALYZE