Re: Generating code for query jumbling through gen_node_support.pl - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Generating code for query jumbling through gen_node_support.pl
Date
Msg-id Y8YakMwx2a6Y63oR@paquier.xyz
Whole thread Raw
In response to Re: Generating code for query jumbling through gen_node_support.pl  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Responses Re: Generating code for query jumbling through gen_node_support.pl
List pgsql-hackers
On Mon, Jan 16, 2023 at 03:13:35PM +0100, Peter Eisentraut wrote:
> On 13.01.23 08:54, Michael Paquier wrote:
>> Using a "jumble_ignore" flag is equally invasive to using an
>> "jumble_include" flag for each field, I am afraid, as the number of
>> fields in the nodes included in jumbles is pretty equivalent to the
>> number of fields ignored.  I tend to prefer the approach of ignoring
>> things though, which is more consistent with the practive for node
>> read, write and copy.
>
> I concur that jumble_ignore is better.  I suppose you placed the
> jumble_ignore markers to maintain parity with the existing code, but I think
> that some the markers are actually wrong and are just errors of omission in
> the existing code (such as Query.override, to pick a random example).  The
> way you have structured this would allow us to find and analyze such errors
> better.

Thanks.  Yes, I have made an effort of going down to maintain an exact
compatibility with the existing code for now.  My take is that
removing or adding more things into the jumble deserves its own
discussion.  I think that's a bit better once this code is automated
with the nodes, now it would not be difficult either to adjust HEAD
depending on that.  Only the analysis effort is different.

>> Anyway, when it comes to the location, another thing that can be
>> considered here would be to require a node-level flag for the nodes on
>> which we want to track the location.  This overlaps a bit with the
>> fields satisfying "($t eq 'int' && $f =~ 'location$')", but it removes
>> most of the code changes like this one as at the end we only care
>> about the location for Const nodes:
>> -   int         location;       /* token location, or -1 if unknown */
>> +   int         location pg_node_attr(jumble_ignore);   /* token location, or -1
>> +                                                        * if unknown */
>>
>> I have taken this approach in v2 of the patch, shaving ~100 lines of
>> more code as there is no need to mark all these location fields with
>> "jumble_ignore" anymore, except for Const, of course.
>
> I don't understand why you chose that when we already have an established
> way.  This would just make the jumble annotations inconsistent with the
> other annotations.

Because we don't want to track the location of all the nodes!  If we
do that, pg_stat_statements would begin to parameterize a lot more
things, from column aliases to full contents of IN or WITH clauses.
The root point is that we only want to track the jumble location for
Const nodes now.  In order to do that, there are two approaches:
- Mark all the locations with jumble_ignore: more invasive, but
it requires only one per-field attribute with "jumble_ignore".  This
is what v1 did.
- Mark only the fields where we want to track the location with a
second node type, like "jumble_location".  We could restrict that
depending on the field type or its name on top of checking the field
attribute, whatever.  This is what v2 did.

Which approach do you prefer?  Marking all the locations we don't want
with jumble_ignore, or introduce a second attribute with
jumble_location.  I'd tend to prefer the approach of minimizing the
number of node and field attributes, FWIW.  Now you have worked on
this area previously, so your view may be more adapted than my
thinking process

The long-term perspective is that I'd like to extend the tracking of
the locations to more DDL nodes, like parameters of SET statements or
parts of CALL statements.  Not to mention that this makes the work of
forks easier.  This is a separate discussion.

> I also have two suggestions to make this patch more palatable:
>
> 1. Make a separate patch to reformat long comments, like
> 835d476fd21bcfb60b055941dee8c3d9559af14c.
>
> 2. Make a separate patch to move the jumble support to
> src/backend/nodes/jumblefuncs.c.  (It was probably a mistake that it wasn't
> there to begin with, and some of the errors of omission alluded to above are
> probably caused by it having been hidden away in the wrong place.)

Both suggestions make sense.  I'll shape the next series of the patch
to do something among these lines.

My question about the location tracking greatly influences the first
patch (comment reformating) and third patch (automated generation of
the code) of the series, so do you have a preference about that?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Next
From: Peter Smith
Date:
Subject: Re: Perform streaming logical transactions by background workers and parallel apply