Thread: automating RangeTblEntry node support

automating RangeTblEntry node support

From

Peter Eisentraut

Date:

06 December 2023, 20:02:20

I have been looking into what it would take to get rid of the
custom_read_write and custom_query_jumble for the RangeTblEntry node
type. This is one of the larger and more complex exceptions left.

(Similar considerations would also apply to the Constraint node type.)

Allegedly, only certain fields of RangeTblEntry are valid based on
rtekind. But exactly which ones seems to be documented and handled
inconsistently. It seems that over time, new RTE kinds have "borrowed"
fields that notionally belong to other RTE kinds, which is technically
not a problem but creates a bit of a mess when trying to understand all
this.

I have some WIP patches to accompany this discussion.

Let's start with the jumble function. I suppose that this was just
carried over from the pg_stat_statements-specific code without any
detailed review. For example, the "inh" field is documented to be valid
in all RTEs, but it's only jumbled for RTE_RELATION. The "lateral"
field isn't looked at at all. I wouldn't be surprised if there are more
cases like this.

In the first attached patch, I remove _jumbleRangeTblEntry() and instead
add per-field query_jumble_ignore annotations to approximately match the
behavior of the previous custom code. The pg_stat_statements test suite
has some coverage of this. I get rid of switch on rtekind; this should
be technically correct, since we do the equal and copy functions like
this also. So for example the "inh" field is now considered in each
case. But I left "lateral" alone. I suspect several of these new
query_jumble_ignore should actually be dropped because the code was
wrong before.

In the second patch, I'm removing the switch on rtekind from
range_table_mutator_impl(). This should be fine because all the
subroutines can handle unset/NULL fields. And it removes one more place
that needs to track knowledge about which fields are valid when.

In the third patch, I'm removing the custom read/write functions for
RangeTblEntry. Those functions wanted to have a few fields at the front
to make the dump more legible; I'm doing that now by moving the fields
up in the actual struct.

Not done here, but something we should do is restructure the
documentation of RangeTblEntry itself. I'm still thinking about the
best way to structure this, but I'm thinking more like noting for each
field when it's used, instead by block like it is now, which makes it
awkward if a new RTE wants to borrow some fields.

Now one could probably rightfully complain that having all these unused
fields dumped would make the RangeTblEntry serialization bigger. I'm
not sure who big of a problem that actually is, considering how many
often-unset fields other node types have. But it deserves some
consideration. I think the best way to work around that would be to
have a mode that omits fields that have their default value (zero).
This would be more generally useful; for example Query also has a bunch
of fields that are not often set. I think this would be pretty easy to
implement, for example like

#define WRITE_INT_FIELD(fldname) \
if (full_mode || node->fldname) \
appendStringInfo(str, " :" CppAsString(fldname) " %d",
node->fldname)

There is also the discussion over at [0] about larger redesigns of the
node serialization format. I'm also interested in that, but here I'm
mainly trying to remove more special cases to make that kind of work
easier in the future.

Any thoughts about the direction?

[0]:
https://www.postgresql.org/message-id/flat/CACxu%3DvL_SD%3DWJiFSJyyBuZAp_2v_XBqb1x9JBiqz52a_g9z3jA%40mail.gmail.com

Attachment

Re: automating RangeTblEntry node support

From

Matthias van de Meent

Date:

06 December 2023, 21:20:03

On Wed, 6 Dec 2023 at 21:02, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> I have been looking into what it would take to get rid of the
> custom_read_write and custom_query_jumble for the RangeTblEntry node
> type.  This is one of the larger and more complex exceptions left.
> [...]
> Now one could probably rightfully complain that having all these unused
> fields dumped would make the RangeTblEntry serialization bigger.  I'm
> not sure who big of a problem that actually is, considering how many
> often-unset fields other node types have.  But it deserves some
> consideration.  I think the best way to work around that would be to
> have a mode that omits fields that have their default value (zero).
> This would be more generally useful; for example Query also has a bunch
> of fields that are not often set.  I think this would be pretty easy to
> implement, for example like

Actually, I've worked on this last weekend, and got some good results.
It did need some fine-tuning and field annotations, but got raw
nodeToString sizes down 50%+ for the pg_rewrite table's ev_action
column, and compressed-with-pglz size of pg_rewrite total down 30%+.

> #define WRITE_INT_FIELD(fldname) \
>      if (full_mode || node->fldname) \
>          appendStringInfo(str, " :" CppAsString(fldname) " %d",
> node->fldname)
>
> There is also the discussion over at [0] about larger redesigns of the
> node serialization format.  I'm also interested in that, but here I'm
> mainly trying to remove more special cases to make that kind of work
> easier in the future.
>
> Any thoughts about the direction?

I've created a new thread [0] with my patch. It actually didn't need
_that_ many manual changes - most of it was just updating the
gen_node_support.pl code generation, and making the macros do a good
job.

In general I'm all for reducing special cases, so +1 on the idea. I'll
have to check the specifics of the patches at a later point in time.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Re: automating RangeTblEntry node support

From

Peter Eisentraut

Date:

15 January 2024, 10:37:15

On 06.12.23 21:02, Peter Eisentraut wrote:
> I have been looking into what it would take to get rid of the 
> custom_read_write and custom_query_jumble for the RangeTblEntry node 
> type.  This is one of the larger and more complex exceptions left.
> 
> (Similar considerations would also apply to the Constraint node type.)

In this updated patch set, I have also added the treatment of the 
Constraint type.  (I also noted that the manual read/write functions for 
the Constraint type are out-of-sync again, so simplifying this would be 
really helpful.)  I have also added commit messages to each patch.

The way I have re-ordered the patch series now, I think patches 0001 
through 0003 are candidates for inclusion after review, patch 0004 still 
needs a bit more analysis and testing, as described therein.

Attachment

Re: automating RangeTblEntry node support

From

Paul Jungwirth

Date:

16 February 2024, 20:36:24

On 1/15/24 02:37, Peter Eisentraut wrote:
> In this updated patch set, I have also added the treatment of the Constraint type.  (I also noted 
> that the manual read/write functions for the Constraint type are out-of-sync again, so simplifying 
> this would be really helpful.)  I have also added commit messages to each patch.
> 
> The way I have re-ordered the patch series now, I think patches 0001 through 0003 are candidates for 
> inclusion after review, patch 0004 still needs a bit more analysis and testing, as described therein.

I had to apply the first patch by hand (on 9f13376396), so this looks due for a rebase. Patches 2-4 
applied fine.

Compiles & passes tests after each patch.

The overall idea seems like a good improvement to me.

A few remarks about cleaning up the RangeTblEntry comments:

After the fourth patch we have a "Fields valid in all RTEs" comment twice in the struct, once at the 
top and once at the bottom. It's fine IMO but maybe the second could be "More fields valid in all RTEs"?

The new order of fields in RangleTblEntry matches the intro comment, which seems like another small 
benefit.

It seems like we are moving away from ever putting RTEKind-specific fields into a union as suggested 
by the FIXME comment here. It was written in 2002. Is it time to remove it?

This now needs to say "above" not "below":

     /*
      * join_using_alias is an alias clause attached directly to JOIN/USING. It
      * is different from the alias field (below) in that it does not hide the
      * range variables of the tables being joined.
      */
     Alias      *join_using_alias pg_node_attr(query_jumble_ignore);

Re bloating the serialization output, we could leave this last patch until after the work on that 
other thread is done to skip default-valued items.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Re: automating RangeTblEntry node support

From

Matthias van de Meent

Date:

17 February 2024, 23:06:19

On Fri, 16 Feb 2024 at 21:36, Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> On 1/15/24 02:37, Peter Eisentraut wrote:
> > In this updated patch set, I have also added the treatment of the Constraint type.  (I also noted
> > that the manual read/write functions for the Constraint type are out-of-sync again, so simplifying
> > this would be really helpful.)  I have also added commit messages to each patch.
> >
> > The way I have re-ordered the patch series now, I think patches 0001 through 0003 are candidates for
> > inclusion after review, patch 0004 still needs a bit more analysis and testing, as described therein.
>
> Re bloating the serialization output, we could leave this last patch until after the work on that
> other thread is done to skip default-valued items.

I'm not sure that the cleanup which is done when changing a RTE's
rtekind is also complete enough for this purpose.
Things like inline_cte_walker change the node->rtekind, which could
leave residual junk data in fields that are currently dropped during
serialization (as the rtekind specifically ignores those fields), but
which would add overhead when the default omission is expected to
handle these fields; as they could then contain junk. It looks like
there is some care about zeroing now unused fields, but I haven't
checked that it covers all cases and fields to the extent required so
that removing this specialized serializer would have zero impact on
size once the default omission patch is committed.

An additional patch with a single function that for this purpose
clears junk fields from RTEs that changed kind would be appreciated:
it is often hand-coded at those locations the kind changes, but that's
more sensitive to programmer error.

Kind regards,

Matthias van de Meent

Re: automating RangeTblEntry node support

From

Peter Eisentraut

Date:

20 February 2024, 07:57:25

On 18.02.24 00:06, Matthias van de Meent wrote:
> I'm not sure that the cleanup which is done when changing a RTE's
> rtekind is also complete enough for this purpose.
> Things like inline_cte_walker change the node->rtekind, which could
> leave residual junk data in fields that are currently dropped during
> serialization (as the rtekind specifically ignores those fields), but
> which would add overhead when the default omission is expected to
> handle these fields; as they could then contain junk. It looks like
> there is some care about zeroing now unused fields, but I haven't
> checked that it covers all cases and fields to the extent required so
> that removing this specialized serializer would have zero impact on
> size once the default omission patch is committed.
> 
> An additional patch with a single function that for this purpose
> clears junk fields from RTEs that changed kind would be appreciated:
> it is often hand-coded at those locations the kind changes, but that's
> more sensitive to programmer error.

Yes, interesting idea.  Or maybe an assert-like function that checks an 
existing structure for consistency.  Or maybe both.  I'll try this out.

In the meantime, if there are no remaining concerns, I propose to commit 
the first two patches

Remove custom Constraint node read/write implementations
Remove custom _jumbleRangeTblEntry()

Re: automating RangeTblEntry node support

From

Peter Eisentraut

Date:

11 March 2024, 09:29:38

On 20.02.24 08:57, Peter Eisentraut wrote:
> On 18.02.24 00:06, Matthias van de Meent wrote:
>> I'm not sure that the cleanup which is done when changing a RTE's
>> rtekind is also complete enough for this purpose.
>> Things like inline_cte_walker change the node->rtekind, which could
>> leave residual junk data in fields that are currently dropped during
>> serialization (as the rtekind specifically ignores those fields), but
>> which would add overhead when the default omission is expected to
>> handle these fields; as they could then contain junk. It looks like
>> there is some care about zeroing now unused fields, but I haven't
>> checked that it covers all cases and fields to the extent required so
>> that removing this specialized serializer would have zero impact on
>> size once the default omission patch is committed.
>>
>> An additional patch with a single function that for this purpose
>> clears junk fields from RTEs that changed kind would be appreciated:
>> it is often hand-coded at those locations the kind changes, but that's
>> more sensitive to programmer error.
> 
> Yes, interesting idea.  Or maybe an assert-like function that checks an 
> existing structure for consistency.  Or maybe both.  I'll try this out.
> 
> In the meantime, if there are no remaining concerns, I propose to commit 
> the first two patches
> 
> Remove custom Constraint node read/write implementations
> Remove custom _jumbleRangeTblEntry()

After a few side quests, here is an updated patch set.  (I had committed 
the first of the two patches mentioned above, but not yet the second one.)

v3-0001-Remove-obsolete-comment.patch
v3-0002-Improve-comment.patch

These just update a few comments around the RangeTblEntry definition.

v3-0003-Reformat-some-node-comments.patch
v3-0004-Remove-custom-_jumbleRangeTblEntry.patch

This is pretty much the same patch as before.  I have now split it up to 
first reformat the comments to make room for the node annotations.  This 
patch is now also pgindent-proof.  After some side quest discussions, 
the set of fields to jumble seems correct now, so commit message 
comments to the contrary have been dropped.

v3-0005-Make-RangeTblEntry-dump-order-consistent.patch

I separated that from the 0008 patch below.  I think it useful even if 
we don't go ahead with 0008 now, for example in dumps from the debugger, 
and just in general to keep everything more consistent.

v3-0006-WIP-AssertRangeTblEntryIsValid.patch

This is in response to some of the discussions where there was some 
doubt whether all fields are always filled and cleared correctly when 
the RTE kind is changed.  Seems correct as far as this goes.  I didn't 
know of a good way to hook this in, so I put it into the write/read 
functions, which is obviously a bit weird if I'm proposing to remove 
them later.  Consider it a proof of concept.

v3-0007-Simplify-range_table_mutator_impl-and-range_table.patch
v3-0008-WIP-Remove-custom-RangeTblEntry-node-read-write-i.patch

At this point, I'm not too stressed about pressing forward with these 
last two patches.  We can look at them again perhaps if we make progress 
on a more compact node output format.  When I started this thread, I had 
a lot of questions about various details about the RangeTblEntry struct, 
and we have achieved many answers during the discussions, so I'm happy 
with the progress.  So for PG17, I'd like to just do patches 0001..0005.

Attachment

Re: automating RangeTblEntry node support

From

Andrew Dunstan

Date:

21 March 2024, 09:51:18

On Mon, Mar 11, 2024 at 5:29 AM Peter Eisentraut <peter@eisentraut.org> wrote:

On 20.02.24 08:57, Peter Eisentraut wrote:
> On 18.02.24 00:06, Matthias van de Meent wrote:
>> I'm not sure that the cleanup which is done when changing a RTE's
>> rtekind is also complete enough for this purpose.
>> Things like inline_cte_walker change the node->rtekind, which could
>> leave residual junk data in fields that are currently dropped during
>> serialization (as the rtekind specifically ignores those fields), but
>> which would add overhead when the default omission is expected to
>> handle these fields; as they could then contain junk. It looks like
>> there is some care about zeroing now unused fields, but I haven't
>> checked that it covers all cases and fields to the extent required so
>> that removing this specialized serializer would have zero impact on
>> size once the default omission patch is committed.
>>
>> An additional patch with a single function that for this purpose
>> clears junk fields from RTEs that changed kind would be appreciated:
>> it is often hand-coded at those locations the kind changes, but that's
>> more sensitive to programmer error.
>
> Yes, interesting idea. Or maybe an assert-like function that checks an
> existing structure for consistency. Or maybe both. I'll try this out.
>
> In the meantime, if there are no remaining concerns, I propose to commit
> the first two patches
>
> Remove custom Constraint node read/write implementations
> Remove custom _jumbleRangeTblEntry()

After a few side quests, here is an updated patch set. (I had committed
the first of the two patches mentioned above, but not yet the second one.)

v3-0001-Remove-obsolete-comment.patch
v3-0002-Improve-comment.patch

These just update a few comments around the RangeTblEntry definition.

v3-0003-Reformat-some-node-comments.patch
v3-0004-Remove-custom-_jumbleRangeTblEntry.patch

This is pretty much the same patch as before. I have now split it up to
first reformat the comments to make room for the node annotations. This
patch is now also pgindent-proof. After some side quest discussions,
the set of fields to jumble seems correct now, so commit message
comments to the contrary have been dropped.

v3-0005-Make-RangeTblEntry-dump-order-consistent.patch

I separated that from the 0008 patch below. I think it useful even if
we don't go ahead with 0008 now, for example in dumps from the debugger,
and just in general to keep everything more consistent.

v3-0006-WIP-AssertRangeTblEntryIsValid.patch

This is in response to some of the discussions where there was some
doubt whether all fields are always filled and cleared correctly when
the RTE kind is changed. Seems correct as far as this goes. I didn't
know of a good way to hook this in, so I put it into the write/read
functions, which is obviously a bit weird if I'm proposing to remove
them later. Consider it a proof of concept.

v3-0007-Simplify-range_table_mutator_impl-and-range_table.patch
v3-0008-WIP-Remove-custom-RangeTblEntry-node-read-write-i.patch

At this point, I'm not too stressed about pressing forward with these
last two patches. We can look at them again perhaps if we make progress
on a more compact node output format. When I started this thread, I had
a lot of questions about various details about the RangeTblEntry struct,
and we have achieved many answers during the discussions, so I'm happy
with the progress. So for PG17, I'd like to just do patches 0001..0005.

Patches 1 thru 5 look good to me

cheers

andrew

Re: automating RangeTblEntry node support

From

Peter Eisentraut

Date:

22 March 2024, 06:49:51

On 21.03.24 10:51, Andrew Dunstan wrote:
>     At this point, I'm not too stressed about pressing forward with these
>     last two patches.  We can look at them again perhaps if we make
>     progress
>     on a more compact node output format.  When I started this thread, I
>     had
>     a lot of questions about various details about the RangeTblEntry
>     struct,
>     and we have achieved many answers during the discussions, so I'm happy
>     with the progress.  So for PG17, I'd like to just do patches 0001..0005.
> 
> Patches 1 thru 5 look good to me

Thanks for checking.  I have committed these (1 through 5) and will 
close the commit fest entry.