Thread: automating RangeTblEntry node support
I have been looking into what it would take to get rid of the custom_read_write and custom_query_jumble for the RangeTblEntry node type. This is one of the larger and more complex exceptions left. (Similar considerations would also apply to the Constraint node type.) Allegedly, only certain fields of RangeTblEntry are valid based on rtekind. But exactly which ones seems to be documented and handled inconsistently. It seems that over time, new RTE kinds have "borrowed" fields that notionally belong to other RTE kinds, which is technically not a problem but creates a bit of a mess when trying to understand all this. I have some WIP patches to accompany this discussion. Let's start with the jumble function. I suppose that this was just carried over from the pg_stat_statements-specific code without any detailed review. For example, the "inh" field is documented to be valid in all RTEs, but it's only jumbled for RTE_RELATION. The "lateral" field isn't looked at at all. I wouldn't be surprised if there are more cases like this. In the first attached patch, I remove _jumbleRangeTblEntry() and instead add per-field query_jumble_ignore annotations to approximately match the behavior of the previous custom code. The pg_stat_statements test suite has some coverage of this. I get rid of switch on rtekind; this should be technically correct, since we do the equal and copy functions like this also. So for example the "inh" field is now considered in each case. But I left "lateral" alone. I suspect several of these new query_jumble_ignore should actually be dropped because the code was wrong before. In the second patch, I'm removing the switch on rtekind from range_table_mutator_impl(). This should be fine because all the subroutines can handle unset/NULL fields. And it removes one more place that needs to track knowledge about which fields are valid when. In the third patch, I'm removing the custom read/write functions for RangeTblEntry. Those functions wanted to have a few fields at the front to make the dump more legible; I'm doing that now by moving the fields up in the actual struct. Not done here, but something we should do is restructure the documentation of RangeTblEntry itself. I'm still thinking about the best way to structure this, but I'm thinking more like noting for each field when it's used, instead by block like it is now, which makes it awkward if a new RTE wants to borrow some fields. Now one could probably rightfully complain that having all these unused fields dumped would make the RangeTblEntry serialization bigger. I'm not sure who big of a problem that actually is, considering how many often-unset fields other node types have. But it deserves some consideration. I think the best way to work around that would be to have a mode that omits fields that have their default value (zero). This would be more generally useful; for example Query also has a bunch of fields that are not often set. I think this would be pretty easy to implement, for example like #define WRITE_INT_FIELD(fldname) \ if (full_mode || node->fldname) \ appendStringInfo(str, " :" CppAsString(fldname) " %d", node->fldname) There is also the discussion over at [0] about larger redesigns of the node serialization format. I'm also interested in that, but here I'm mainly trying to remove more special cases to make that kind of work easier in the future. Any thoughts about the direction? [0]: https://www.postgresql.org/message-id/flat/CACxu%3DvL_SD%3DWJiFSJyyBuZAp_2v_XBqb1x9JBiqz52a_g9z3jA%40mail.gmail.com
Attachment
On Wed, 6 Dec 2023 at 21:02, Peter Eisentraut <peter@eisentraut.org> wrote: > > I have been looking into what it would take to get rid of the > custom_read_write and custom_query_jumble for the RangeTblEntry node > type. This is one of the larger and more complex exceptions left. > [...] > Now one could probably rightfully complain that having all these unused > fields dumped would make the RangeTblEntry serialization bigger. I'm > not sure who big of a problem that actually is, considering how many > often-unset fields other node types have. But it deserves some > consideration. I think the best way to work around that would be to > have a mode that omits fields that have their default value (zero). > This would be more generally useful; for example Query also has a bunch > of fields that are not often set. I think this would be pretty easy to > implement, for example like Actually, I've worked on this last weekend, and got some good results. It did need some fine-tuning and field annotations, but got raw nodeToString sizes down 50%+ for the pg_rewrite table's ev_action column, and compressed-with-pglz size of pg_rewrite total down 30%+. > #define WRITE_INT_FIELD(fldname) \ > if (full_mode || node->fldname) \ > appendStringInfo(str, " :" CppAsString(fldname) " %d", > node->fldname) > > There is also the discussion over at [0] about larger redesigns of the > node serialization format. I'm also interested in that, but here I'm > mainly trying to remove more special cases to make that kind of work > easier in the future. > > Any thoughts about the direction? I've created a new thread [0] with my patch. It actually didn't need _that_ many manual changes - most of it was just updating the gen_node_support.pl code generation, and making the macros do a good job. In general I'm all for reducing special cases, so +1 on the idea. I'll have to check the specifics of the patches at a later point in time. Kind regards, Matthias van de Meent Neon (https://neon.tech)
On 06.12.23 21:02, Peter Eisentraut wrote: > I have been looking into what it would take to get rid of the > custom_read_write and custom_query_jumble for the RangeTblEntry node > type. This is one of the larger and more complex exceptions left. > > (Similar considerations would also apply to the Constraint node type.) In this updated patch set, I have also added the treatment of the Constraint type. (I also noted that the manual read/write functions for the Constraint type are out-of-sync again, so simplifying this would be really helpful.) I have also added commit messages to each patch. The way I have re-ordered the patch series now, I think patches 0001 through 0003 are candidates for inclusion after review, patch 0004 still needs a bit more analysis and testing, as described therein.
Attachment
On 1/15/24 02:37, Peter Eisentraut wrote: > In this updated patch set, I have also added the treatment of the Constraint type. (I also noted > that the manual read/write functions for the Constraint type are out-of-sync again, so simplifying > this would be really helpful.) I have also added commit messages to each patch. > > The way I have re-ordered the patch series now, I think patches 0001 through 0003 are candidates for > inclusion after review, patch 0004 still needs a bit more analysis and testing, as described therein. I had to apply the first patch by hand (on 9f13376396), so this looks due for a rebase. Patches 2-4 applied fine. Compiles & passes tests after each patch. The overall idea seems like a good improvement to me. A few remarks about cleaning up the RangeTblEntry comments: After the fourth patch we have a "Fields valid in all RTEs" comment twice in the struct, once at the top and once at the bottom. It's fine IMO but maybe the second could be "More fields valid in all RTEs"? The new order of fields in RangleTblEntry matches the intro comment, which seems like another small benefit. It seems like we are moving away from ever putting RTEKind-specific fields into a union as suggested by the FIXME comment here. It was written in 2002. Is it time to remove it? This now needs to say "above" not "below": /* * join_using_alias is an alias clause attached directly to JOIN/USING. It * is different from the alias field (below) in that it does not hide the * range variables of the tables being joined. */ Alias *join_using_alias pg_node_attr(query_jumble_ignore); Re bloating the serialization output, we could leave this last patch until after the work on that other thread is done to skip default-valued items. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On Fri, 16 Feb 2024 at 21:36, Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > On 1/15/24 02:37, Peter Eisentraut wrote: > > In this updated patch set, I have also added the treatment of the Constraint type. (I also noted > > that the manual read/write functions for the Constraint type are out-of-sync again, so simplifying > > this would be really helpful.) I have also added commit messages to each patch. > > > > The way I have re-ordered the patch series now, I think patches 0001 through 0003 are candidates for > > inclusion after review, patch 0004 still needs a bit more analysis and testing, as described therein. > > Re bloating the serialization output, we could leave this last patch until after the work on that > other thread is done to skip default-valued items. I'm not sure that the cleanup which is done when changing a RTE's rtekind is also complete enough for this purpose. Things like inline_cte_walker change the node->rtekind, which could leave residual junk data in fields that are currently dropped during serialization (as the rtekind specifically ignores those fields), but which would add overhead when the default omission is expected to handle these fields; as they could then contain junk. It looks like there is some care about zeroing now unused fields, but I haven't checked that it covers all cases and fields to the extent required so that removing this specialized serializer would have zero impact on size once the default omission patch is committed. An additional patch with a single function that for this purpose clears junk fields from RTEs that changed kind would be appreciated: it is often hand-coded at those locations the kind changes, but that's more sensitive to programmer error. Kind regards, Matthias van de Meent
On 18.02.24 00:06, Matthias van de Meent wrote: > I'm not sure that the cleanup which is done when changing a RTE's > rtekind is also complete enough for this purpose. > Things like inline_cte_walker change the node->rtekind, which could > leave residual junk data in fields that are currently dropped during > serialization (as the rtekind specifically ignores those fields), but > which would add overhead when the default omission is expected to > handle these fields; as they could then contain junk. It looks like > there is some care about zeroing now unused fields, but I haven't > checked that it covers all cases and fields to the extent required so > that removing this specialized serializer would have zero impact on > size once the default omission patch is committed. > > An additional patch with a single function that for this purpose > clears junk fields from RTEs that changed kind would be appreciated: > it is often hand-coded at those locations the kind changes, but that's > more sensitive to programmer error. Yes, interesting idea. Or maybe an assert-like function that checks an existing structure for consistency. Or maybe both. I'll try this out. In the meantime, if there are no remaining concerns, I propose to commit the first two patches Remove custom Constraint node read/write implementations Remove custom _jumbleRangeTblEntry()
On 20.02.24 08:57, Peter Eisentraut wrote: > On 18.02.24 00:06, Matthias van de Meent wrote: >> I'm not sure that the cleanup which is done when changing a RTE's >> rtekind is also complete enough for this purpose. >> Things like inline_cte_walker change the node->rtekind, which could >> leave residual junk data in fields that are currently dropped during >> serialization (as the rtekind specifically ignores those fields), but >> which would add overhead when the default omission is expected to >> handle these fields; as they could then contain junk. It looks like >> there is some care about zeroing now unused fields, but I haven't >> checked that it covers all cases and fields to the extent required so >> that removing this specialized serializer would have zero impact on >> size once the default omission patch is committed. >> >> An additional patch with a single function that for this purpose >> clears junk fields from RTEs that changed kind would be appreciated: >> it is often hand-coded at those locations the kind changes, but that's >> more sensitive to programmer error. > > Yes, interesting idea. Or maybe an assert-like function that checks an > existing structure for consistency. Or maybe both. I'll try this out. > > In the meantime, if there are no remaining concerns, I propose to commit > the first two patches > > Remove custom Constraint node read/write implementations > Remove custom _jumbleRangeTblEntry() After a few side quests, here is an updated patch set. (I had committed the first of the two patches mentioned above, but not yet the second one.) v3-0001-Remove-obsolete-comment.patch v3-0002-Improve-comment.patch These just update a few comments around the RangeTblEntry definition. v3-0003-Reformat-some-node-comments.patch v3-0004-Remove-custom-_jumbleRangeTblEntry.patch This is pretty much the same patch as before. I have now split it up to first reformat the comments to make room for the node annotations. This patch is now also pgindent-proof. After some side quest discussions, the set of fields to jumble seems correct now, so commit message comments to the contrary have been dropped. v3-0005-Make-RangeTblEntry-dump-order-consistent.patch I separated that from the 0008 patch below. I think it useful even if we don't go ahead with 0008 now, for example in dumps from the debugger, and just in general to keep everything more consistent. v3-0006-WIP-AssertRangeTblEntryIsValid.patch This is in response to some of the discussions where there was some doubt whether all fields are always filled and cleared correctly when the RTE kind is changed. Seems correct as far as this goes. I didn't know of a good way to hook this in, so I put it into the write/read functions, which is obviously a bit weird if I'm proposing to remove them later. Consider it a proof of concept. v3-0007-Simplify-range_table_mutator_impl-and-range_table.patch v3-0008-WIP-Remove-custom-RangeTblEntry-node-read-write-i.patch At this point, I'm not too stressed about pressing forward with these last two patches. We can look at them again perhaps if we make progress on a more compact node output format. When I started this thread, I had a lot of questions about various details about the RangeTblEntry struct, and we have achieved many answers during the discussions, so I'm happy with the progress. So for PG17, I'd like to just do patches 0001..0005.
Attachment
- v3-0001-Remove-obsolete-comment.patch
- v3-0002-Improve-comment.patch
- v3-0003-Reformat-some-node-comments.patch
- v3-0004-Remove-custom-_jumbleRangeTblEntry.patch
- v3-0005-Make-RangeTblEntry-dump-order-consistent.patch
- v3-0006-WIP-AssertRangeTblEntryIsValid.patch
- v3-0007-Simplify-range_table_mutator_impl-and-range_table.patch
- v3-0008-WIP-Remove-custom-RangeTblEntry-node-read-write-i.patch
On Mon, Mar 11, 2024 at 5:29 AM Peter Eisentraut <peter@eisentraut.org> wrote:
On 20.02.24 08:57, Peter Eisentraut wrote:
> On 18.02.24 00:06, Matthias van de Meent wrote:
>> I'm not sure that the cleanup which is done when changing a RTE's
>> rtekind is also complete enough for this purpose.
>> Things like inline_cte_walker change the node->rtekind, which could
>> leave residual junk data in fields that are currently dropped during
>> serialization (as the rtekind specifically ignores those fields), but
>> which would add overhead when the default omission is expected to
>> handle these fields; as they could then contain junk. It looks like
>> there is some care about zeroing now unused fields, but I haven't
>> checked that it covers all cases and fields to the extent required so
>> that removing this specialized serializer would have zero impact on
>> size once the default omission patch is committed.
>>
>> An additional patch with a single function that for this purpose
>> clears junk fields from RTEs that changed kind would be appreciated:
>> it is often hand-coded at those locations the kind changes, but that's
>> more sensitive to programmer error.
>
> Yes, interesting idea. Or maybe an assert-like function that checks an
> existing structure for consistency. Or maybe both. I'll try this out.
>
> In the meantime, if there are no remaining concerns, I propose to commit
> the first two patches
>
> Remove custom Constraint node read/write implementations
> Remove custom _jumbleRangeTblEntry()
After a few side quests, here is an updated patch set. (I had committed
the first of the two patches mentioned above, but not yet the second one.)
v3-0001-Remove-obsolete-comment.patch
v3-0002-Improve-comment.patch
These just update a few comments around the RangeTblEntry definition.
v3-0003-Reformat-some-node-comments.patch
v3-0004-Remove-custom-_jumbleRangeTblEntry.patch
This is pretty much the same patch as before. I have now split it up to
first reformat the comments to make room for the node annotations. This
patch is now also pgindent-proof. After some side quest discussions,
the set of fields to jumble seems correct now, so commit message
comments to the contrary have been dropped.
v3-0005-Make-RangeTblEntry-dump-order-consistent.patch
I separated that from the 0008 patch below. I think it useful even if
we don't go ahead with 0008 now, for example in dumps from the debugger,
and just in general to keep everything more consistent.
v3-0006-WIP-AssertRangeTblEntryIsValid.patch
This is in response to some of the discussions where there was some
doubt whether all fields are always filled and cleared correctly when
the RTE kind is changed. Seems correct as far as this goes. I didn't
know of a good way to hook this in, so I put it into the write/read
functions, which is obviously a bit weird if I'm proposing to remove
them later. Consider it a proof of concept.
v3-0007-Simplify-range_table_mutator_impl-and-range_table.patch
v3-0008-WIP-Remove-custom-RangeTblEntry-node-read-write-i.patch
At this point, I'm not too stressed about pressing forward with these
last two patches. We can look at them again perhaps if we make progress
on a more compact node output format. When I started this thread, I had
a lot of questions about various details about the RangeTblEntry struct,
and we have achieved many answers during the discussions, so I'm happy
with the progress. So for PG17, I'd like to just do patches 0001..0005.
Patches 1 thru 5 look good to me
cheers
andrew
On 21.03.24 10:51, Andrew Dunstan wrote: > At this point, I'm not too stressed about pressing forward with these > last two patches. We can look at them again perhaps if we make > progress > on a more compact node output format. When I started this thread, I > had > a lot of questions about various details about the RangeTblEntry > struct, > and we have achieved many answers during the discussions, so I'm happy > with the progress. So for PG17, I'd like to just do patches 0001..0005. > > Patches 1 thru 5 look good to me Thanks for checking. I have committed these (1 through 5) and will close the commit fest entry.