Generating code for query jumbling through gen_node_support.pl - Mailing list pgsql-hackers

From Michael Paquier
Subject Generating code for query jumbling through gen_node_support.pl
Date
Msg-id Y5BHOUhX3zTH/ig6@paquier.xyz
Whole thread Raw
Responses Re: Generating code for query jumbling through gen_node_support.pl
Re: Generating code for query jumbling through gen_node_support.pl
List pgsql-hackers
Hi all,

This thread is a follow-up of the recent discussion about query
jumbling with DDL statements, where the conclusion was that we'd want
to generate all this code automatically for all the nodes:
https://www.postgresql.org/message-id/36e5bffe-e989-194f-85c8-06e7bc88e6f7@amazon.com

What this patch allows to do it to compute the same query ID for
utility statements using their parsed Node state instead of their
string, meaning that things like "BEGIN", "bEGIN" or "begin" would be
treated the same, for example.  But the main idea is not only that.

I have implemented that as of the attached, where the following things
are done:
- queryjumble.c is moved to src/backend/nodes/, to stick with the
other things for node equal/read/write/copy, renamed to
jumblefuncs.c.
- gen_node_support.c is extended to generate the functions and the
switch for the jumbling.  There are a few exceptions, as of the Lists
and RangeTblEntry to do the jumbling consistently.
- Two pg_node_attr() are added in consistency with the existing ones:
no_jumble to discard completely a node from the the query jumbling
and jumble_ignore to discard one field from the jumble.

The patch is in a rather good shape, passes check-world and the CI,
but there are a few things that need to be discussed IMO.  Things
could be perhaps divided in more patches, now the areas touched are
quite different so it did not look like a big deal to me as the
changes touch different areas and are straight-forward.

The location of the Nodes is quite invasive because we only care about
that for T_Const now in the query jumbling, and this could be
compensated with a third pg_node_attr() that tracks for the "int
location" of a Node whether it should participate in the jumbling or
not.  There is also an argument where we would want to not include by
default new fields added to a Node, but that would require added more
pg_node_attr() than what's done here.

Note that the plan is to extend the normalization to some other parts
of the Nodes, like CALL and SET as mentioned on the other thread.  I
have done nothing about that yet but doing so can be done in a few
lines with the facility presented here (aka just add a location
field).  Hence, the normalization is consistent with the existing
queryjumble.c for the fields and the nodes processed.

In this patch, things are done so as the query ID is not computed
anymore from the query string but from the Query.  I still need to
study the performance impact of that with short queries.  If things
prove to be noticeable in some cases, this stuff could be switched to
use a new GUC where we could have a code path for the computation of
utilityStmt using its string as a fallback.  I am not sure that I want
to enter in this level of complications, though, to keep things
simple, but that's yet to be done.

A bit more could be cut but pg_ident got in the way..  There are also
a few things for pg_stat_statements where a query ID of 0 can be
implied for utility statements in some cases.

Generating this code leads to an overall removal of code as what
 queryjumble.c is generated automatically:
 13 files changed, 901 insertions(+), 1113 deletions(-)

I am adding that to the next commit fest.

Thoughts?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Force streaming every change in logical decoding
Next
From: Alvaro Herrera
Date:
Subject: Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row