pgsql: Generate code for query jumbling through gen_node_support.pl - Mailing list pgsql-committers

From: Michael Paquier
Subject: pgsql: Generate code for query jumbling through gen_node_support.pl
Msg-id: E1pMk51-000puf-55@gemulon.postgresql.org
List: pgsql-committers

Generate code for query jumbling through gen_node_support.pl

This commit changes the query jumbling code in queryjumblefuncs.c to be
generated automatically from the node definitions in the headers of
src/include/nodes/, using gen_node_support.pl.  This approach offers
many advantages:
- Support for query jumbling for all the utility statements, based on the
state of their parsed Nodes and not only their query string.  This will
greatly ease the switch to normalize the information of some DDLs, like
SET or CALL for example (this is left unchanged and should be part of a
separate discussion).  With this feature, the number of entries stored
for utilities in pg_stat_statements is reduced (for example now
"CHECKPOINT" and "checkpoint" mean the same thing with the same query
ID).
- Documentation of query jumbling directly in the structure definition
of the nodes.  Since this code was first introduced in
pg_stat_statements and then moved to core, the documented reasons behind
the choices of what should be included in the jumble are rather sparse.
Note that some explanation is added for the most relevant parts, as a
start.
- Overall code reduction and more consistency with the other parts that
generate the read, write and copy functions from the node definitions.
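
As a rough illustration of the first point, the sketch below contrasts a
query ID computed from the query string with one computed from a parsed
node.  Everything in it is hypothetical (toy_hash, ToyNode, toy_parse and
the FNV-1a hash are stand-ins invented for this example); it is not the
code in queryjumblefuncs.c, only a model of why the two spellings of
CHECKPOINT end up with the same ID.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy FNV-1a hash.  An assumption for this sketch, not PostgreSQL's hash. */
static uint64_t
toy_hash(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint64_t	h = UINT64_C(14695981039346656037);

	for (size_t i = 0; i < len; i++)
	{
		h ^= p[i];
		h *= UINT64_C(1099511628211);
	}
	return h;
}

/* Hypothetical node tags standing in for the real NodeTag enum. */
typedef enum ToyNodeTag
{
	TOY_UNKNOWN = 0,
	TOY_CHECKPOINT_STMT = 1
} ToyNodeTag;

typedef struct ToyNode
{
	ToyNodeTag	type;
} ToyNode;

/* Toy "parser": recognizes a single keyword, case-insensitively. */
static ToyNode
toy_parse(const char *query)
{
	static const char kw[] = "checkpoint";
	ToyNode		node = {TOY_UNKNOWN};

	if (strlen(query) != sizeof(kw) - 1)
		return node;
	for (size_t i = 0; kw[i] != '\0'; i++)
	{
		if (tolower((unsigned char) query[i]) != kw[i])
			return node;
	}
	node.type = TOY_CHECKPOINT_STMT;
	return node;
}

/* Old style for utilities: the ID depends on the exact query text. */
static uint64_t
query_id_from_string(const char *query)
{
	return toy_hash(query, strlen(query));
}

/* New style: the ID depends on the parsed node, not on the spelling. */
static uint64_t
query_id_from_node(const char *query)
{
	ToyNode		node = toy_parse(query);

	return toy_hash(&node.type, sizeof(node.type));
}
```

With these toy functions, "CHECKPOINT" and "checkpoint" hash differently
through query_id_from_string() but identically through
query_id_from_node(), which is the effect described above for
pg_stat_statements entries.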

The query jumbling is controlled by a couple of new node attributes,
documented in nodes/nodes.h:
- custom_query_jumble, to mark a Node as having a custom
implementation.
- no_query_jumble, to ignore a Node entirely.
- query_jumble_ignore, to ignore a field in a Node.
- query_jumble_location, to mark a location in a Node, for
normalization.  This can apply only to int fields, with "location" in
their name (only Const as of this commit).
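
A minimal sketch of how these attributes might look on a node, and of
what a jumble function honoring them could do.  The pg_node_attr()
marker is the one used in the node headers and expands to nothing for
the compiler; the structs (ToyConst, ToyInternal, ToyJumbleState) and the
functions below are hypothetical and hand-written for illustration, not
the output of gen_node_support.pl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expands to nothing for the C compiler; only the generator script reads it. */
#define pg_node_attr(...)

/* Hypothetical node whose type would be skipped entirely by the jumble. */
typedef struct ToyInternal
{
	pg_node_attr(no_query_jumble)
	int			dummy;
} ToyInternal;

/* Hypothetical constant node, loosely modeled on the idea of Const. */
typedef struct ToyConst
{
	int			consttype;		/* included in the jumble */
	long		constvalue pg_node_attr(query_jumble_ignore);
	int			location pg_node_attr(query_jumble_location);
} ToyConst;

#define TOY_MAX_LOCATIONS 8

/* Hypothetical jumble state: a running hash plus recorded locations. */
typedef struct ToyJumbleState
{
	uint64_t	hash;
	int			locations[TOY_MAX_LOCATIONS];
	int			nlocations;
} ToyJumbleState;

static void
toy_jumble_init(ToyJumbleState *jstate)
{
	jstate->hash = UINT64_C(14695981039346656037);
	jstate->nlocations = 0;
}

/* Mix raw bytes into the running hash (toy FNV-1a, an assumption). */
static void
toy_append(ToyJumbleState *jstate, const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++)
	{
		jstate->hash ^= p[i];
		jstate->hash *= UINT64_C(1099511628211);
	}
}

/* What a function following the attributes on ToyConst could look like. */
static void
toy_jumble_const(ToyJumbleState *jstate, const ToyConst *node)
{
	assert(jstate != NULL && node != NULL);

	/* Plain field: part of the jumble. */
	toy_append(jstate, &node->consttype, sizeof(node->consttype));

	/* constvalue is marked query_jumble_ignore: not hashed. */

	/* location is marked query_jumble_location: recorded, not hashed. */
	if (node->location >= 0 && jstate->nlocations < TOY_MAX_LOCATIONS)
		jstate->locations[jstate->nlocations++] = node->location;
}
```

In this sketch, two ToyConst nodes with the same consttype but different
values and locations produce the same hash, while each location is kept
aside so that the constant could later be replaced by a placeholder
during normalization.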

There should be no compatibility impact on pg_stat_statements, as the
new code applies the jumbling to the same fields for each node (its
regression tests have no modification, for one).

Benchmarks of the query jumbling between HEAD and this commit for
SELECT and DML queries show that the new code does not cause a
performance regression, with computation times close for both methods.
For utility queries, the new method is slower than the previous method
of calculating a hash of the query string, though the difference I
measured is at the nanosecond level, which is unnoticeable even for OLTP
workloads as a query ID is calculated once per query, after parse
analysis.

Author: Michael Paquier
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/Y5BHOUhX3zTH/ig6@paquier.xyz

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/3db72ebcbe20debc6552500ee9ccb4b2007f12f8

Modified Files
--------------
.../expected/pg_stat_statements.out                |   3 +-
.../pg_stat_statements/sql/pg_stat_statements.sql  |   3 +-
src/backend/nodes/README                           |   1 +
src/backend/nodes/gen_node_support.pl              | 114 ++-
src/backend/nodes/meson.build                      |   2 +-
src/backend/nodes/queryjumblefuncs.c               | 788 ++++-----------------
src/include/nodes/bitmapset.h                      |   2 +-
src/include/nodes/nodes.h                          |  15 +-
src/include/nodes/parsenodes.h                     | 134 ++--
src/include/nodes/primnodes.h                      | 273 +++----
10 files changed, 503 insertions(+), 832 deletions(-)

