Re: Generating code for query jumbling through gen_node_support.pl - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Generating code for query jumbling through gen_node_support.pl
Date
Msg-id Y8eZ9/3STyhF57RV@paquier.xyz
Whole thread Raw
In response to Re: Generating code for query jumbling through gen_node_support.pl  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Generating code for query jumbling through gen_node_support.pl
List pgsql-hackers
On Tue, Jan 17, 2023 at 04:52:28PM +0900, Michael Paquier wrote:
> On Tue, Jan 17, 2023 at 08:43:44AM +0100, Peter Eisentraut wrote:
>> Ok, I understand now, and I agree with this approach over the opposite. I
>> was confused because the snippet you showed above used "jumble_ignore", but
>> your patch is correct as it uses "jumble_location".
>
> Okay.  I'll refresh the patch set so as we have only "jumble_ignore",
> then, like v1, with preparatory patches for what you mentioned and
> anything that comes into mind.

This is done as of the patch series v3 attached:
- 0001 reformats all the comments of the nodes.
- 0002 moves the current files for query jumble as of queryjumble.c ->
queryjumblefuncs.c and utils/queryjumble.h -> nodes/queryjumble.h.
- 0003 is the core feature, where I have done a second pass over the
nodes to make sure that things map with HEAD, incorporating the extra
docs coming from v2, adding a bit more.

>> That said, the term "jumble" is really weird, because in the sense that we
>> are using it here it means, approximately, "to mix together", "to unify".
>> So what we are doing with the Const nodes is really to *not* jumble the
>> location, but for all other node types we are jumbling the location.  At
>> least that is my understanding.
>
> I am quite familiar with this term, FWIW.  That's what we've inherited
> from the days where this has been introduced in pg_stat_statements.

I have renamed the node attributes to query_jumble_ignore and
no_query_jumble at the end, after considering Peter's point that only
"jumble" could be fuzzy here.  The file names are changed in
consequence.

While doing all that, I have done some micro-benchmarking of
JumbleQuery(), making it loop 50M times on my laptop each time a query
ID is computed (hideous hack with a loop in queryjumble.c):
- For non-utility queries, aka queries that go through
JumbleQueryInternal(), I am measuring a repeatable ~10% improvement
with the generated code over HEAD, which is kind of nice.  I have
tested a few DMLs and simple SELECTs, still it looks like a trend.
- For utility queries, the new code is competing against
hash_any_extended() with the query string, which is going to be hard
to beat..  I am measuring what looks like a 5 times slowdown, at
least, and more depending on the depth of the query tree.  That's not
surprising.  On a 50M loop, this comes down to compare a computation
of 100ns to 5ns for a 20-time slowdown for example, still this could
justify the addition of a GUC to control whether utility queries have
their query ID compiled depending on their nodes or their string
hash, as this could become noticeable in OLTP workloads with loads
of short statements and BEGIN/COMMIT queries?

Thoughts or comments?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Nitin Jadhav
Date:
Subject: Re: Fix GUC_NO_SHOW_ALL test scenario in 003_check_guc.pl
Next
From: Peter Smith
Date:
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)