Re: Refactor query normalization into core query jumbling - Mailing list pgsql-hackers

From Sami Imseih
Subject Re: Refactor query normalization into core query jumbling
Date
Msg-id CAA5RZ0tKhUXQcyqOqKaBXfmjMZnYVkx44=3DHneomRuBBsZ4bA@mail.gmail.com
In response to Re: Refactor query normalization into core query jumbling  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
> > This way, any extension that wishes to return a normalized string from
> > the same JumbleState can invoke this callback and get consistent results.
> > pg_stat_statements and other extensions with a need to normalize a query
> > string based on the locations of a JumbleState do not need to care about the
> > internals of normalization, they simply invoke the callback and
> > receive the final string.
>
> Hmm.  I did not wrap completely my head with your problem, but,
> assuming that what you are proposing goes in the right direction,

The first goal is to move all query-normalization-related infrastructure
that pg_stat_statements (and other extensions) rely on into core, so
extensions no longer need to copy or reimplement normalization logic and
can all depend on a single, shared implementation.

In addition, query normalization necessarily modifies JumbleState: it
sorts the recorded constant locations and fills in their lengths. That
responsibility should not fall to extensions; it belongs in core. I would
argue that the current design, in which extensions handle this directly,
is a layering violation.

As a first step, we can move generate_normalized_query to core as a global
function, allowing extensions to simply call it.

> I am wondering if we should not expose a bit more the jumble query APIs so
> as the normal default callback can be reused by out-of-core rather
> than hide it entirely.  This would mean exposing
> GenerateNormalizedQuery(), which also giving a way for callers of
> JumbleQuery() to pass down a custom callback?  This would imply
> thinking harder about the initialization state we expect in the
> structure, but I think that we should try to design things so as
> extensions do not need to copy-paste more code from the core tree at
> the end, just less of it.

... and that would be the next step: providing callbacks and making more of
the jumbling utilities global. This will require more discussion, but I
would expect us to expose InitJumble(), which would do the bare minimum to
initialize a JumbleState, along with some fields through which callbacks can
be set after the fact. There would be one callback for normalization and
another that lets the user implement jumbling functions for nodes that are
not currently included in queryjumblefuncs.switch.c, or perhaps override
the existing logic in that generated file.

> Of course, this sentence is written with the same line of thoughts as
> previously mentioned in the other thread we have discussed: extensions
> should not be allowed to update a JumbleState after it's been set by
> the backend code, so as once the same JumbleState pointer is passed
> down across multiple extensions they don't get confused.  If an
> extension wants to use their own policy within the JumbleState, they
> had better recreate a new independent one if they are unhappy about
> has been generated previously.

Yes, agreed. If we provide an interface to create an additional
JumbleState, extensions can create an independent state of their own.

For this thread, I would like to focus on the first goal.

What do you think?

--
Sami Imseih
Amazon Web Services (AWS)
