Re: Refactor query normalization into core query jumbling - Mailing list pgsql-hackers

From Lukas Fittl
Subject Re: Refactor query normalization into core query jumbling
Date
Msg-id CAP53PkxqGbPw5VzpacyJb2wTofYJadCoUmxV8s2o5tHzKznwbg@mail.gmail.com
Whole thread Raw
In response to Re: Refactor query normalization into core query jumbling  (Sami Imseih <samimseih@gmail.com>)
Responses Re: Refactor query normalization into core query jumbling
List pgsql-hackers
On Wed, Dec 24, 2025 at 9:33 AM Sami Imseih <samimseih@gmail.com> wrote:
> that would be pushed across more stacks of the
> post-parse-analyze hook.  Doing that would allow us to pull the plug
> into making JumbleState and the original copies of clocations a set of
> consts when given to extensions.

Yes correct. The hook signature will turn into, so all extensions now
just have a constant pointer to the jumblestate. They can of course
work hard to cast away constants, but at that point, they are doing
things the wrong way.

```
 typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,

        Query *query,

        const JumbleState *jstate);
```

This of course will be a big breaking change to all the extensions out
there using this hook. This is not just about a signature change of
the hook, but an author now has to figure out how to deal with a
constant JumbleState in their code, which may not be trivial.

I wonder if there is a single extension out there that actually utilizes JumbleState in that hook -
it was added in 5fd9dfa5f50e4906c35133a414ebec5b6d518493 presumably because pg_stat_statements
needed it, but even Julien at the time was critical of adding it, mainly considering it for pgss needs if I read
the archive correctly [0]. CCing Julien to correct me if I misinterpret the historic record here.

Reading through the discussion in this thread here I do wonder a bit if we shouldn't just take that out of the
hook again, and instead provide separate functions for getting the normalized query string (generating it the
first time its called).

My proposal in v3 was a bit more softer, and keeps the hook
backwards compatible. Basically, we keep JumbleState a non-constant,
but provide core APIs for any  operation, such as generate_normalized_query,
that needs to modify the state. So, my approach was not about enforcing a
read-only JumbleState, but about providing the API to dissuade an author
from modifying a JumbleState.

Given the lack of public APIs to modify JumbleState today, I don't see how an extension would do
modifications in a meaningful way, short of copying the code. I think we should be a bit bolder here in
enforcing a convention, either clearly making it read-only or dropping the argument again.

Thinking through a couple of theoretical use cases:

1) An extension that wants to display normalized query strings

This seems to be the biggest kind of what I can find with code search. Extensions like pg_stat_monitor [1], that
want to do something like pg_stat_statements, and have to copy a bunch of normalization code today that is 1:1 what
Postgres does. Such extensions don't need the JumbleState argument if they can get the normalized text directly.

2) An extension that wants to capture parameter values

Some extensions may want to remember additional context for normalized queries, like pg_tracing's logic for
optionally passing parameter values (post normalization) in the trace context [2]. If we kept passing a read-only 
JumbleState then such extensions could presumably still get this, but I wonder if it wouldn't be better for core to
have a helper for this?

3) An extension that modifies the JumbleState to cause different normalization to happen

I'm not aware of any extensions doing this, and I couldn't find any either.  And since such theoretical extensions
can directly modify query->queryId in the same hook, why not do that?

4) Extensions using the presence of JumbleState to determine whether query IDs are available (?)

I noticed pg_hint_plan checks the JumbleState argument to determine whether or not to run an early check
of the hint logic. I'm a little confused by this (and if this is about query IDs being available, couldn't it just check the
Query having a non-zero queryID?), but FWIW: [3]


Thanks,
Lukas

--
Lukas Fittl

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PATCH] Allow complex data for GUC extra.
Next
From: Henson Choi
Date:
Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks