Thread: Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

From

Lukas Fittl

Date:

31 January, 08:19:49

On Thu, Jan 30, 2025 at 8:37 PM Michael Paquier <michael@paquier.xyz> wrote:

After thinking more about this one, I still want this toy and hearing
nothing I have applied it, with a second commit for the addition in
injection_points to avoid multiple bullet points in a single commit.

Thanks for committing! I had intended to review/test your patch, but the earlier parts of the week got way too busy.

I think the API with do_drop makes sense, and whilst I'd think there is some extra overhead to calling the function vs having an inline check for kind, it seems unlikely this would be used in a performance critical context, and the flexibility seems useful.

I have noticed post-commit that I have made a mistake in the credits
of a632cd354d35 and ce5c620fb625 for your family name. Really sorry
about that! This mistake is on me..

No worries regarding the name, happens to me all the time :)

> What do you think?

Attached is a rebased version of the three remaining patches. While
looking at this stuff, I have noticed an extra cleanup that would be
good to have, as a separate change: we could reformat a bit the plan
header comments so as these do not require a rewrite when adding
node_attr to them, like d575051b9af9.

Yeah, I think that'd be helpful to move the comments before the fields - it definitely gets hard to read.

Sami's patch set posted at [1] has the same problem, making the
proposals harder to parse and review, and the devil is in the details
with these pg_node_attr() properties attached to the structures. That
would be something to do on top of the proposed patch sets. Would any
of you be interested in that?

I'd be happy to tackle that - were you thinking to simply move any comments before the field, in each case where we're adding an annotation?

Separately I've been thinking how we could best have a discussion/review on whether the jumbling of specific plan struct fields is correct. I was thinking maybe a quick wiki page could be helpful, noting why to jumble/not jumble certain fields?

Thanks,

Lukas

Lukas Fittl

Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

From

Sami Imseih

Date:

05 February, 02:14:48

>> Separately I've been thinking how we could best have a discussion/review on
>> whether the jumbling of specific plan struct fields is correct. I was
>> thinking maybe a quick wiki page could be helpful, noting why to jumble/not
>> jumble certain fields?

> Makes sense.  This is a complicated topic.

+1 for the Wiki page

I started looking at the set of patches and started with v3-0001.
For that one, I think we need to refactor a bit more for
maintainability/readability.

queryjumblefuncs.c now has dual purposes which is the generic node jumbling
code and now it also has the specific query jumbling code. That seems wrong
from a readability/maintainability perspective.

Here are my high-level thoughts on this:
1. rename queryjumblefuncs.c to jumblefuncs.c
2. move the query jumbling related code to parser/analyze.c,
since query jumbling occurs there during parsing.
3. Rewrite the comments in the new jumblefuncs.c to
make it clear the intention of this infrastructure; that
it is used to jumble nodes for query or plan trees.

I can work on this if you agree.

Regards,

Sami

Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

From

Sami Imseih

Date:

10 February, 23:14:09

Another thought that I have is that If we mention that extensions can use
these jumbling ( or whatever the final name is ) functions outside of
core, it makes
sense to actually show an example of this. What do you think?

-- 
Sami

Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

From

Julien Rouhaud

Date:

12 February, 04:20:53

On Wed, Feb 12, 2025 at 09:08:00AM +0900, Michael Paquier wrote:
> Wikipedia seems to agree with you that "fingerprint" would fit for
> this purpose, though:
> https://en.wikipedia.org/wiki/Fingerprint_(computing)
>
> Has anybody any comments about that?  That would be a large renaming,
> but in the long term is makes sense if we want to apply that to more
> than just parse nodes and query strings.  If you do that, it impacts
> the file names and the properties, that are hidden in the backend for
> most of it, except the entry API and JumbleState.  This last part
> impacts some extensions and I have been maintaining one a bit
> (pg_hint_plan).

I agree that fingerprint is a good improvement.

>
> Also adding Julien in CC,
> as he has some out-of-core extension code that depends on the jumbling
> structures if I recall correctly.

I do have an extension to support custom fingerprinting logic, but the
introduction of the pg_node_attr based jumbling kind of broke it.

FTR my main motivation was to be able to deal with queries referencing
temporary relations, as if your application creates a lot of those it basically
means that you cannot use pg_stat_statements anymore.

Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

From

Julien Rouhaud

Date:

12 February, 05:22:53

On Wed, Feb 12, 2025 at 10:59:04AM +0900, Michael Paquier wrote:
> On Wed, Feb 12, 2025 at 09:20:53AM +0800, Julien Rouhaud wrote:
> >
> > FTR my main motivation was to be able to deal with queries referencing
> > temporary relations, as if your application creates a lot of those it basically
> > means that you cannot use pg_stat_statements anymore.
>
> Do you have an issue more details about your problem?  If we can
> improve the situation in core without impacting the existing cases
> that we need to support in pgss, that may be worth looking at.

I thought this was a well known limitation.  The basic is that if you rely on
temp tables, you usually end up with a virtually infinite number of queryids
since all temp tables get a different oid and that oid is used in the queryid
computation.  And in that case the overhead of pg_stat_statements is insanely
high.  The last figures I saw was by Andres many years ago, with a mention 40%
overhead, and I don't think it's hard to get way worse overhead than that if
you have lengthier query texts.

As a prototype in my extension I think I just entirely ignored such queries,
but another (and probably friendlier for the actual pg_stat_statements
statistics) approach would be to use the relation name to compute the queryid
rather than its oid.  This would add some overhead, but I think it would have
very limited impact especially compared to the current situation.

Of course some people may want to keep the current behavior, if they have
limited number of temp tables or similar, so I had a GUC for that.  I don't
think that the community would really welcome such GUC for core-postgres,
especially since it wouldn't be pg_stat_statements specific.

Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

From

Greg Sabino Mullane

Date:

12 February, 05:54:33

On Tue, Feb 11, 2025 at 7:08 PM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Feb 10, 2025 at 02:02:10PM -0600, Sami Imseih wrote:
> I am OK with moving away from "jumble" in-lieu of something else, but my thoughts are we should actually call this process "fingerprint"

I agree fingerprint is the right final word. But "jumble" conveys the *process* better than "fingerprinting". I view it as jumbling produces an object that can be fingerprinted.

> For node attributes we can specify "fingerprint_ignore" or "no_fingerprint". What do you think?

Still should be jumble_ignore.

Cheers,

Greg

Crunchy Data - https://www.crunchydata.com

Enterprise Postgres Software Products & Tech Support

Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

From

Sami Imseih

Date:

12 February, 05:57:46

> Of course some people may want to keep the current behavior, if they have
> limited number of temp tables or similar, so I had a GUC for that.  I don't
> think that the community would really welcome such GUC for core-postgres,
> especially since it wouldn't be pg_stat_statements specific.

FWIW, I think options to tweak queryId computation is something
that should be in core. It was discussed earlier in the context
of IN list merging; the patch for this currently has the guc
for the feature in pg_stat_statements, but there was a discussion
about actually moving this to core [1] Giving the user a way
to control certain behavior about the queryId computation
is a good thing to do in core; especially queryId is no longer
just consumed in pg_stat_statements. Maybe the right answer
is an enum GUC, not sure yet.

Specifically for the use-case you mention, using names vs OIDs in
queryId computation is a valid use case for more than temporary tables,
I can also think of upgrade, dump/restore, logical replication cases which
can then allow for a consistent queryId.

[1] https://www.postgresql.org/message-id/202502111852.btskmr7nhien%40alvherre.pgsql

-- 
Sami

Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

From

Julien Rouhaud

Date:

12 February, 06:50:47

On Tue, Feb 11, 2025 at 08:57:46PM -0600, Sami Imseih wrote:
> > Of course some people may want to keep the current behavior, if they have
> > limited number of temp tables or similar, so I had a GUC for that.  I don't
> > think that the community would really welcome such GUC for core-postgres,
> > especially since it wouldn't be pg_stat_statements specific.
>
> FWIW, I think options to tweak queryId computation is something
> that should be in core. It was discussed earlier in the context
> of IN list merging; the patch for this currently has the guc
> for the feature in pg_stat_statements, but there was a discussion
> about actually moving this to core [1] Giving the user a way
> to control certain behavior about the queryId computation
> is a good thing to do in core; especially queryId is no longer
> just consumed in pg_stat_statements. Maybe the right answer
> is an enum GUC, not sure yet.
>
> Specifically for the use-case you mention, using names vs OIDs in
> queryId computation is a valid use case for more than temporary tables,
> I can also think of upgrade, dump/restore, logical replication cases which
> can then allow for a consistent queryId.

Well, the ability for extensions to override the actual queryid calculation was
the result of more than half a decade of strong disagreements about it.   And
I'm definitely not volunteering to reopen that topic :)

Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

From

Alvaro Herrera

Date:

12 February, 15:57:47

On 2025-Feb-12, Julien Rouhaud wrote:

> > FWIW, I think options to tweak queryId computation is something that
> > should be in core. It was discussed earlier in the context of IN
> > list merging; the patch for this currently has the guc for the
> > feature in pg_stat_statements, but there was a discussion about
> > actually moving this to core [1] Giving the user a way to control
> > certain behavior about the queryId computation is a good thing to do
> > in core; especially queryId is no longer just consumed in
> > pg_stat_statements. Maybe the right answer is an enum GUC, not sure
> > yet.

> Well, the ability for extensions to override the actual queryid
> calculation was the result of more than half a decade of strong
> disagreements about it.   And I'm definitely not volunteering to
> reopen that topic :)

Sorry, Michael already did.

Anyway, I think that's different.  We do support compute_query_id=off as
a way for a custom module to compute completely different query IDs
using their own algorithm, which I think is what you're referring to.
However, the ability to affect the way the in-core algorithm works is a
different thing: you still want in-core code to compute the query ID.

Right now, the proposal in the other thread is that if you want to
affect that algorithm in order to merge arrays to be considered a single
query element regardless of its length, you set the GUC for that.
Initially the GUC was in the core code.  Then, based on review, the GUC
was moved to the extension, _BUT_ the implementation was still in the
core code: in order to activate it, the extension calls a function that
modifies core code behavior.  So there are more moving parts than
before, and if you for whatever reason want that behavior but not the
extension, then you need to write a C function.  To me this is absurd.
So what I suggest we do is return to having the GUC in the core code.

Now I admit I'm not sure what the solution would be for the problem
discussed in this subthread.  Apparently the problem is related to temp
tables and their changing OIDs.  I'm not sure what exactly the proposal
for a GUC is.  I mean, what would the behavior change be?  Maybe what
you want is something like "if this table reference here is to a temp
table, then instead of jumbling the OID then jumble the string
'pg_temp.tablename' instead", which would make the query ID be the same
for all occurrences of that query in whatever backend return the same
number, regardless both of what OID the temp schema for that backend is,
and the table OID itself.  Is there more to it than that?  (The only
difficulty I see here is how to get the table name when the only thing
you have is the RangeTblEntry, which doesn't have the name but just the
OID.  I see in [1] that you simply do a syscache lookup, but it would be
good to avoid that.)

Maybe that sounds pretty obscure if you try to describe it too
precisely, but if you don't think too hard about it it probably natural
-- at least to me.  So my next question is, do we really need this
behavior to be configurable?  Wouldn't it be better to make the default
way to deal with temp tables in all cases?  The current behavior seems
rather unhelpful.  I do note that what you do in pg_queryid, which is
simply to ignore the table altogether, is probably not a great idea.

Anyway, assuming we make a GUC of it (a big if!), let me talk a bit
about GUC names.  In the other thread, the list of GUC names in the
submitted patch plus the ones I suggested are:

query_id_const_merge
query_id_merge_values
query_id_merge_value_lists
query_id_squash_constant_lists

so maybe here I would consider something like

query_id_merge_temp_tables
query_id_squash_temporary_tables

[1] https://github.com/rjuju/pg_queryid/blob/master/pg_queryid.c#L941

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"The important things in the world are problems with society that we don't
understand at all. The machines will become more complicated but they won't
be more complicated than the societies that run them."    (Freeman Dyson)

Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

From

Julien Rouhaud

Date:

12 February, 16:46:06

On Wed, Feb 12, 2025 at 01:57:47PM +0100, Alvaro Herrera wrote:
> On 2025-Feb-12, Julien Rouhaud wrote:
>
> > > FWIW, I think options to tweak queryId computation is something that
> > > should be in core. It was discussed earlier in the context of IN
> > > list merging; the patch for this currently has the guc for the
> > > feature in pg_stat_statements, but there was a discussion about
> > > actually moving this to core [1] Giving the user a way to control
> > > certain behavior about the queryId computation is a good thing to do
> > > in core; especially queryId is no longer just consumed in
> > > pg_stat_statements. Maybe the right answer is an enum GUC, not sure
> > > yet.
>
> > Well, the ability for extensions to override the actual queryid
> > calculation was the result of more than half a decade of strong
> > disagreements about it.   And I'm definitely not volunteering to
> > reopen that topic :)
>
> Anyway, I think that's different.  We do support compute_query_id=off as
> a way for a custom module to compute completely different query IDs
> using their own algorithm, which I think is what you're referring to.
> However, the ability to affect the way the in-core algorithm works is a
> different thing: you still want in-core code to compute the query ID.

I don't think that's the actual behavior, or at least not what it was supposed
to be.

What we should have is the ability to compute queryid, which can be either in
core or done by an external module, but one only one can / should be done.  And
then you have stuff that use that queryid, e.g. pg_stat_statements,
pg_stat_activity and whatnot, no matter what generated it.  That's per the
original commit 5fd9dfa5f50e message:

Add compute_query_id GUC to control whether a query identifier should be
computed by the core (off by default).  It's thefore now possible to
disable core queryid computation and use pg_stat_statements with a
different algorithm to compute the query identifier by using a
third-party module.

To ensure that a single source of query identifier can be used and is
well defined, modules that calculate a query identifier should throw an
error if compute_query_id specified to compute a query id and if a query
idenfitier was already calculated.

> Right now, the proposal in the other thread is that if you want to
> affect that algorithm in order to merge arrays to be considered a single
> query element regardless of its length, you set the GUC for that.
> Initially the GUC was in the core code.  Then, based on review, the GUC
> was moved to the extension, _BUT_ the implementation was still in the
> core code: in order to activate it, the extension calls a function that
> modifies core code behavior.  So there are more moving parts than
> before, and if you for whatever reason want that behavior but not the
> extension, then you need to write a C function.  To me this is absurd.
> So what I suggest we do is return to having the GUC in the core code.

I agree, although that probably breaks the queryid extensibility.  I haven't
read the patch but IIUC if you want the feature to work you need to both change
the queryid calculation but also the way the constants are recorded and the
query text is normalized, and I don't know if extensions have access to it.  If
they have access and fail to do what the GUC asked then of course that's just a
bug in that extension.

> Now I admit I'm not sure what the solution would be for the problem
> discussed in this subthread.  Apparently the problem is related to temp
> tables and their changing OIDs.  I'm not sure what exactly the proposal
> for a GUC is.

I'm not proposing anything, just explaining why pg_stat_statements is generally
useless if you use temp tables as someone asked.

> I do note that what you do in pg_queryid, which is
> simply to ignore the table altogether, is probably not a great idea.

Yeah, that's also why I said in my previous message that using its name for the
queryid would be better.  Note that in pg_queryid it's already possible to use
relation names rather than oid for the queryid (which I wouldn't recommend, but
it's good for testing).  I just never implemented it for a temp-only
granularity.

Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

From

Alvaro Herrera

Date:

12 February, 19:00:19

On 2025-Feb-12, Julien Rouhaud wrote:

> On Wed, Feb 12, 2025 at 01:57:47PM +0100, Alvaro Herrera wrote:

> > Anyway, I think that's different.  We do support compute_query_id=off as
> > a way for a custom module to compute completely different query IDs
> > using their own algorithm, which I think is what you're referring to.
> > However, the ability to affect the way the in-core algorithm works is a
> > different thing: you still want in-core code to compute the query ID.
> 
> I don't think that's the actual behavior, or at least not what it was
> supposed to be.
> 
> What we should have is the ability to compute queryid, which can be
> either in core or done by an external module, but one only one can /
> should be done.

Yes, that's what I tried to say, but I don't understand why you say I
said something different.

> > Right now, the proposal in the other thread is that if you want to
> > affect that algorithm in order to merge arrays to be considered a single
> > query element regardless of its length, you set the GUC for that.
> > Initially the GUC was in the core code.  Then, based on review, the GUC
> > was moved to the extension, _BUT_ the implementation was still in the
> > core code: in order to activate it, the extension calls a function that
> > modifies core code behavior.  So there are more moving parts than
> > before, and if you for whatever reason want that behavior but not the
> > extension, then you need to write a C function.  To me this is absurd.
> > So what I suggest we do is return to having the GUC in the core code.
> 
> I agree, although that probably breaks the queryid extensibility.

It does?

> I haven't read the patch but IIUC if you want the feature to work you
> need to both change the queryid calculation but also the way the
> constants are recorded and the query text is normalized, and I don't
> know if extensions have access to it.

Hmm.  As for the query text: with Andrey's feature with the GUC in core,
a query like this
SELECT foo FROM tab WHERE col1 IN (1,2,3,4)
will have in pg_stat_activity an identical query_id to a query like this
SELECT foo WHERE tab WHERE col1 IN (1,2,3,4,5)
even though the query texts differ (in the number of elements in the
array).  I don't think this is a problem.  This means that the query_id
for two different queries can be identical, but that should be no
surprise, precisely because the GUC that controls it is documented to do
that.

If pg_stat_statements is enabled with Andrey's patch, then the same
query_id will have a single entry (which has stats for both execution of
those queries) with that query_id, with a normalized query text that is
going to be different from those two above; without Andrey's feature,
the text would be 
SELECT foo WHERE tab WHERE col1 IN ($1,$2,$3,$4);
SELECT foo WHERE tab WHERE col1 IN ($1,$2,$3,$4,$5);
(that is, pg_stat_statements transformed the values into placeholders,
but using exactly the same number of items in the array as the original
queries).  With Andrey's feature, it will be 
SELECT foo WHERE tab WHERE col1 IN (...);
that is, the query text has been modified and no longer matches exactly
any of the queries in pg_stat_activity.  But note that the query text
already does not match what's in pg_stat_activity, even before Andrey's
patch.

I don't understand what you mean with "the way the constants are
recorded".  What constants are you talking about?  pg_stat_statements
purposefully discards any constants used in the query (obviously).

> If they have access and fail to do what the GUC asked then of course
> that's just a bug in that extension.

I don't understand what bug are you thinking that such hypothetical
extension would have.  (pg_stat_statements does of course have access to
the query text and to the location of all constants).

> > Now I admit I'm not sure what the solution would be for the problem
> > discussed in this subthread.  Apparently the problem is related to temp
> > tables and their changing OIDs.  I'm not sure what exactly the proposal
> > for a GUC is.
> 
> I'm not proposing anything, just explaining why pg_stat_statements is
> generally useless if you use temp tables as someone asked.

Ah, okay.  Well, where you see a deficiency, I see an opportunity for
improvement :-)

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/

Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

From

Sami Imseih

Date:

13 February, 05:50:08

>> On Mon, Feb 10, 2025 at 02:02:10PM -0600, Sami Imseih wrote:
>> > I am OK with moving away from "jumble" in-lieu of something else, but my thoughts are we should actually call this
process"fingerprint"

>
>
> I agree fingerprint is the right final word. But "jumble" conveys the *process* better than "fingerprinting".
> I view it as jumbling produces an object that can be fingerprinted.

hmm, "jumble" describes something that is scrambled
or not in order, such as the 64-bit hash produced. It
sounds like the final product.

Fingerprinting on the other hand [1] sounds more of the process
to add all the pieces that will eventually be hashed ( or jumbled ).
hash and jumble are synonyms according to Merriam-Webster [2]

--

Sami

[1] https://en.wikipedia.org/wiki/Fingerprint_(computing)
[2] https://www.merriam-webster.com/thesaurus/hash

Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

From

Alvaro Herrera

Date:

13 February, 14:25:25

On 2025-Feb-12, Sami Imseih wrote:

> Greg S. Mullane wrote:
>
> > I agree fingerprint is the right final word. But "jumble" conveys
> > the *process* better than "fingerprinting".  I view it as jumbling
> > produces an object that can be fingerprinted.
> 
> hmm, "jumble" describes something that is scrambled
> or not in order, such as the 64-bit hash produced. It
> sounds like the final product.

I don't understand why we would change any naming here at all.  I think
you should be looking at a much broader consensus and plus-ones that a
renaming is needed.  -1 from me.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"No renuncies a nada. No te aferres a nada."

Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

From

Sami Imseih

Date:

13 February, 19:44:33

> I don't understand why we would change any naming here at all.  I think
> you should be looking at a much broader consensus and plus-ones that a
> renaming is needed.  -1 from me.

The reason for the change is because "query jumble" will no longer
make sense if the jumble code can now be used for other types of
trees, such as Plan.

I do agree that this needs a single-threaded discussion to achieve a
consensus.

--

Sami

Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

From

Michael Paquier

Date:

18 March, 10:31:06

On Thu, Feb 13, 2025 at 10:44:33AM -0600, Sami Imseih wrote:
> The reason for the change is because "query jumble" will no longer
> make sense if the jumble code can now be used for other types of
> trees, such as Plan.
>
> I do agree that this needs a single-threaded discussion to achieve a
> consensus.

FWIW, I was playing with a sub-project where I was jumbling a portion
of nodes other than Query, and it is annoying to not have a direct
access to jumbleNode().  So, how about doing the refactoring proposed
in v5-0002 with an initialization routine and JumbleNode() as the
entry point for the jumbling, but not rename the existing files
queryjumblefuncs.c and queryjumble.h?  That seems doable for this
release, at least.

I don't think that we should expose AppendJumble(), either.
--
Michael

Attachment

signature.asc

Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

From

Andrei Lepikhov

Date:

18 March, 12:12:10

On 3/18/25 08:31, Michael Paquier wrote:
> On Thu, Feb 13, 2025 at 10:44:33AM -0600, Sami Imseih wrote:
>> The reason for the change is because "query jumble" will no longer
>> make sense if the jumble code can now be used for other types of
>> trees, such as Plan.
>>
>> I do agree that this needs a single-threaded discussion to achieve a
>> consensus.
> 
> FWIW, I was playing with a sub-project where I was jumbling a portion
> of nodes other than Query, and it is annoying to not have a direct
> access to jumbleNode().  So, how about doing the refactoring proposed
> in v5-0002 with an initialization routine and JumbleNode() as the
> entry point for the jumbling, but not rename the existing files
> queryjumblefuncs.c and queryjumble.h?  That seems doable for this
> release, at least.
It seems pretty helpful to me. Having a code for hashing an expression 
or subquery, we may design new optimisations. I personally have such a 
necessity in a couple of planner extensions.

At the same time, generalising jumbling code we may decide to work on 
the JumbleState structure: code related to constant locations may be 
replaced with callbacks - let the caller decide what action to take on 
each node (not only constants). Of course, it is not for current release.

> 
> I don't think that we should expose AppendJumble(), either.
Agree


-- 
regards, Andrei Lepikhov

Re: [PATCH] Optionally record Plan IDs to track plan changes for a query

From

Ilia Evdokimov

Date:

18 March, 22:27:47

On 12.02.2025 19:00, Alvaro Herrera wrote:
> On 2025-Feb-12, Julien Rouhaud wrote:
>
>> On Wed, Feb 12, 2025 at 01:57:47PM +0100, Alvaro Herrera wrote:
>>> Anyway, I think that's different.  We do support compute_query_id=off as
>>> a way for a custom module to compute completely different query IDs
>>> using their own algorithm, which I think is what you're referring to.
>>> However, the ability to affect the way the in-core algorithm works is a
>>> different thing: you still want in-core code to compute the query ID.
>> I don't think that's the actual behavior, or at least not what it was
>> supposed to be.
>>
>> What we should have is the ability to compute queryid, which can be
>> either in core or done by an external module, but one only one can /
>> should be done.
> Yes, that's what I tried to say, but I don't understand why you say I
> said something different.
>
>>> Right now, the proposal in the other thread is that if you want to
>>> affect that algorithm in order to merge arrays to be considered a single
>>> query element regardless of its length, you set the GUC for that.
>>> Initially the GUC was in the core code.  Then, based on review, the GUC
>>> was moved to the extension, _BUT_ the implementation was still in the
>>> core code: in order to activate it, the extension calls a function that
>>> modifies core code behavior.  So there are more moving parts than
>>> before, and if you for whatever reason want that behavior but not the
>>> extension, then you need to write a C function.  To me this is absurd.
>>> So what I suggest we do is return to having the GUC in the core code.
>> I agree, although that probably breaks the queryid extensibility.
> It does?
>
>> I haven't read the patch but IIUC if you want the feature to work you
>> need to both change the queryid calculation but also the way the
>> constants are recorded and the query text is normalized, and I don't
>> know if extensions have access to it.
> Hmm.  As for the query text: with Andrey's feature with the GUC in core,
> a query like this
> SELECT foo FROM tab WHERE col1 IN (1,2,3,4)
> will have in pg_stat_activity an identical query_id to a query like this
> SELECT foo WHERE tab WHERE col1 IN (1,2,3,4,5)
> even though the query texts differ (in the number of elements in the
> array).  I don't think this is a problem.  This means that the query_id
> for two different queries can be identical, but that should be no
> surprise, precisely because the GUC that controls it is documented to do
> that.
>
> If pg_stat_statements is enabled with Andrey's patch, then the same
> query_id will have a single entry (which has stats for both execution of
> those queries) with that query_id, with a normalized query text that is
> going to be different from those two above; without Andrey's feature,
> the text would be
> SELECT foo WHERE tab WHERE col1 IN ($1,$2,$3,$4);
> SELECT foo WHERE tab WHERE col1 IN ($1,$2,$3,$4,$5);
> (that is, pg_stat_statements transformed the values into placeholders,
> but using exactly the same number of items in the array as the original
> queries).  With Andrey's feature, it will be
> SELECT foo WHERE tab WHERE col1 IN (...);
> that is, the query text has been modified and no longer matches exactly
> any of the queries in pg_stat_activity.  But note that the query text
> already does not match what's in pg_stat_activity, even before Andrey's
> patch.
>
> I don't understand what you mean with "the way the constants are
> recorded".  What constants are you talking about?  pg_stat_statements
> purposefully discards any constants used in the query (obviously).
>
>> If they have access and fail to do what the GUC asked then of course
>> that's just a bug in that extension.
> I don't understand what bug are you thinking that such hypothetical
> extension would have.  (pg_stat_statements does of course have access to
> the query text and to the location of all constants).
>
>>> Now I admit I'm not sure what the solution would be for the problem
>>> discussed in this subthread.  Apparently the problem is related to temp
>>> tables and their changing OIDs.  I'm not sure what exactly the proposal
>>> for a GUC is.
>> I'm not proposing anything, just explaining why pg_stat_statements is
>> generally useless if you use temp tables as someone asked.
> Ah, okay.  Well, where you see a deficiency, I see an opportunity for
> improvement :-)
>

Hi everyone,

I support the idea of computing the planid  for temporary tables using 
'pg_temp.rel_name'. Moreover, we have already started using this 
approach for computing queryid [0]. It seems reasonable to apply the 
same logic to the planid calculation as well.

[0]: 
https://www.postgresql.org/message-id/flat/Z9mkqplmUpQ4xG52%40msg.df7cb.de#efb20f01bec32aeafd58e5d4ab0dfc16

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.