Thread: Proposal - Allow extensions to set a Plan Identifier

Proposal - Allow extensions to set a Plan Identifier

From

Sami Imseih

Date:

13 February, 02:44:20

Hi,

A patch by Lukas Fittl [1] introduces the pg_stat_plans extension, which
exposes execution plans along with execution statistics. As part of this work,
a new infrastructure is being proposed to compute a plan identifier (planId),
which is exposed in both pg_stat_plans and pg_stat_activity.

Exposing planId in pg_stat_activity enables users to identify long-running
or high-load plans and correlate them with plan text from the extension.

One of open question in [1] is which components of a plan tree should be
considered when computing planId. Lukas has created a wiki [2] for discussion,
but reaching a consensus may take time—possibly in time for v19, but
it's uncertain.

Meanwhile, existing extensions like pg_stat_monitor [3] compute a planId and
store the plan text, but they lack a way to expose planId in pg_stat_activity.
This limits their usefulness, as identifying top or long-running plans from
pg_stat_activity is critical for monitoring.

To address this, I propose leveraging work from [1] and allow extensions
to set a planId This could be implemented sooner than an in-core
computed planId.

Proposal Overview:

1/ Add a planId field to PgBackendStatus.
2/ Add a planId field to PlannedStmt.
3/ Create APIs for extensions to set the current planId.

Storing planId in PlannedStmt ensures it is available with
cached plans.

One consideration is whether reserving a 64-bit integer in PgBackendStatus
and PlannedStmt is acceptable, given that planId may not always be used.
However, this is already the case for queryId when compute_query_id is
disabled and no extension sets it, so it may not be a concern.

Looking forward to feedback on this approach.
If there is agreement, I will work on preparing the patches.

Regards,

Sami Imseih
Amazon Web Services (AWS)

[1] https://www.postgresql.org/message-id/flat/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAP53PkxocbNr%2BeRag3FEJp3-7S1U80FspOg8UQjO902TWMG%3D6A%40mail.gmail.com
[3] https://github.com/percona/pg_stat_monitor

Re: Proposal - Allow extensions to set a Plan Identifier

From

Bertrand Drouvot

Date:

13 February, 10:48:04

Hi,

On Wed, Feb 12, 2025 at 05:44:20PM -0600, Sami Imseih wrote:
> Meanwhile, existing extensions like pg_stat_monitor [3] compute a planId and
> store the plan text, but they lack a way to expose planId in pg_stat_activity.
> This limits their usefulness, as identifying top or long-running plans from
> pg_stat_activity is critical for monitoring.

Agree.

> To address this, I propose leveraging work from [1] and allow extensions
> to set a planId This could be implemented sooner than an in-core
> computed planId.

I think that makes sense and then the idea would be later on to move to something
like 5fd9dfa5f50, but for the "planId": is my understanding correct?

> Proposal Overview:
> 
> 1/ Add a planId field to PgBackendStatus.
> 2/ Add a planId field to PlannedStmt.
> 3/ Create APIs for extensions to set the current planId.

Note sure 3 is needed (MyBEEntry is available publicly and PlannedStmt also
through some hooks). But did not look close enough and would probably depends
of the implementation, so let's see.

> Storing planId in PlannedStmt ensures it is available with
> cached plans.
> 
> One consideration is whether reserving a 64-bit integer in PgBackendStatus
> and PlannedStmt is acceptable, given that planId may not always be used.

From what I can see PgBackendStatus is currently 432 bytes and PlannedStmt
is 152 bytes. It looks like that adding an uint64 at the end of those structs
would not add extra padding (that's what "ptype /o struct <struct>" tells me)
so would increase to 440 bytes and 160 bytes respectively. I don't think that's
an issue.

> However, this is already the case for queryId when compute_query_id is
> disabled and no extension sets it

Yup.

> Looking forward to feedback on this approach.
> If there is agreement, I will work on preparing the patches.

That sounds like a plan for me but better to wait for others to share their
thoughts too.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Proposal - Allow extensions to set a Plan Identifier

From

Sami Imseih

Date:

13 February, 20:10:27

Thanks for the feedback!

> I think that makes sense and then the idea would be later on to move to something
> like 5fd9dfa5f50, but for the "planId": is my understanding correct?

correct. This is adding infrastructure to eventually have an in-core
planId; but in the
meanwhile give extensions that already compute a planId the ability to
broadcast the planId in pg_stat_activity.

> > Proposal Overview:
> >
> > 1/ Add a planId field to PgBackendStatus.
> > 2/ Add a planId field to PlannedStmt.
> > 3/ Create APIs for extensions to set the current planId.
>
> Note sure 3 is needed (MyBEEntry is available publicly and PlannedStmt also
> through some hooks). But did not look close enough and would probably depends
> of the implementation, so let's see.

I don't think direct setting of values is a good idea. We will need an API
similar to pgstat_report_query_id which ensures we are only reporting top
level planIds -and- in the case of multiple extensions with the capability
to set a planId, only the first one in the stack wins. pgstat_report_query_id
does allow for forcing a queryId ( force flag which is false by default ), and
I think this new API should allow the same.

> > Storing planId in PlannedStmt ensures it is available with
> > cached plans.
> >
> > One consideration is whether reserving a 64-bit integer in PgBackendStatus
> > and PlannedStmt is acceptable, given that planId may not always be used.
>
> From what I can see PgBackendStatus is currently 432 bytes and PlannedStmt
> is 152 bytes. It looks like that adding an uint64 at the end of those structs
> would not add extra padding (that's what "ptype /o struct <struct>" tells me)
> so would increase to 440 bytes and 160 bytes respectively. I don't think that's
> an issue.

Correct, thanks for adding this detail.

--

Sami

Re: Proposal - Allow extensions to set a Plan Identifier

From

Andrei Lepikhov

Date:

15 February, 12:29:41

On 14/2/2025 08:21, Michael Paquier wrote:
> On Thu, Feb 13, 2025 at 11:10:27AM -0600, Sami Imseih wrote:
>> I don't think direct setting of values is a good idea. We will need an API
>> similar to pgstat_report_query_id which ensures we are only reporting top
>> level planIds -and- in the case of multiple extensions with the capability
>> to set a planId, only the first one in the stack wins. pgstat_report_query_id
>> does allow for forcing a queryId ( force flag which is false by default ), and
>> I think this new API should allow the same.
> I'm obviously siding with this proposal because we have an ask to
> track this kind of data, and because this kind of data is kind of hard
> to track across a stack of extensions through the core backend code.
> 
> Point is, would others be interested in this addition or just object
> to it because it touches the core code?
I have already implemented it twice in different ways as a core patch.
In my projects, we need to track queryId and plan node ID for two reasons:
1. Optimisational decisions made during transformation/path generation 
stages up to the end of execution to correct them in the future.
2. Cache information about the query tree/node state to use it for 
statistical purposes.
In my experience, we don't need a single plan_id field; we just need an 
'extended list' pointer at the end of the Plan, PlannedStmt, Query, and 
RelOptInfo structures and a hook at the end of the create_plan_recurse() 
to allow passing some info from the path generator to the plan tree.
An extension may add its data to the list (we may register an extensible 
node type to be sure we don't interfere with other extensions) and 
manipulate it in a custom way and with custom UI.
Generally, it makes the optimiser internals more open to extensions.

-- 
regards, Andrei Lepikhov

Re: Proposal - Allow extensions to set a Plan Identifier

From

Sami Imseih

Date:

17 February, 23:39:53

On Sun, Feb 16, 2025 at 5:34 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sat, Feb 15, 2025 at 10:29:41AM +0100, Andrei Lepikhov wrote:
> > I have already implemented it twice in different ways as a core patch.
> > In my projects, we need to track queryId and plan node ID for two reasons:
>
> Are these available in the public somewhere or is that simply what
> Sami is proposing?

Andrei, you mention "plan node ID" which if I understand correctly will be
a good thing to expose to determine which part of the plan most of the time
is spent. This is similar to an idea I raised in a different thread for
explain plan progress [1].

However, I am proposing something different, which is we track a plan_id of the
full execution plan. Some extensions may hash the text version of the plan,
others may choose to do something more elaborate such as "jumbling" a plan
tree. The point becomes is that monitoring extensions such as
pg_stat_monitor and
others can set the plan_id in core so it's available in backend status.

> > 1. Optimisational decisions made during transformation/path generation
> > stages up to the end of execution to correct them in the future.
> > 2. Cache information about the query tree/node state to use it for
> > statistical purposes.
>
> Gathering of statistical data based on a node tree is one reason,
> where it may or may not be required to walk through a path.
> Influencing the plan used with an already-generated one (where hints
> could be used) was the second one, mostly replacing a plan in the
> planner hook.  Influencing the paths in a plan or a subplan didn't
> really matter much with hints to drive the paths.
>
> > In my experience, we don't need a single plan_id field; we just need an
> > 'extended list' pointer at the end of the Plan, PlannedStmt, Query, and
> > RelOptInfo structures and a hook at the end of the create_plan_recurse() to
> > allow passing some info from the path generator to the plan tree.
> > An extension may add its data to the list (we may register an extensible
> > node type to be sure we don't interfere with other extensions) and
> > manipulate it in a custom way and with custom UI.
> > Generally, it makes the optimiser internals more open to extensions.
>
> Sounds to me that this maps with the addition of a "private" area to
> some of the plan structures to allow extensions to attach some data
> that would be reused elsewhere, which is rather independent than
> what's suggested here?

+1, such a private area is different from what is being proposed.


[1] https://www.postgresql.org/message-id/CAA5RZ0uGDKWxqUCMrsWKV425T2f6mqJsXKg6chq%2BWuyCwNPUGw%40mail.gmail.com.

--
Sami

Re: Proposal - Allow extensions to set a Plan Identifier

From

Sami Imseih

Date:

21 February, 02:03:19

I put together patches to do as is being proposed.

v1-0001:

1. Adds a planId field in PlannedStmt
2. Added an st_plan_id fields in PgBackendStatus
3. APIs to report and to retrieve a planId to PgBackendStatus

An extension is able to set a planId in PlannedStmt directly,
and while they can do the same for PgBackendStatus, I felt it is better
to provide similar APIs for this as we do for queryId. Unless the extension
passed force=true to the API, this will ensure that a planId already set
either by a top-level statement or by another extension cannot be
set when a plan is active.

Also, the extension does not need to worry about resetting the planId at
the end of execution as this will follow the same mechanism as is currently
being done for queryId. This will remove responsibility from the extension
author to have to manage this aspect.

The extension should normally compute the planId in the planner or ExecutorStart
hook. If the planId is computed in the planner_hook, the extension can set the
planId in plannedStmt making the planId available to subsequent executions of
a cached plan.

What this patch does not cover is adding a "Plan Identifier" to explain output
or to logs. Also, there is no core GUC like compute_query_id, since it does not
make sense if we're not computing a planId in core.

v2-0001:

This adds a planId to pg_stat_get_activity ( not pg_stat_activity ).
An extension
can offer its own view, similar to pg_stat_activity, which can include planId.

Note that there are no documentation updates required here as we don't
have per-field documentation for pg_stat_get_activity.

These 2 patches can probably be combined , but will leave them like
this for now.

Looking forward to feedback.

Regards

Sami

Attachment

Re: Proposal - Allow extensions to set a Plan Identifier

From

Ilia Evdokimov

Date:

18 March, 14:46:11

Hi,

Thank you for the patches. They're very important for many extensions.

I've debugged them using simple extensions pg_stat_statements and 
auto_explain, specifically checking cases involving PlannedStmt 
and pg_stat_get_activity - , and haven't encountered any issues so far.

However, I have a question: what value should planid have when we 
execute the standard planner without using a planner_hook? For example, 
if pg_stat_statementsis disabled, would planid default to zero? If zero 
is the expected default, should we explicitly write this behavior?

result->planid = 0;

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Re: Proposal - Allow extensions to set a Plan Identifier

From

Ilia Evdokimov

Date:

18 March, 15:09:19

On 18.03.2025 14:46, Ilia Evdokimov wrote:
>
> However, I have a question: what value should planid have when we 
> execute the standard planner without using a planner_hook? For 
> example, if pg_stat_statementsis disabled, would planid default to 
> zero? If zero is the expected default, should we explicitly write this 
> behavior?
>
> result->planid = 0;
>

Sorry for spam. I found the answer this question in thread [0].

[0]: 
https://www.postgresql.org/message-id/flat/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg%40mail.gmail.com

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Re: Proposal - Allow extensions to set a Plan Identifier

From

Michael Paquier

Date:

19 March, 08:15:11

On Thu, Feb 20, 2025 at 05:03:19PM -0600, Sami Imseih wrote:
> v2-0001:
>
> This adds a planId to pg_stat_get_activity ( not pg_stat_activity ).
> An extension
> can offer its own view, similar to pg_stat_activity, which can include planId.
>
> Note that there are no documentation updates required here as we don't
> have per-field documentation for pg_stat_get_activity.
>
> These 2 patches can probably be combined , but will leave them like
> this for now.

The split makes more sense to me, and FWIW I see more value in 0001
that provides an anchor on the backend side for extensions to plug in
a plan ID that can be tracked across multiple layers.  I am less
convinced about the need for 0002 that adds this information to
pg_stat_activity at this stage, as the getter routine in 0001 is
enough to know what's happening, roughly.

+void
+pgstat_report_plan_id(uint64 plan_id, bool force)
+{
+    volatile PgBackendStatus *beentry = MyBEEntry;
+
+    if (!beentry || !pgstat_track_activities)
+        return;
+
+    if (beentry->st_plan_id != 0 && !force)
+        return;

Right, and that's in line with the cases I've seen because we wanted
to be able to optionally control if only the top-level plan ID should
be stored or not, which is useful when going through the planner hook
at some sub-level.  Hmm.  Shouldn't we document all such expectations,
which are similar to a query ID?  Our experience with pgss in this
area offers some insights, which I guess folks are rather used to and
could be applied to plan IDs.

In short, it would be nice to do 0001 for this release and start from
that (with more documentation around the expectations of these
interfaces).  For 0002 we could always see later about exposing it in
a system view, even if it would offer the advantage of more
consistency across a single call of
pgstat_get_local_beentry_by_index() for each backend entry.
--
Michael

Attachment

signature.asc

Re: Proposal - Allow extensions to set a Plan Identifier

From

Sami Imseih

Date:

21 March, 04:46:55

> > This adds a planId to pg_stat_get_activity ( not pg_stat_activity ).
> > An extension
> > can offer its own view, similar to pg_stat_activity, which can include planId.
> >
> > Note that there are no documentation updates required here as we don't
> > have per-field documentation for pg_stat_get_activity.
> >
> > These 2 patches can probably be combined , but will leave them like
> > this for now.
>
> The split makes more sense to me, and FWIW I see more value in 0001
> that provides an anchor on the backend side for extensions to plug in
> a plan ID that can be tracked across multiple layers.

Yes, I think this is a step in the right direction for this functionality.

> I am less
> convinced about the need for 0002 that adds this information to
> pg_stat_activity at this stage, as the getter routine in 0001 is
> enough to know what's happening, roughly.

I agree. I don't think exposing this in a system view is required at this
time. Once extensions start to use it, and it becomes obvious it should
be exposed in a system view, that discussion can be had at that
time.

>  Hmm.  Shouldn't we document all such expectations,
> which are similar to a query ID?

I had a lazy comment :)

* For the same reasons as the query identifiers (see above),

but, I went ahead and commented it similar to how we document
pgstat_report_query_id and pgstat_get_my_query_id routines.
attached is v2-0001

--
Sami Imseih
Amazon Web Services (AWS)

Attachment

v2-0001-Allow-plugins-to-set-a-64-bit-plan-identifier.patch

Re: Proposal - Allow extensions to set a Plan Identifier

From

Michael Paquier

Date:

21 March, 10:33:41

On Thu, Mar 20, 2025 at 09:46:55PM -0400, Sami Imseih wrote:
> * For the same reasons as the query identifiers (see above),
>
> but, I went ahead and commented it similar to how we document
> pgstat_report_query_id and pgstat_get_my_query_id routines.
> attached is v2-0001

Looks mostly OK from here..  Except for two things.

@@ -2016,6 +2017,17 @@ exec_bind_message(StringInfo input_message)
      */
     cplan = GetCachedPlan(psrc, params, NULL, NULL);

+    foreach(lc, cplan->stmt_list)
+    {
+        PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+
+        if (plan->planId != UINT64CONST(0))
+        {
+            pgstat_report_plan_id(plan->planId, false);
+            break;
+        }
+    }

In exec_bind_message(), the comment at the top of PortalDefineQuery()
tells to not put any code between this call and the GetCachedPlan()
that could issue an error.  pgstat_report_plan_id() is OK, but I'd
rather play it safe and set the ID once the portal is defined, based
on portal->stmts instead.  That's the same as your patch, but slightly
safer in the long-term, especially if pgstat_report_plan_id() is
twisted in such a way that it introduces a path where it ERRORs.

planner() is the sole place in the core code where the planner hook
can be called.  Shouldn't we have at least a call to
pgstat_report_plan_id() after planning a query?  At least that should
be the behavior I'd expect, where a module pushes a planId to a
PlannedStmt, then core publishes it to the backend entry in non-force
mode.

bind and execute message paths are OK as they stand, where we set a
plan ID once their portal is defined from its planned statements.

With some adjustments to some comments and the surroundings of the
code, I get the attached.  What do you think?
--
Michael

Attachment

Re: Proposal - Allow extensions to set a Plan Identifier

From

Sami Imseih

Date:

21 March, 16:25:24

> In exec_bind_message(), the comment at the top of PortalDefineQuery()
> tells to not put any code between this call and the GetCachedPlan()
> that could issue an error.  pgstat_report_plan_id() is OK, but I'd
> rather play it safe and set the ID once the portal is defined, based
> on portal->stmts instead.  That's the same as your patch, but slightly
> safer in the long-term, especially if pgstat_report_plan_id() is
> twisted in such a way that it introduces a path where it ERRORs.

Yes that makes sense. As long as we report plan_id before
PortalStart, for obvious reasons.

> planner() is the sole place in the core code where the planner hook
> can be called.  Shouldn't we have at least a call to
> pgstat_report_plan_id() after planning a query?  At least that should
> be the behavior I'd expect, where a module pushes a planId to a
> PlannedStmt, then core publishes it to the backend entry in non-force
> mode.

I agree. I was just thinking we rely on the exec_ routines to report the plan_id
at the start. But, besides the good reason you give, reporting
(slightly) earlier is
better for monitoring tools; as it reduces the likelihood they find an empty
plan_id.


Overall, v3 LGTM

--
Sami Imseih
Amazon Web Services (AWS)

Re: Proposal - Allow extensions to set a Plan Identifier

From

Michael Paquier

Date:

22 March, 08:48:41

On Fri, Mar 21, 2025 at 09:25:24AM -0400, Sami Imseih wrote:
>> planner() is the sole place in the core code where the planner hook
>> can be called.  Shouldn't we have at least a call to
>> pgstat_report_plan_id() after planning a query?  At least that should
>> be the behavior I'd expect, where a module pushes a planId to a
>> PlannedStmt, then core publishes it to the backend entry in non-force
>> mode.
>
> I agree. I was just thinking we rely on the exec_ routines to report the plan_id
> at the start. But, besides the good reason you give, reporting
> (slightly) earlier is
> better for monitoring tools; as it reduces the likelihood they find an empty
> plan_id.

Yep.  pgstat_report_plan_id() is not something that extensions should
do, but they should report the planId in the PlannedStmt and let the
backend do the rest.

> Overall, v3 LGTM

Thanks.  If anybody has any objections and/or comments, now would be a
good time.  I'll revisit this thread at the beginning of next week.
--
Michael

Attachment

signature.asc

Re: Proposal - Allow extensions to set a Plan Identifier

From

Andrei Lepikhov

Date:

23 March, 01:50:06

On 3/22/25 06:48, Michael Paquier wrote:
> On Fri, Mar 21, 2025 at 09:25:24AM -0400, Sami Imseih wrote:
>>> planner() is the sole place in the core code where the planner hook
>>> can be called.  Shouldn't we have at least a call to
>>> pgstat_report_plan_id() after planning a query?  At least that should
>>> be the behavior I'd expect, where a module pushes a planId to a
>>> PlannedStmt, then core publishes it to the backend entry in non-force
>>> mode.
>>
>> I agree. I was just thinking we rely on the exec_ routines to report the plan_id
>> at the start. But, besides the good reason you give, reporting
>> (slightly) earlier is
>> better for monitoring tools; as it reduces the likelihood they find an empty
>> plan_id.
> 
> Yep.  pgstat_report_plan_id() is not something that extensions should
> do, but they should report the planId in the PlannedStmt and let the
> backend do the rest.
> 
>> Overall, v3 LGTM
> 
> Thanks.  If anybody has any objections and/or comments, now would be a
> good time.  I'll revisit this thread at the beginning of next week.
The last time I wrote the email, I mistakenly thought about 
plan_node_id. I apologize for that.
planId actually looks less controversial than queryId or plan_node_id. 
At the same time, it is not free from the different logic that 
extensions may incorporate into this value: I can imagine, for example, 
the attempt of uniquely numbering plans related to the same queryId, 
plain hashing of the plan tree, hashing without constants, etc.
So, it seems that extensions may conflict with the same field. Are we 
sure that will happen if they are loaded in different orders in 
neighbouring backends?
I'm not very good at stat collector guts, but would it be possible to 
allow extensions to report their state in standard tuple format? In that 
case, doubts about queryId or planId may be resolved.

-- 
regards, Andrei Lepikhov

Re: Proposal - Allow extensions to set a Plan Identifier

From

Michael Paquier

Date:

23 March, 03:01:14

On Sat, Mar 22, 2025 at 11:50:06PM +0100, Andrei Lepikhov wrote:
> planId actually looks less controversial than queryId or plan_node_id. At
> the same time, it is not free from the different logic that extensions may
> incorporate into this value: I can imagine, for example, the attempt of
> uniquely numbering plans related to the same queryId, plain hashing of the
> plan tree, hashing without constants, etc.

One plan ID should have one single query ID, but the opposite may be
not necessarily true: a query ID could be associated with multiple
plan patterns (aka multiple plan IDs).  What this offers is that we
know about which plan one query is currently using in live, for a
given query ID.

> So, it seems that extensions may conflict with the same field. Are we sure
> that will happen if they are loaded in different orders in neighbouring
> backends?

Depends on what kind of computation once wants to do, and surely
without some coordination across the extensions you are using these
cannot be shared, but it's no different from the concept of a query
ID.  The use cases I've seen where this field is useful is when
splitting code that uses the planner hook for several purposes across
more than one extension.  Three of them which are independent, still
want to treat know about a plan ID:
- Set/get an existing node tree plan based on a specific ID.
- Hash calculation of a tree (like Lukas proposal).
- Statistics related to the plan tree used (aka custom cumulative
stats play here).

> I'm not very good at stat collector guts, but would it be possible to allow
> extensions to report their state in standard tuple format? In that case,
> doubts about queryId or planId may be resolved.

I am not exactly sure what you mean here.  We are only going to have
one plan ID set for each query execution.  Any stats plan data related
to a plan ID surely needs to be upper-bounded if you don't want to
bloat pgstats, with the query ID of the query string it relates to
stored in it and the plan ID used as hash key, but it does not change
that we're only going to always have one single plan ID in a
PlannedStmt or in a backend entry doing a query execution, like a
query ID.
--
Michael

Attachment

signature.asc

Re: Proposal - Allow extensions to set a Plan Identifier

From

Sami Imseih

Date:

23 March, 06:22:37

> On Sat, Mar 22, 2025 at 11:50:06PM +0100, Andrei Lepikhov wrote:
> > planId actually looks less controversial than queryId or plan_node_id. At
> > the same time, it is not free from the different logic that extensions may
> > incorporate into this value: I can imagine, for example, the attempt of
> > uniquely numbering plans related to the same queryId, plain hashing of the
> > plan tree, hashing without constants, etc.

I think plan_node_id is probably the least controversial because that value
comes straight from core, and different extensions cannot have their own
interpretation of what that value could be.

Computing a plan_id is even more open for interpretation than it is
for query_id perhaps, which is why giving this ability to extensions
will be useful. Different extensions may choose to compute a plan_id
different ways, but they all should be able to push this value into core,
and this is what this patch will allow for.

FWIW, Lukas did start a Wiki [0] to open the discussion for what parts
of the plan should be used to compute a plan_id, and maybe we can
in the future compite a plan_id in core by default.

> > So, it seems that extensions may conflict with the same field. Are we sure
> > that will happen if they are loaded in different orders in neighbouring
> > backends?
>
> Depends on what kind of computation once wants to do, and surely
> without some coordination across the extensions you are using these
> cannot be shared, but it's no different from the concept of a query
> ID.

correct. This was also recently raised in the "making Explain extensible" [0]
thread also. That is the nature of extensions, and coordination is required.

[0] https://wiki.postgresql.org/wiki/Plan_ID_Jumbling
[1] https://www.postgresql.org/message-id/CA%2BTgmoZ8sTodL-orXQm51_npNxuDAS86BS5kC8t0LULneShRbg%40mail.gmail.com

--
Sami Imseih
Amazon Web Services (AWS)

Re: Proposal - Allow extensions to set a Plan Identifier

From

Andrei Lepikhov

Date:

23 March, 10:12:45

On 3/23/25 01:01, Michael Paquier wrote:
> On Sat, Mar 22, 2025 at 11:50:06PM +0100, Andrei Lepikhov wrote:
>> planId actually looks less controversial than queryId or plan_node_id. At
>> the same time, it is not free from the different logic that extensions may
>> incorporate into this value: I can imagine, for example, the attempt of
>> uniquely numbering plans related to the same queryId, plain hashing of the
>> plan tree, hashing without constants, etc.
> 
> One plan ID should have one single query ID,
I'm not sure I understand what do you mean by that. QueryId reflects 
syntactic structure of the query, but planId - semantics. For example:

SELECT relname FROM pg_class JOIN pg_am USING (oid);
SELECT relname FROM pg_am JOIN pg_class USING (oid);

have different queryIds: -6493293269785547447 and 4243743071053231833

but the same plan:

  Hash Join
    Hash Cond: (pg_class.oid = pg_am.oid)
    ->  Seq Scan on pg_class
    ->  Hash
          ->  Seq Scan on pg_am

You can invent multiple other cases here - remember query tree 
transformations we made before optimisation.

> but the opposite may be
> not necessarily true: a query ID could be associated with multiple
> plan patterns (aka multiple plan IDs).  What this offers is that we
> know about which plan one query is currently using in live, for a
> given query ID.
Okay, as I've said before, it seems practical. I just worry because I 
see that people don't understand that queryId is a more or less 
arbitrary value that may or may not group queries that differ only by 
constants to a single "Query Class". I think the same attitude will be 
paid to planId. It is okay for statistics. However, non-statistical 
extensions need more guarantees and want to put different logic into 
these values.
For now, we have examples of only statistical extensions in open-source 
and may live with a proposed solution.
> 
>> So, it seems that extensions may conflict with the same field. Are we sure
>> that will happen if they are loaded in different orders in neighbouring
>> backends?
> 
> Depends on what kind of computation once wants to do, and surely
> without some coordination across the extensions you are using these
> cannot be shared, but it's no different from the concept of a query
> ID.
Hmm, queryId generation code lies in the core and we already came to 
terms that this field has only a statistical purpose. Do you want to 
commit planId generation code?

-- 
regards, Andrei Lepikhov

Re: Proposal - Allow extensions to set a Plan Identifier

From

Andrei Lepikhov

Date:

23 March, 10:25:42

On 3/23/25 04:22, Sami Imseih wrote:
>> On Sat, Mar 22, 2025 at 11:50:06PM +0100, Andrei Lepikhov wrote:
>>> planId actually looks less controversial than queryId or plan_node_id. At
>>> the same time, it is not free from the different logic that extensions may
>>> incorporate into this value: I can imagine, for example, the attempt of
>>> uniquely numbering plans related to the same queryId, plain hashing of the
>>> plan tree, hashing without constants, etc.
> 
> I think plan_node_id is probably the least controversial because that value
> comes straight from core, and different extensions cannot have their own
> interpretation of what that value could be.
I meant the proposal to let extensions calculate this field. Someone may 
want node_id to be unique inside the plan; for someone - it should 
reflect the nature of the node, and it may be the same even inside one 
plan, etc. In this case, it is highly controversial.
> 
> Computing a plan_id is even more open for interpretation than it is
> for query_id perhaps, which is why giving this ability to extensions
> will be useful. Different extensions may choose to compute a plan_id
> different ways, but they all should be able to push this value into core,
> and this is what this patch will allow for.
That's why I worry about allowing extensions to compute it. Remember: 
for now, an extension can't be sure that a conflicting one is not 
already loaded in the backend. The code [1] may relieve this problem.

[1] 
https://www.postgresql.org/message-id/flat/441661.1742683797@sss.pgh.pa.us#7640020deb6497fe049ac4839eaeb33d

-- 
regards, Andrei Lepikhov

Re: Proposal - Allow extensions to set a Plan Identifier

From

Sami Imseih

Date:

23 March, 17:56:28

> > but the opposite may be
> > not necessarily true: a query ID could be associated with multiple
> > plan patterns (aka multiple plan IDs).  What this offers is that we
> > know about which plan one query is currently using in live, for a
> > given query ID.
> Okay, as I've said before, it seems practical. I just worry because I
> see that people don't understand that queryId is a more or less
> arbitrary value that may or may not group queries that differ only by
> constants to a single "Query Class". I think the same attitude will be
> paid to planId. It is okay for statistics. However, non-statistical
> extensions need more guarantees and want to put different logic into
> these values.
> For now, we have examples of only statistical extensions in open-source
> and may live with a proposed solution.

statistical/monitoring reasons are an important use case. Detecting plan
changes, load attributed to a specific plan, etc.
However, I also do see other extensions that could implement a plan_id
that has meaning beyond statistical/monitoring purposes.
Plan management/enforcement is another use-case.

> >> So, it seems that extensions may conflict with the same field. Are we sure
> >> that will happen if they are loaded in different orders in neighbouring
> >> backends?
> >
> > Depends on what kind of computation once wants to do, and surely
> > without some coordination across the extensions you are using these
> > cannot be shared, but it's no different from the concept of a query
> > ID.
> Hmm, queryId generation code lies in the core and we already came to
> terms that this field has only a statistical purpose. Do you want to
> commit planId generation code?

But, extensions don't necessarily need to rely on the core queryId. They
can generate their own queryId.
We have it documented for compute_query_id as such [0]
"Note that an external module can alternatively be used if the in-core query
identifier computation method is not acceptable"

[0] https://www.postgresql.org/docs/current/runtime-config-statistics.html#GUC-COMPUTE-QUERY-ID

--
Sami Imseih
Amazon Web Services (AWS)

Re: Proposal - Allow extensions to set a Plan Identifier

From

Andrei Lepikhov

Date:

23 March, 18:30:04

On 3/23/25 15:56, Sami Imseih wrote:
>> Hmm, queryId generation code lies in the core and we already came to
>> terms that this field has only a statistical purpose. Do you want to
>> commit planId generation code?
> 
> But, extensions don't necessarily need to rely on the core queryId. They
> can generate their own queryId.
> We have it documented for compute_query_id as such [0]
> "Note that an external module can alternatively be used if the in-core query
> identifier computation method is not acceptable"
In theory they don't, but in practice they must.
This mess especially boring because one of problematic extensions at the 
same time the most popular one - pg_stat_statements. People forget to 
follow strict instructions about load order of extensions and think it 
is the extension instability when one of their query plans isn't 
captured, but should.
So, it may be not an issue in a cloud provider predefined installations, 
but a headache for custom configurations.

-- 
regards, Andrei Lepikhov

Re: Proposal - Allow extensions to set a Plan Identifier

From

Michael Paquier

Date:

24 March, 02:52:33

On Sun, Mar 23, 2025 at 04:30:04PM +0100, Andrei Lepikhov wrote:
> So, it may be not an issue in a cloud provider predefined installations, but
> a headache for custom configurations.

Sure, I mean, but it's not really related to the issue discussed on
this thread, so..

It sounds like we could improve the documentation about the GUCs that
hold a list of library names and load them in the order specified, so
as we could point somewhere rather that just saying "don't do that".
Another thing is extensions that have the idea of not tracking a hook
previously registered, and missing to call it.  Another one is that
the previous hook can be called before a module's look, or after it.
Depending on how an out-of-core extension is maintained, there are a
lot of things that can be done.
--
Michael

Attachment

signature.asc

Re: Proposal - Allow extensions to set a Plan Identifier

From

Michael Paquier

Date:

24 March, 07:43:03

On Sat, Mar 22, 2025 at 10:22:37PM -0500, Sami Imseih wrote:
> I think plan_node_id is probably the least controversial because that value
> comes straight from core, and different extensions cannot have their own
> interpretation of what that value could be.

Depends.  An extension can plug in what they want.  The point is that
the key used to identify a single plan is up to what extensions think
is relevant in a plan.  That's heavily subject to interpretation.
What's not really subject to interpretation is that an extension
cannot know it should set and/or use as key identifier without
something that some portion pf the code structures knows about, or
these extensions have an inter-dependency.  Anyway, there are also the
arguments about the set timing, reset timing, the extended protocol
argument, etc.  So I've applied the patch for now, to start with
something.

> FWIW, Lukas did start a Wiki [0] to open the discussion for what parts
> of the plan should be used to compute a plan_id, and maybe we can
> in the future compite a plan_id in core by default.

Let's see where this leads..  I suspect that this is going to take
some time, assuming that we're ever able to settle on a clear
definition.  Perhaps we will, or perhaps we will not.
--
Michael

Attachment

signature.asc

Re: Proposal - Allow extensions to set a Plan Identifier

From

Lukas Fittl

Date:

24 March, 21:47:40

On Sun, Mar 23, 2025 at 9:43 PM Michael Paquier <michael@paquier.xyz> wrote:

So I've applied the patch for now, to start with
something.

Thanks for committing that, I think that's a great starting point for 18!

Ideally we can also land the jumble funcs work in the other thread to allow extensions to re-use the in-core logic for jumbling expressions found in plan node trees.

> FWIW, Lukas did start a Wiki [0] to open the discussion for what parts
> of the plan should be used to compute a plan_id, and maybe we can
> in the future compite a plan_id in core by default.

Let's see where this leads.. I suspect that this is going to take
some time, assuming that we're ever able to settle on a clear
definition. Perhaps we will, or perhaps we will not.

I think there is a good chance we'll land on a reasonable planid calculation for core (or at least a pg_stat_plans in contrib that does its own tree walk) within the PG19 development cycle - I think plan IDs are actually less complicated than query IDs in terms of what should be considered unique - but maybe that's just my perspective :)

I'll be at PGConf.dev this year, would be great to do an unconference session to discuss this further.

Thanks,

Lukas

Lukas Fittl

Re: Proposal - Allow extensions to set a Plan Identifier

From

Alena Rybakina

Date:

24 March, 22:21:24

Hi! I started reviewing it and noticed that your code repeated this 
cycle maybe it would be better to put it in a separate function, for 
example in the form of a name like "analyze_stmts"?

or is it possible to create a macro for them?

@@ -2016,6 +2017,17 @@ exec_bind_message(StringInfo input_message)
       */
      cplan = GetCachedPlan(psrc, params, NULL, NULL);

+    foreach(lc, cplan->stmt_list)
+    {
+        PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+
+        if (plan->planId != UINT64CONST(0))
+        {
+            pgstat_report_plan_id(plan->planId, false);
+            break;
+        }
+    }
+
      /*
       * Now we can define the portal.
       *
@@ -2170,6 +2182,17 @@ exec_execute_message(const char *portal_name, 
long max_rows)
          }
      }

+    foreach(lc, portal->stmts)
+    {
+        PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+
+        if (stmt->planId != UINT64CONST(0))
+        {
+            pgstat_report_plan_id(stmt->planId, false);
+            break;
+        }
+    }
+
      cmdtagname = GetCommandTagNameAndLen(portal->commandTag, &cmdtaglen);

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Proposal - Allow extensions to set a Plan Identifier

From

Sami Imseih

Date:

24 March, 22:31:17

> Hi! I started reviewing it and noticed that your code repeated this
> cycle maybe it would be better to put it in a separate function, for
> example in the form of a name like "analyze_stmts"?
>
> or is it possible to create a macro for them?

Perhaps it may be better to stick this all in a macro,
but I did it that way because we have similar
pattern for queryId, with some repeating patterns for
looping through a list of Query or PlannedStmt.

As this patch already got committed, feel free to
propose a patch for this improvement.

--
Sami Imseih
Amazon Web Services (AWS)

Re: Proposal - Allow extensions to set a Plan Identifier

From

Sami Imseih

Date:

24 March, 22:36:05

> Ideally we can also land the jumble funcs work in the other thread to allow extensions to re-use the
> in-core logic for jumbling expressions found in plan node trees.

IIUC, there should be a refactor I am working on at this moment to make that
possible [0]

>> > FWIW, Lukas did start a Wiki [0] to open the discussion for what parts
>> > of the plan should be used to compute a plan_id, and maybe we can
>> > in the future compite a plan_id in core by default.
>>
>> Let's see where this leads..  I suspect that this is going to take
>> some time, assuming that we're ever able to settle on a clear
>> definition.  Perhaps we will, or perhaps we will not.
>
>
> I think there is a good chance we'll land on a reasonable planid calculation for core

I agree

> I'll be at PGConf.dev this year, would be great to do an unconference session to discuss this further.

That will be a good idea!

[0] https://www.postgresql.org/message-id/Z9khOo14yzZHCnmn%40paquier.xyz

Re: Proposal - Allow extensions to set a Plan Identifier

From

Michael Paquier

Date:

25 March, 01:05:30

On Mon, Mar 24, 2025 at 02:36:05PM -0500, Sami Imseih wrote:
>> Ideally we can also land the jumble funcs work in the other thread
>> to allow extensions to re-use the
>> in-core logic for jumbling expressions found in plan node trees.
>
> IIUC, there should be a refactor I am working on at this moment to make that
> possible [0]

That would be nice, thanks.
--
Michael

Attachment

signature.asc

Re: Proposal - Allow extensions to set a Plan Identifier

From

Sami Imseih

Date:

25 March, 02:47:59

> On Mon, Mar 24, 2025 at 02:36:05PM -0500, Sami Imseih wrote:
> >> Ideally we can also land the jumble funcs work in the other thread
> >> to allow extensions to re-use the
> >> in-core logic for jumbling expressions found in plan node trees.
> >
> > IIUC, there should be a refactor I am working on at this moment to make that
> > possible [0]
>
> That would be nice, thanks.

As mentioned in [0], I spent some time thinking about making some of the core
jumbling facilities available to plugins, and I have several patches to share.

First, none of the changes change the names of the files from queryjumblefuncs.c
to something more generic, as there was some resistance to that idea in this
thread. I will keep it as is for now, and this can be revisited at a
later time, if need be.

So, v1-0001 provides the following to an extension:

InitializeJumbleState - to initialize a jumble state
JumbleNode - To jumble an expression
HashJumbleState - To produce the 64-bit hash from the jumble state

Also, an Assert was added inside  RecordConstLocation to assert
that the jumble state was initialized with record_clocations.

I know it was mentioned above by both Michael and Andrei that
AppendJumble should not be exposed. I am not sure I agree with
that. I think it should be exposed, along with
JUMBLE_FIELD, JUMBLE_FIELD_SINGLE and JUMBLE_STRING
and _jumbleList.

I am afraid that if we don't expose these routines/macros, the
extension will have
to re-invent those wheels. This is not included in v1-0001, but If
there is no objection,
I can update the patch. Thoughts?

[0] https://www.postgresql.org/message-id/Z9khOo14yzZHCnmn%40paquier.xyz

Attachment

v1-0001-Allow-plugins-to-Jumble-an-expression.patch

Re: Proposal - Allow extensions to set a Plan Identifier

From

Michael Paquier

Date:

25 March, 10:13:28

On Mon, Mar 24, 2025 at 06:47:59PM -0500, Sami Imseih wrote:
> I know it was mentioned above by both Michael and Andrei that
> AppendJumble should not be exposed. I am not sure I agree with
> that. I think it should be exposed, along with
> JUMBLE_FIELD, JUMBLE_FIELD_SINGLE and JUMBLE_STRING
> and _jumbleList.
>
> I am afraid that if we don't expose these routines/macros, the
> extension will have
> to re-invent those wheels. This is not included in v1-0001, but If
> there is no objection,
> I can update the patch. Thoughts?

Hmm, yes, the macros could prove to be useful to allow extensions to
compile their own code the manipulations they want to do on the node
structures they'd like to touch, but wouldn't that mean that they
would copy in some way a portion of what gen_node_support.pl does?

Perhaps we should think more carefully about this part, and refactor
this script to work as a perl module so as it could be reused by some
external code, at least for the parsing of the headers and the
attributes assigned to each nodes and their attributes?

This is just to drop an idea in the bucket of ideas, trying to think
about the potential larger picture.
--
Michael

Attachment

signature.asc

Re: Proposal - Allow extensions to set a Plan Identifier

From

Lukas Fittl

Date:

25 March, 10:45:28

On Tue, Mar 25, 2025 at 12:13 AM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Mar 24, 2025 at 06:47:59PM -0500, Sami Imseih wrote:
> I know it was mentioned above by both Michael and Andrei that
> AppendJumble should not be exposed. I am not sure I agree with
> that. I think it should be exposed, along with
> JUMBLE_FIELD, JUMBLE_FIELD_SINGLE and JUMBLE_STRING
> and _jumbleList.
>
> I am afraid that if we don't expose these routines/macros, the
> extension will have
> to re-invent those wheels. This is not included in v1-0001, but If
> there is no objection,
> I can update the patch. Thoughts?

Hmm, yes, the macros could prove to be useful to allow extensions to
compile their own code the manipulations they want to do on the node
structures they'd like to touch, but wouldn't that mean that they
would copy in some way a portion of what gen_node_support.pl does?

I agree with Sami that we should expose AppendJumble (it'd be necessary for any extension that wants to re-use the jumbling logic), and I don't see a reason to skip over the convenience macros, but also not hard to recreate those. AFAIK you could get by without having access to _jumbleList, since you can just call JumbleNode which can jumble lists.

The initial patch I had posted over at the other thread [0], (patch 0003 to be precise in the initial set, which was partially based on work Sami had done previously), showed what I think is the minimum we should enable:

case T_IndexScan:
{
IndexScan *splan = (IndexScan *) plan;
...

JUMBLE_VALUE(splan->indexid);

JumbleNode(jstate, (Node *) splan->indexqual);
...

i.e. allow someone to defer to core logic for expressions, but require them to implement their own (manual) way of writing the jumble functions/conditions for the more limited set of expression-containing nodes they care about (like plan nodes).

Of course, thinking about other use cases, I could imagine someone would potentially be interested to change core jumbling logic for only a specific node (e.g. implementing a query ID that considers constant values to be significant), but I'd argue that making that level of customization work is out of scope / hard to do in general.

Perhaps we should think more carefully about this part, and refactor
this script to work as a perl module so as it could be reused by some
external code, at least for the parsing of the headers and the
attributes assigned to each nodes and their attributes?

I've been thinking about how to rework things for a PG18-requiring pg_stat_plans extension I'd like to publish, and if I were to choose a node attribute-based approach I'd probably:

1. Check out the upstream Postgres source for the given version I'm generating jumble code for

2. Modify the headers as needed to have the node attributes I want

3. Call the gen_node_support.pl via the build process

4. Copy out the resulting jumble node funcs/conds

Even with the state currently committed this is already possible, but (4) ends up with a lot of duplicated code in the extension. Assuming we have a way to call jumbleNode + AppendJumble and macros, that step could only keep the jumble conds/funcs that are not defined in core.

So based on that workflow I'm not sure turning gen_node_support.pl into a re-usable module would help, but that's just my perspective without having built this out (yet).

Thanks,

Lukas

[0]: https://www.postgresql.org/message-id/flat/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg%40mail.gmail.com

Lukas Fittl

Re: Proposal - Allow extensions to set a Plan Identifier

From

Andrei Lepikhov

Date:

25 March, 12:42:57

On 3/25/25 00:47, Sami Imseih wrote:
> I know it was mentioned above by both Michael and Andrei that
> AppendJumble should not be exposed. I am not sure I agree with
> that. I think it should be exposed, along with
> JUMBLE_FIELD, JUMBLE_FIELD_SINGLE and JUMBLE_STRING
> and _jumbleList.
It would be helpful if you could provide an example to support the 
reasoning behind exposing this stuff. I assume you want to influence the 
logic of jumbling, correct? If that’s the case, it might be beneficial 
to add a callback to the JumbleState that is triggered during each node 
jumbling attempt. This could facilitate moving clocations code and data 
into pg_stat_statements and give developers the opportunity to influence 
the jumbling process, adding extra meaning to their hashes.
One reason this might be necessary is that generating the same hash for 
both a Param node and a Const node could lead to a more generalized 
hash, allowing us to group more queries under the same value.

-- 
regards, Andrei Lepikhov

Re: Proposal - Allow extensions to set a Plan Identifier

From

Sami Imseih

Date:

26 March, 00:23:15

> On 3/25/25 00:47, Sami Imseih wrote:
> > I know it was mentioned above by both Michael and Andrei that
> > AppendJumble should not be exposed. I am not sure I agree with
> > that. I think it should be exposed, along with
> > JUMBLE_FIELD, JUMBLE_FIELD_SINGLE and JUMBLE_STRING
> > and _jumbleList.
> It would be helpful if you could provide an example to support the
> reasoning behind exposing this stuff. I assume you want to influence the
> logic of jumbling, correct?

I was actually thinking more about the case which Lukas mentions above [0],
which is jumbling a Plan. So, an extension having access to AppendJumble (
and the macros ) will allow it to create a routine similar to
JumbleNode but that
can deal with the plannodes.

Of course, the same pattern can be applied to other types of nodes.

> Hmm, yes, the macros could prove to be useful to allow extensions to
> compile their own code the manipulations they want to do on the node
> structures they'd like to touch, but wouldn't that mean that they
> would copy in some way a portion of what gen_node_support.pl does?

I feel like gen_node_support.pl should just deal with generating functions
for core jumbling logic, and extensions should be able to build out their
jumbling funcs, and do it simply if they have access to AppendJumble and
friends.

> So based on that workflow I'm not sure turning gen_node_support.pl into a re-usable module
> would help, but that's just my perspective without having built this out (yet).

> 1. Check out the upstream Postgres source for the given version I'm generating jumble code for
> 2. Modify the headers as needed to have the node attributes I want
> 3. Call the gen_node_support.pl via the build process
> 4. Copy out the resulting jumble node funcs/conds

I like this approach, and the artifacts from #4 will be packed with
the extension.

[0] https://www.postgresql.org/message-id/CAP53Pkw7HD%2B3sssn5UcASWLn%2Ba9oRJOM9hXteDCg64JJ662bHQ%40mail.gmail.com

--
Sami Imseih
Amazon Web Services (AWS)

Re: Proposal - Allow extensions to set a Plan Identifier

From

Michael Paquier

Date:

26 March, 05:43:13

On Tue, Mar 25, 2025 at 04:23:15PM -0500, Sami Imseih wrote:
> On 3/25/25 00:47, Sami Imseih wrote:
>> 1. Check out the upstream Postgres source for the given version I'm generating jumble code for
>> 2. Modify the headers as needed to have the node attributes I want
>> 3. Call the gen_node_support.pl via the build process
>> 4. Copy out the resulting jumble node funcs/conds
>
> I like this approach, and the artifacts from #4 will be packed with
> the extension.

So this comes down to forking the Postgres code to do the job.  What I
had in mind was a slightly different flow, where we would be able to
push custom node attributes between the header parsing and the
generation of the extension code.  If we do that, there would be no
need to fork the Postgres code: extensions could force the definitions
they want to the nodes they want to handle, as long as these do not
need to touch the in-core query jumble logic, of course.

Perhaps we should give more time and thoughts to the concepts we want
to expose at this level of the code for extensions.  Hence I would
side with not rushing things, and consider our options more carefully
for v19 with what we would consider to be better design.
--
Michael

Attachment

signature.asc

Re: Proposal - Allow extensions to set a Plan Identifier

From

Sami Imseih

Date:

26 March, 06:12:08

So this comes down to forking the Postgres code to do the job. What I
had in mind was a slightly different flow, where we would be able to
push custom node attributes between the header parsing and the
generation of the extension code. If we do that, there would be no
need to fork the Postgres code: extensions could force the definitions
they want to the nodes they want to handle, as long as these do not
need to touch the in-core query jumble logic, of course.

Allowing extensions to generate code for custom node attributes could be

taken up in 19. What about for 18 we think about exposing AppendJumble,

JUMBLE_FIELD, JUMBLE_FIELD_SINGLE and JUMBLE_STRING?

—

Sami Imseih

Amazon Web Services (AWs)

Re: Proposal - Allow extensions to set a Plan Identifier

From

Lukas Fittl

Date:

31 March, 21:27:25

On Tue, Mar 25, 2025 at 8:12 PM Sami Imseih <samimseih@gmail.com> wrote:

So this comes down to forking the Postgres code to do the job. What I
had in mind was a slightly different flow, where we would be able to
push custom node attributes between the header parsing and the
generation of the extension code. If we do that, there would be no
need to fork the Postgres code: extensions could force the definitions
they want to the nodes they want to handle, as long as these do not
need to touch the in-core query jumble logic, of course.

Allowing extensions to generate code for custom node attributes could be
taken up in 19. What about for 18 we think about exposing AppendJumble,
JUMBLE_FIELD, JUMBLE_FIELD_SINGLE and JUMBLE_STRING?

+1, I think exposing the node jumbling logic + ability to jumble values for extensions would make a big difference to get into 18, specifically for allowing extensions to do plan ID calculations efficiently.

Attached a rebased version that accounts for the changes in f31aad9b0 and ad9a23bc4, and also makes AppendJumble* available, as well as two helper defines JUMBLE_VALUE and JUMBLE_VALUE_STRING. These are intended to work on values, not nodes, because I think that requiring the caller to have a local "expr" variable doesn't seem like a good API.

I've also intentionally reduced the API surface area and not allowed initializing a jumble state that records constant locations / not exported RecordConstLocations. I think the utility of that seems less clear (possibly out-of-core queryid customization with a future extension hook in jumbleNode) and requires more refinement.

Thoughts?

Thanks,

Lukas

Lukas Fittl

Attachment

v2-0001-Allow-plugins-to-Jumble-an-expression.patch

Re: Proposal - Allow extensions to set a Plan Identifier

From

Sami Imseih

Date:

01 April, 18:58:30

On Mon, Mar 31, 2025 at 1:28 PM Lukas Fittl <lukas@fittl.com> wrote:
>
> On Tue, Mar 25, 2025 at 8:12 PM Sami Imseih <samimseih@gmail.com> wrote:
>>>
>>> So this comes down to forking the Postgres code to do the job.  What I
>>> had in mind was a slightly different flow, where we would be able to
>>> push custom node attributes between the header parsing and the
>>> generation of the extension code.  If we do that, there would be no
>>> need to fork the Postgres code: extensions could force the definitions
>>> they want to the nodes they want to handle, as long as these do not
>>> need to touch the in-core query jumble logic, of course.
>>
>>
>> Allowing extensions to generate code for custom node attributes could be
>> taken up in 19. What about for 18 we think about exposing AppendJumble,
>>  JUMBLE_FIELD, JUMBLE_FIELD_SINGLE and JUMBLE_STRING?
>
>
> +1, I think exposing the node jumbling logic + ability to jumble values for extensions would make a big difference to
getinto 18, specifically for allowing extensions to do plan ID calculations efficiently 

I think getting these APIs into 18 is super important, especially with
the optimizations made in
ad9a23bc4; I will reiterate that extension developers should not have
to re-invent these
wheels :)

> I've also intentionally reduced the API surface area and not allowed initializing a jumble state that records
constantlocations / not exported RecordConstLocations. 
> I think the utility of that seems less clear (possibly out-of-core queryid customization with a future extension hook
injumbleNode) and requires more refinement. 

I agree with that primarily because I don't know of the use case at
this time in which
an extension will need to record a constant location.

I reviewed the patch and I only have one comment. I still think
we need an Assert inside RecordConstantLocation to make sure clocations
is allocated if the caller intends to record constant locations.

RecordConstLocation(JumbleState *jstate, int location, bool squashed)
{
+ Assert(jstate->clocations);
+

--
Sami Imseih
Amazon Web Services (AWS)

Re: Proposal - Allow extensions to set a Plan Identifier

From

Sami Imseih

Date:

02 April, 22:43:13

> I reviewed the patch and I only have one comment. I still think
> we need an Assert inside RecordConstantLocation to make sure clocations
> is allocated if the caller intends to record constant locations.
>
> RecordConstLocation(JumbleState *jstate, int location, bool squashed)
> {
> + Assert(jstate->clocations);
> +

Here is v3 with the Assert in place


--
Sami Imseih
Amazon Web Services (AWS)

Attachment

v3-0001-Allow-plugins-to-Jumble-an-expression.patch

Re: Proposal - Allow extensions to set a Plan Identifier

From

Lukas Fittl

Date:

08 April, 04:11:06

Whilst rebasing the pg_stat_plans work on top of this, I realized that we should probably make this a conditional instead of an assert - if you are jumbling a tree that contains expressions you'd want to be able to jumble a node that has a location (but don't record it). Attached v4 that modifies it like that.

However, since we're basically at feature freeze (and it seems unlikely this gets into 18), I have a quick alternate proposal:

What if, for 18, we just make DoJumble exported to be used by plugins?

DoJumble was added in David's fix for NULL handling in f31aad9b0, but remained local to queryjumblefuncs.c. Exporting that would enable an extension to jumble expression fields contained in plan nodes and get the hash value, and then do its own hashing/jumbling of the plan nodes as it sees fit, without reinventing the wheel. I'll be around the next hours to make a quick patch (though its basically just two lines) if this is of interest.

Thanks,

Lukas

On Wed, Apr 2, 2025 at 12:43 PM Sami Imseih <samimseih@gmail.com> wrote:

> I reviewed the patch and I only have one comment. I still think
> we need an Assert inside RecordConstantLocation to make sure clocations
> is allocated if the caller intends to record constant locations.
>
> RecordConstLocation(JumbleState *jstate, int location, bool squashed)
> {
> + Assert(jstate->clocations);
> +

Here is v3 with the Assert in place

--
Sami Imseih
Amazon Web Services (AWS)

Lukas Fittl

Attachment

v4-0001-Allow-plugins-to-Jumble-an-expression.patch

Re: Proposal - Allow extensions to set a Plan Identifier

From

Sami Imseih

Date:

08 April, 04:58:41

> What if, for 18, we just make DoJumble exported to be used by plugins?

We will need to export InitJumble as well, but DoJumble calls
_jumbleNode which only includes queryjumblefuncs.switch.c,
so I don't think that will be enough for Plan.

--
Sami Imseih
Amazon Web Services (AWS)

Re: Proposal - Allow extensions to set a Plan Identifier

From

Michael Paquier

Date:

08 April, 07:32:36

On Mon, Apr 07, 2025 at 06:11:06PM -0700, Lukas Fittl wrote:
> However, since we're basically at feature freeze (and it seems unlikely
> this gets into 18), I have a quick alternate proposal:
>
> What if, for 18, we just make DoJumble exported to be used by plugins?

Yes, I am not planning to rush that into v18.  Things are what they
are for this release, I am afraid.  While looking at your patch, I'm
not liking much the fact that this exposes AppendJumbleNN() for
extensions, which should be something that remains internal to
queryjumblefuncs.c.  I don't think that we should do that.

> DoJumble was added in David's fix for NULL handling in f31aad9b0, but
> remained local to queryjumblefuncs.c. Exporting that would enable an
> extension to jumble expression fields contained in plan nodes and get the
> hash value, and then do its own hashing/jumbling of the plan nodes as it
> sees fit, without reinventing the wheel. I'll be around the next hours to
> make a quick patch (though its basically just two lines) if this is of
> interest.

I think that David is getting it right with its initial set of
InitJumble() and DoJumble() interfaces, where it is possible to do a
computation based on a Node.  It is local to queryjumblefuncs.c
because we don't need it elsewhere now, but the question is really how
much would be relevant enough for extension looking at potential Node
computations.  To me, it's pretty much a sign that we should think
more carefully about the interface to provide here, and not rush
things.
--
Michael

Attachment

signature.asc