Thread: Proposal - Allow extensions to set a Plan Identifier

Proposal - Allow extensions to set a Plan Identifier

From
Sami Imseih
Date:
Hi,

A patch by Lukas Fittl [1] introduces the pg_stat_plans extension, which
exposes execution plans along with execution statistics. As part of this work,
a new infrastructure is being proposed to compute a plan identifier (planId),
which is exposed in both pg_stat_plans and pg_stat_activity.

Exposing planId in pg_stat_activity enables users to identify long-running
or high-load plans and correlate them with plan text from the extension.

One of open question in [1] is which components of a plan tree should be
considered when computing planId. Lukas has created a wiki [2] for discussion,
but reaching a consensus may take time—possibly in time for v19, but
it's uncertain.

Meanwhile, existing extensions like pg_stat_monitor [3] compute a planId and
store the plan text, but they lack a way to expose planId in pg_stat_activity.
This limits their usefulness, as identifying top or long-running plans from
pg_stat_activity is critical for monitoring.

To address this, I propose leveraging work from [1] and allow extensions
to set a planId This could be implemented sooner than an in-core
computed planId.

Proposal Overview:

1/ Add a planId field to PgBackendStatus.
2/ Add a planId field to PlannedStmt.
3/ Create APIs for extensions to set the current planId.

Storing planId in PlannedStmt ensures it is available with
cached plans.

One consideration is whether reserving a 64-bit integer in PgBackendStatus
and PlannedStmt is acceptable, given that planId may not always be used.
However, this is already the case for queryId when compute_query_id is
disabled and no extension sets it, so it may not be a concern.

Looking forward to feedback on this approach.
If there is agreement, I will work on preparing the patches.

Regards,

Sami Imseih
Amazon Web Services (AWS)

[1] https://www.postgresql.org/message-id/flat/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAP53PkxocbNr%2BeRag3FEJp3-7S1U80FspOg8UQjO902TWMG%3D6A%40mail.gmail.com
[3] https://github.com/percona/pg_stat_monitor



Re: Proposal - Allow extensions to set a Plan Identifier

From
Bertrand Drouvot
Date:
Hi,

On Wed, Feb 12, 2025 at 05:44:20PM -0600, Sami Imseih wrote:
> Meanwhile, existing extensions like pg_stat_monitor [3] compute a planId and
> store the plan text, but they lack a way to expose planId in pg_stat_activity.
> This limits their usefulness, as identifying top or long-running plans from
> pg_stat_activity is critical for monitoring.

Agree.

> To address this, I propose leveraging work from [1] and allow extensions
> to set a planId This could be implemented sooner than an in-core
> computed planId.

I think that makes sense and then the idea would be later on to move to something
like 5fd9dfa5f50, but for the "planId": is my understanding correct?

> Proposal Overview:
> 
> 1/ Add a planId field to PgBackendStatus.
> 2/ Add a planId field to PlannedStmt.
> 3/ Create APIs for extensions to set the current planId.

Note sure 3 is needed (MyBEEntry is available publicly and PlannedStmt also
through some hooks). But did not look close enough and would probably depends
of the implementation, so let's see.

> Storing planId in PlannedStmt ensures it is available with
> cached plans.
> 
> One consideration is whether reserving a 64-bit integer in PgBackendStatus
> and PlannedStmt is acceptable, given that planId may not always be used.

From what I can see PgBackendStatus is currently 432 bytes and PlannedStmt
is 152 bytes. It looks like that adding an uint64 at the end of those structs
would not add extra padding (that's what "ptype /o struct <struct>" tells me)
so would increase to 440 bytes and 160 bytes respectively. I don't think that's
an issue.

> However, this is already the case for queryId when compute_query_id is
> disabled and no extension sets it

Yup.

> Looking forward to feedback on this approach.
> If there is agreement, I will work on preparing the patches.

That sounds like a plan for me but better to wait for others to share their
thoughts too.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Proposal - Allow extensions to set a Plan Identifier

From
Sami Imseih
Date:
Thanks for the feedback!

> I think that makes sense and then the idea would be later on to move to something
> like 5fd9dfa5f50, but for the "planId": is my understanding correct?

correct. This is adding infrastructure to eventually have an in-core
planId; but in the
meanwhile give extensions that already compute a planId the ability to
broadcast the planId in pg_stat_activity.

> > Proposal Overview:
> >
> > 1/ Add a planId field to PgBackendStatus.
> > 2/ Add a planId field to PlannedStmt.
> > 3/ Create APIs for extensions to set the current planId.
>
> Note sure 3 is needed (MyBEEntry is available publicly and PlannedStmt also
> through some hooks). But did not look close enough and would probably depends
> of the implementation, so let's see.

I don't think direct setting of values is a good idea. We will need an API
similar to pgstat_report_query_id which ensures we are only reporting top
level planIds -and- in the case of multiple extensions with the capability
to set a planId, only the first one in the stack wins. pgstat_report_query_id
does allow for forcing a queryId ( force flag which is false by default ), and
I think this new API should allow the same.

> > Storing planId in PlannedStmt ensures it is available with
> > cached plans.
> >
> > One consideration is whether reserving a 64-bit integer in PgBackendStatus
> > and PlannedStmt is acceptable, given that planId may not always be used.
>
> From what I can see PgBackendStatus is currently 432 bytes and PlannedStmt
> is 152 bytes. It looks like that adding an uint64 at the end of those structs
> would not add extra padding (that's what "ptype /o struct <struct>" tells me)
> so would increase to 440 bytes and 160 bytes respectively. I don't think that's
> an issue.

Correct, thanks for adding this detail.

--

Sami



Re: Proposal - Allow extensions to set a Plan Identifier

From
Andrei Lepikhov
Date:
On 14/2/2025 08:21, Michael Paquier wrote:
> On Thu, Feb 13, 2025 at 11:10:27AM -0600, Sami Imseih wrote:
>> I don't think direct setting of values is a good idea. We will need an API
>> similar to pgstat_report_query_id which ensures we are only reporting top
>> level planIds -and- in the case of multiple extensions with the capability
>> to set a planId, only the first one in the stack wins. pgstat_report_query_id
>> does allow for forcing a queryId ( force flag which is false by default ), and
>> I think this new API should allow the same.
> I'm obviously siding with this proposal because we have an ask to
> track this kind of data, and because this kind of data is kind of hard
> to track across a stack of extensions through the core backend code.
> 
> Point is, would others be interested in this addition or just object
> to it because it touches the core code?
I have already implemented it twice in different ways as a core patch.
In my projects, we need to track queryId and plan node ID for two reasons:
1. Optimisational decisions made during transformation/path generation 
stages up to the end of execution to correct them in the future.
2. Cache information about the query tree/node state to use it for 
statistical purposes.
In my experience, we don't need a single plan_id field; we just need an 
'extended list' pointer at the end of the Plan, PlannedStmt, Query, and 
RelOptInfo structures and a hook at the end of the create_plan_recurse() 
to allow passing some info from the path generator to the plan tree.
An extension may add its data to the list (we may register an extensible 
node type to be sure we don't interfere with other extensions) and 
manipulate it in a custom way and with custom UI.
Generally, it makes the optimiser internals more open to extensions.

-- 
regards, Andrei Lepikhov



Re: Proposal - Allow extensions to set a Plan Identifier

From
Sami Imseih
Date:
On Sun, Feb 16, 2025 at 5:34 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sat, Feb 15, 2025 at 10:29:41AM +0100, Andrei Lepikhov wrote:
> > I have already implemented it twice in different ways as a core patch.
> > In my projects, we need to track queryId and plan node ID for two reasons:
>
> Are these available in the public somewhere or is that simply what
> Sami is proposing?

Andrei, you mention "plan node ID" which if I understand correctly will be
a good thing to expose to determine which part of the plan most of the time
is spent. This is similar to an idea I raised in a different thread for
explain plan progress [1].

However, I am proposing something different, which is we track a plan_id of the
full execution plan. Some extensions may hash the text version of the plan,
others may choose to do something more elaborate such as "jumbling" a plan
tree. The point becomes is that monitoring extensions such as
pg_stat_monitor and
others can set the plan_id in core so it's available in backend status.

> > 1. Optimisational decisions made during transformation/path generation
> > stages up to the end of execution to correct them in the future.
> > 2. Cache information about the query tree/node state to use it for
> > statistical purposes.
>
> Gathering of statistical data based on a node tree is one reason,
> where it may or may not be required to walk through a path.
> Influencing the plan used with an already-generated one (where hints
> could be used) was the second one, mostly replacing a plan in the
> planner hook.  Influencing the paths in a plan or a subplan didn't
> really matter much with hints to drive the paths.
>
> > In my experience, we don't need a single plan_id field; we just need an
> > 'extended list' pointer at the end of the Plan, PlannedStmt, Query, and
> > RelOptInfo structures and a hook at the end of the create_plan_recurse() to
> > allow passing some info from the path generator to the plan tree.
> > An extension may add its data to the list (we may register an extensible
> > node type to be sure we don't interfere with other extensions) and
> > manipulate it in a custom way and with custom UI.
> > Generally, it makes the optimiser internals more open to extensions.
>
> Sounds to me that this maps with the addition of a "private" area to
> some of the plan structures to allow extensions to attach some data
> that would be reused elsewhere, which is rather independent than
> what's suggested here?

+1, such a private area is different from what is being proposed.


[1] https://www.postgresql.org/message-id/CAA5RZ0uGDKWxqUCMrsWKV425T2f6mqJsXKg6chq%2BWuyCwNPUGw%40mail.gmail.com.

--
Sami



Re: Proposal - Allow extensions to set a Plan Identifier

From
Sami Imseih
Date:
I put together patches to do as is being proposed.

v1-0001:

1. Adds a planId field in PlannedStmt
2. Added an st_plan_id fields in PgBackendStatus
3. APIs to report and to retrieve a planId to PgBackendStatus

An extension is able to set a planId in PlannedStmt directly,
and while they can do the same for PgBackendStatus, I felt it is better
to provide similar APIs for this as we do for queryId. Unless the extension
passed force=true to the API, this will ensure that a planId already set
either by a top-level statement or by another extension cannot be
set when a plan is active.

Also, the extension does not need to worry about resetting the planId at
the end of execution as this will follow the same mechanism as is currently
being done for queryId. This will remove responsibility from the extension
author to have to manage this aspect.

The extension should normally compute the planId in the planner or ExecutorStart
hook. If the planId is computed in the planner_hook, the extension can set the
planId in plannedStmt making the planId available to subsequent executions of
a cached plan.

What this patch does not cover is adding a "Plan Identifier" to explain output
or to logs. Also, there is no core GUC like compute_query_id, since it does not
make sense if we're not computing a planId in core.

v2-0001:

This adds a planId to pg_stat_get_activity ( not pg_stat_activity ).
An extension
can offer its own view, similar to pg_stat_activity, which can include planId.

Note that there are no documentation updates required here as we don't
have per-field documentation for pg_stat_get_activity.

These 2 patches can probably be combined , but will leave them like
this for now.

Looking forward to feedback.

Regards

Sami

Attachment