Thread: Proposal - Allow extensions to set a Plan Identifier
Hi, A patch by Lukas Fittl [1] introduces the pg_stat_plans extension, which exposes execution plans along with execution statistics. As part of this work, a new infrastructure is being proposed to compute a plan identifier (planId), which is exposed in both pg_stat_plans and pg_stat_activity. Exposing planId in pg_stat_activity enables users to identify long-running or high-load plans and correlate them with plan text from the extension. One of open question in [1] is which components of a plan tree should be considered when computing planId. Lukas has created a wiki [2] for discussion, but reaching a consensus may take time—possibly in time for v19, but it's uncertain. Meanwhile, existing extensions like pg_stat_monitor [3] compute a planId and store the plan text, but they lack a way to expose planId in pg_stat_activity. This limits their usefulness, as identifying top or long-running plans from pg_stat_activity is critical for monitoring. To address this, I propose leveraging work from [1] and allow extensions to set a planId This could be implemented sooner than an in-core computed planId. Proposal Overview: 1/ Add a planId field to PgBackendStatus. 2/ Add a planId field to PlannedStmt. 3/ Create APIs for extensions to set the current planId. Storing planId in PlannedStmt ensures it is available with cached plans. One consideration is whether reserving a 64-bit integer in PgBackendStatus and PlannedStmt is acceptable, given that planId may not always be used. However, this is already the case for queryId when compute_query_id is disabled and no extension sets it, so it may not be a concern. Looking forward to feedback on this approach. If there is agreement, I will work on preparing the patches. Regards, Sami Imseih Amazon Web Services (AWS) [1] https://www.postgresql.org/message-id/flat/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAP53PkxocbNr%2BeRag3FEJp3-7S1U80FspOg8UQjO902TWMG%3D6A%40mail.gmail.com [3] https://github.com/percona/pg_stat_monitor
Hi, On Wed, Feb 12, 2025 at 05:44:20PM -0600, Sami Imseih wrote: > Meanwhile, existing extensions like pg_stat_monitor [3] compute a planId and > store the plan text, but they lack a way to expose planId in pg_stat_activity. > This limits their usefulness, as identifying top or long-running plans from > pg_stat_activity is critical for monitoring. Agree. > To address this, I propose leveraging work from [1] and allow extensions > to set a planId This could be implemented sooner than an in-core > computed planId. I think that makes sense and then the idea would be later on to move to something like 5fd9dfa5f50, but for the "planId": is my understanding correct? > Proposal Overview: > > 1/ Add a planId field to PgBackendStatus. > 2/ Add a planId field to PlannedStmt. > 3/ Create APIs for extensions to set the current planId. Note sure 3 is needed (MyBEEntry is available publicly and PlannedStmt also through some hooks). But did not look close enough and would probably depends of the implementation, so let's see. > Storing planId in PlannedStmt ensures it is available with > cached plans. > > One consideration is whether reserving a 64-bit integer in PgBackendStatus > and PlannedStmt is acceptable, given that planId may not always be used. From what I can see PgBackendStatus is currently 432 bytes and PlannedStmt is 152 bytes. It looks like that adding an uint64 at the end of those structs would not add extra padding (that's what "ptype /o struct <struct>" tells me) so would increase to 440 bytes and 160 bytes respectively. I don't think that's an issue. > However, this is already the case for queryId when compute_query_id is > disabled and no extension sets it Yup. > Looking forward to feedback on this approach. > If there is agreement, I will work on preparing the patches. That sounds like a plan for me but better to wait for others to share their thoughts too. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Thanks for the feedback! > I think that makes sense and then the idea would be later on to move to something > like 5fd9dfa5f50, but for the "planId": is my understanding correct? correct. This is adding infrastructure to eventually have an in-core planId; but in the meanwhile give extensions that already compute a planId the ability to broadcast the planId in pg_stat_activity. > > Proposal Overview: > > > > 1/ Add a planId field to PgBackendStatus. > > 2/ Add a planId field to PlannedStmt. > > 3/ Create APIs for extensions to set the current planId. > > Note sure 3 is needed (MyBEEntry is available publicly and PlannedStmt also > through some hooks). But did not look close enough and would probably depends > of the implementation, so let's see. I don't think direct setting of values is a good idea. We will need an API similar to pgstat_report_query_id which ensures we are only reporting top level planIds -and- in the case of multiple extensions with the capability to set a planId, only the first one in the stack wins. pgstat_report_query_id does allow for forcing a queryId ( force flag which is false by default ), and I think this new API should allow the same. > > Storing planId in PlannedStmt ensures it is available with > > cached plans. > > > > One consideration is whether reserving a 64-bit integer in PgBackendStatus > > and PlannedStmt is acceptable, given that planId may not always be used. > > From what I can see PgBackendStatus is currently 432 bytes and PlannedStmt > is 152 bytes. It looks like that adding an uint64 at the end of those structs > would not add extra padding (that's what "ptype /o struct <struct>" tells me) > so would increase to 440 bytes and 160 bytes respectively. I don't think that's > an issue. Correct, thanks for adding this detail. -- Sami
On 14/2/2025 08:21, Michael Paquier wrote: > On Thu, Feb 13, 2025 at 11:10:27AM -0600, Sami Imseih wrote: >> I don't think direct setting of values is a good idea. We will need an API >> similar to pgstat_report_query_id which ensures we are only reporting top >> level planIds -and- in the case of multiple extensions with the capability >> to set a planId, only the first one in the stack wins. pgstat_report_query_id >> does allow for forcing a queryId ( force flag which is false by default ), and >> I think this new API should allow the same. > I'm obviously siding with this proposal because we have an ask to > track this kind of data, and because this kind of data is kind of hard > to track across a stack of extensions through the core backend code. > > Point is, would others be interested in this addition or just object > to it because it touches the core code? I have already implemented it twice in different ways as a core patch. In my projects, we need to track queryId and plan node ID for two reasons: 1. Optimisational decisions made during transformation/path generation stages up to the end of execution to correct them in the future. 2. Cache information about the query tree/node state to use it for statistical purposes. In my experience, we don't need a single plan_id field; we just need an 'extended list' pointer at the end of the Plan, PlannedStmt, Query, and RelOptInfo structures and a hook at the end of the create_plan_recurse() to allow passing some info from the path generator to the plan tree. An extension may add its data to the list (we may register an extensible node type to be sure we don't interfere with other extensions) and manipulate it in a custom way and with custom UI. Generally, it makes the optimiser internals more open to extensions. -- regards, Andrei Lepikhov
On Sun, Feb 16, 2025 at 5:34 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Sat, Feb 15, 2025 at 10:29:41AM +0100, Andrei Lepikhov wrote: > > I have already implemented it twice in different ways as a core patch. > > In my projects, we need to track queryId and plan node ID for two reasons: > > Are these available in the public somewhere or is that simply what > Sami is proposing? Andrei, you mention "plan node ID" which if I understand correctly will be a good thing to expose to determine which part of the plan most of the time is spent. This is similar to an idea I raised in a different thread for explain plan progress [1]. However, I am proposing something different, which is we track a plan_id of the full execution plan. Some extensions may hash the text version of the plan, others may choose to do something more elaborate such as "jumbling" a plan tree. The point becomes is that monitoring extensions such as pg_stat_monitor and others can set the plan_id in core so it's available in backend status. > > 1. Optimisational decisions made during transformation/path generation > > stages up to the end of execution to correct them in the future. > > 2. Cache information about the query tree/node state to use it for > > statistical purposes. > > Gathering of statistical data based on a node tree is one reason, > where it may or may not be required to walk through a path. > Influencing the plan used with an already-generated one (where hints > could be used) was the second one, mostly replacing a plan in the > planner hook. Influencing the paths in a plan or a subplan didn't > really matter much with hints to drive the paths. > > > In my experience, we don't need a single plan_id field; we just need an > > 'extended list' pointer at the end of the Plan, PlannedStmt, Query, and > > RelOptInfo structures and a hook at the end of the create_plan_recurse() to > > allow passing some info from the path generator to the plan tree. > > An extension may add its data to the list (we may register an extensible > > node type to be sure we don't interfere with other extensions) and > > manipulate it in a custom way and with custom UI. > > Generally, it makes the optimiser internals more open to extensions. > > Sounds to me that this maps with the addition of a "private" area to > some of the plan structures to allow extensions to attach some data > that would be reused elsewhere, which is rather independent than > what's suggested here? +1, such a private area is different from what is being proposed. [1] https://www.postgresql.org/message-id/CAA5RZ0uGDKWxqUCMrsWKV425T2f6mqJsXKg6chq%2BWuyCwNPUGw%40mail.gmail.com. -- Sami
I put together patches to do as is being proposed. v1-0001: 1. Adds a planId field in PlannedStmt 2. Added an st_plan_id fields in PgBackendStatus 3. APIs to report and to retrieve a planId to PgBackendStatus An extension is able to set a planId in PlannedStmt directly, and while they can do the same for PgBackendStatus, I felt it is better to provide similar APIs for this as we do for queryId. Unless the extension passed force=true to the API, this will ensure that a planId already set either by a top-level statement or by another extension cannot be set when a plan is active. Also, the extension does not need to worry about resetting the planId at the end of execution as this will follow the same mechanism as is currently being done for queryId. This will remove responsibility from the extension author to have to manage this aspect. The extension should normally compute the planId in the planner or ExecutorStart hook. If the planId is computed in the planner_hook, the extension can set the planId in plannedStmt making the planId available to subsequent executions of a cached plan. What this patch does not cover is adding a "Plan Identifier" to explain output or to logs. Also, there is no core GUC like compute_query_id, since it does not make sense if we're not computing a planId in core. v2-0001: This adds a planId to pg_stat_get_activity ( not pg_stat_activity ). An extension can offer its own view, similar to pg_stat_activity, which can include planId. Note that there are no documentation updates required here as we don't have per-field documentation for pg_stat_get_activity. These 2 patches can probably be combined , but will leave them like this for now. Looking forward to feedback. Regards Sami