Thread: Does Postgres have consistent identifiers (plan hash value) for explain plans?

Does Postgres have consistent identifiers (plan hash value) for explain plans?

From
Jerry Brenner
Date:
We are currently on Postgres 13.9 (and will be moving to later releases).  We are capturing json explain plans and storing them in a database table.  We can tell that there are different plans for some queries, but that's a very labor intensive process - we'd rather do this using SQL and comparing consistent hash values for the plans.  Both Oracle and SQL Server have consistent hash values for query plans and that makes it easy to identify when there are multiple plans for the same query.  Does that concept exist in later releases of Postgres (and is the value stored in the json explain plan)? 

While  we have a pretty good idea of how to manually generate a consistent value, we don't want to reinvent the wheel. Is anyone aware of an existing solution that can be called from SQL/jsonb?

Thanks,
Jerry
Jerry Brenner <jbrenner@guidewire.com> writes:
> Both Oracle and SQL Server have
> consistent hash values for query plans and that makes it easy to identify
> when there are multiple plans for the same query.  Does that concept exist
> in later releases of Postgres (and is the value stored in the json explain
> plan)?

No, there's no support currently for obtaining a hash value that's
associated with a plan rather than an input query tree.

            regards, tom lane



Re: Does Postgres have consistent identifiers (plan hash value) for explain plans?

From
Julien Rouhaud
Date:
Hi,

On Mon, Dec 04, 2023 at 06:45:39AM -0800, Jerry Brenner wrote:
> We are currently on Postgres 13.9 (and will be moving to later releases).
> We are capturing json explain plans and storing them in a database table.
> We can tell that there are different plans for some queries, but that's a
> very labor intensive process - we'd rather do this using SQL and comparing
> consistent hash values for the plans.  Both Oracle and SQL Server have
> consistent hash values for query plans and that makes it easy to identify
> when there are multiple plans for the same query.  Does that concept exist
> in later releases of Postgres (and is the value stored in the json explain
> plan)?
>
> While  we have a pretty good idea of how to manually generate a consistent
> value, we don't want to reinvent the wheel. Is anyone aware of an existing
> solution that can be called from SQL/jsonb?

You can look at pg_store_plans extension:
https://github.com/ossc-db/pg_store_plans, it can generate a query plan hash
and also keeps tracks of the (normalized) plans associated to each (normalized)
query.



Re: Does Postgres have consistent identifiers (plan hash value) for explain plans?

From
Michael Paquier
Date:
On Mon, Dec 04, 2023 at 09:57:24AM -0500, Tom Lane wrote:
> Jerry Brenner <jbrenner@guidewire.com> writes:
>> Both Oracle and SQL Server have
>> consistent hash values for query plans and that makes it easy to identify
>> when there are multiple plans for the same query.  Does that concept exist
>> in later releases of Postgres (and is the value stored in the json explain
>> plan)?
>
> No, there's no support currently for obtaining a hash value that's
> associated with a plan rather than an input query tree.

PlannerGlobal includes a no_query_jumble that gets inherited by all
its lower-level nodes, so adding support for hashes compiled from
these node structures would not be that complicated.  My point is that
the basic infrastructure is in place in the tree to be able to do
that, and it should not be a problem to even publish the compiled
hashes in EXPLAIN outputs, behind an option of course.
--
Michael

Attachment
Michael Paquier <michael@paquier.xyz> writes:
> On Mon, Dec 04, 2023 at 09:57:24AM -0500, Tom Lane wrote:
>> Jerry Brenner <jbrenner@guidewire.com> writes:
>>> Both Oracle and SQL Server have
>>> consistent hash values for query plans and that makes it easy to identify
>>> when there are multiple plans for the same query.  Does that concept exist
>>> in later releases of Postgres (and is the value stored in the json explain
>>> plan)?

>> No, there's no support currently for obtaining a hash value that's
>> associated with a plan rather than an input query tree.

> PlannerGlobal includes a no_query_jumble that gets inherited by all
> its lower-level nodes, so adding support for hashes compiled from
> these node structures would not be that complicated.  My point is that
> the basic infrastructure is in place in the tree to be able to do
> that, and it should not be a problem to even publish the compiled
> hashes in EXPLAIN outputs, behind an option of course.

Well, yeah, we could fairly easily activate that infrastructure for
plans, but we haven't.  More to the point, it's not clear to me that
that would satisfy the OP's request for "consistent" hash values.
The hashes would vary depending on object OID values, system version,
possibly endianness, etc.

I'm also wondering exactly what the OP thinks qualifies as different
plans.  Remembering the amount of fooling-around that's gone on with
querytree hashes to satisfy various people's ill-defined desires for
pg_stat_statements aggregation behavior, I'm not really eager to buy
into the same definitional morass at the plan level.

            regards, tom lane



Re: Does Postgres have consistent identifiers (plan hash value) for explain plans?

From
Jerry Brenner
Date:
Apologies if I'm not using the appropriate Postgres-specific terms.  The simplest implementation would consider only the information that is consistent across systems and executions - node types, relation and index names (not ids), aliases ...  It would ignore bind variable values and execution statistics.  This would make it easy to find:
  • Queries with different plans (we would then investigate the causes)
  • Queries where the plans changed over time (we would then investigate the causes)
  • The aggregated cost of a query plan shared by multiple queries (due to differences in the number of items in an IN list, ...) This is useful when no single query in the group registers high, but the group of queries registers high.
  • Because we use consistent table and index names across environments, this would make it possible to find queries with different plans in different environments.
In a perfect world, the hash would include the filters, index conds, ... with the constant values masked out, but I realize that's much more complicated.  (Without this, we could see plans that are applying different numbers of predicates against an index mapping to the same hash value, but it would still be a big improvement.)

As mentioned before, we are currently storing the explain plans in a database table.  We use a combination of columns and syntax when querying for the long executions to display some contextual information about the explain plans (does it have a Materialize node, Init Plan, Sub Plan, ...)  Based on that little contextual information, we can see that there are multiple plans for some queries. Based on manual investigation, we know that there other plan differences.  It is very expensive right now to try to figure out if a plan is changing over time, why some executions are more expensive than others, ...  

Thanks,
Jerry

On Mon, Dec 4, 2023 at 7:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael@paquier.xyz> writes:
> On Mon, Dec 04, 2023 at 09:57:24AM -0500, Tom Lane wrote:
>> Jerry Brenner <jbrenner@guidewire.com> writes:
>>> Both Oracle and SQL Server have
>>> consistent hash values for query plans and that makes it easy to identify
>>> when there are multiple plans for the same query.  Does that concept exist
>>> in later releases of Postgres (and is the value stored in the json explain
>>> plan)?

>> No, there's no support currently for obtaining a hash value that's
>> associated with a plan rather than an input query tree.

> PlannerGlobal includes a no_query_jumble that gets inherited by all
> its lower-level nodes, so adding support for hashes compiled from
> these node structures would not be that complicated.  My point is that
> the basic infrastructure is in place in the tree to be able to do
> that, and it should not be a problem to even publish the compiled
> hashes in EXPLAIN outputs, behind an option of course.

Well, yeah, we could fairly easily activate that infrastructure for
plans, but we haven't.  More to the point, it's not clear to me that
that would satisfy the OP's request for "consistent" hash values.
The hashes would vary depending on object OID values, system version,
possibly endianness, etc.

I'm also wondering exactly what the OP thinks qualifies as different
plans.  Remembering the amount of fooling-around that's gone on with
querytree hashes to satisfy various people's ill-defined desires for
pg_stat_statements aggregation behavior, I'm not really eager to buy
into the same definitional morass at the plan level.

                        regards, tom lane