Thread: Hook for Selectivity Estimation in Query Planning

Hook for Selectivity Estimation in Query Planning

From
Andrei Lepikhov
Date:
Hi,

I would like to discuss the introduction of a hook for evaluating the 
selectivity of an expression when searching for an optimal query plan. 
This topic has been brought up in various discussions, for example, in [1].

Currently, extensions that interact with the optimiser can only add 
their paths without the ability to influence the optimiser's decisions. 
As a result, when developing an extension that implements a new type of 
statistics (such as a histogram for composite types), utilises knowledge 
from previously executed queries, or implements some system of 
selectivity hints, we find ourselves writing a considerable amount of 
code. To ensure the reliable operation of the extension, this may end up 
in developing a separate optimiser or, at the very least, creating a 
custom join search (refer to core.c in the pg_hint_plan extension for an 
estimation of the amount of code required).

A hook for evaluating selectivity could streamline the development of 
methods to improve selectivity evaluation, making it easier to create 
new types of statistics and estimation methods (I would like to deal 
with join clauses estimation). Considering the limited amount of code 
involved and the upcoming code freeze, I propose adding such a hook to 
PostgreSQL 18 to assess how it simplifies extension development.

This proposed hook would complement the existing path hooks without 
overlapping in functionality. In my experience with implementing 
adaptive features in enterprise solutions, I believe that additional 
hooks could also be beneficial for estimating the number of groups and 
the amount of memory allocated, which is currently based solely on 
work_mem. However, these suggestions do not interfere with the current 
proposal and could be considered later.

Critique:
In general, a hook for evaluating the number of rows appears to be a 
more promising approach. It would allow the extension to access specific 
RelOptInfo data, thus providing insights into where the evaluation takes 
place within the plan. Consequently, this would enable a deeper 
influence on the query plan choice. However, implementing such a hook 
might be more invasive, requiring modifications to each cost function. 
Additionally, it addresses a slightly different issue and can be 
considered separately.

Attached is a patch containing the proposed hook code.

-- 
regards, Andrei Lepikhov

Attachment

Re: Hook for Selectivity Estimation in Query Planning

From
Aleksander Alekseev
Date:
Hi,

> I would like to discuss the introduction of a hook for evaluating the
> selectivity of an expression when searching for an optimal query plan.
> This topic has been brought up in various discussions, for example, in [1].
>
> [...]

As I vaguely recall recent proposals like this ("Pluggable TOASTer" to
name one) this approach was criticised. Hooks per se don't add value
for the end user. They only put the burden of maintaining them on the
community while all the real features are implemented in proprietar
extensions. If you believe something is missing in Postgres,
contribute it to the upstream so that anyone will benefit from it.

Personally I agree with the sentiment, thus -1.

-- 
Best regards,
Aleksander Alekseev



Re: Hook for Selectivity Estimation in Query Planning

From
Matthias van de Meent
Date:
On Wed, 5 Mar 2025 at 14:30, Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi,
>
> > I would like to discuss the introduction of a hook for evaluating the
> > selectivity of an expression when searching for an optimal query plan.
> > This topic has been brought up in various discussions, for example, in [1].
> >
> > [...]
>
> As I vaguely recall recent proposals like this ("Pluggable TOASTer" to
> name one) this approach was criticised.

Could you explain why you think the Pluggable TOASTer proposal was similar?

IIRC, the Pluggable TOASTer patch added hooks in specific types with
the intent to allow specific well-defined actions that were supposed
to be supported by each type to be implemented by an extension; where
"well-defined" does some heavy lifting for "practically, there's only
one way to implement this for each type". That was why my final
response for that patch was that the extensibility part didn't make
sense - if there's practically only a single implementation based on
the API, then why is there an API at all; let alone so deeply
ingrained in the type-specific functions?

The patch proposed here, however, does not look like that to me. As a
start, it doesn't require catalog changes for the hook to be used, and
isn't specific to a single type.

Kind regards,

Matthias van de Meent



Re: Hook for Selectivity Estimation in Query Planning

From
Andrei Lepikhov
Date:
On 5/3/2025 14:29, Aleksander Alekseev wrote:
> Hi,
> 
>> I would like to discuss the introduction of a hook for evaluating the
>> selectivity of an expression when searching for an optimal query plan.
>> This topic has been brought up in various discussions, for example, in [1].
>>
>> [...]
> 
> As I vaguely recall recent proposals like this ("Pluggable TOASTer" to
> name one) this approach was criticised. Hooks per se don't add value
> for the end user. They only put the burden of maintaining them on the
> community while all the real features are implemented in proprietar
> extensions. If you believe something is missing in Postgres,
> contribute it to the upstream so that anyone will benefit from it.
At first, I didn't find the reason for hooks' current existence in the 
core. However, it's clear that hooks speed up the development of 
extensions, which in turn enhances usability and popularity of the 
project. This leads to a greater number of use cases and tests, 
fostering community growth. I'm not sure what the purpose of the project 
is except curiosity, but even then, extensions speed up the idea 
validation process, don't they?
It's important to remember that not all extensions are proprietary. Does 
TimescaleDB not provide value to both end users and the community?

Furthermore, extensions are necessary to address gaps that the community 
may not work on by definition; for example, consider pg_hint_plan.

As I mentioned, the primary purpose of the hook is clear: to advance the 
development of alternative statistics and estimation methods. For 
instance, I've already come across proposals for multidimensional 
histograms. Personally, I want to use this hook to implement zonal 
ndistinct statistic extension to address the intra-column data skew issue.

Overall, I see that new hooks allow new [sometimes] open-source projects 
and startups to emerge - not sure about enterprises' benefits. 
Therefore, I'm not convinced by your current justification. Are there 
any technical objections?

-- 
regards, Andrei Lepikhov



Re: Hook for Selectivity Estimation in Query Planning

From
Aleksander Alekseev
Date:
Andrei, Matthias,

> Could you explain why you think the Pluggable TOASTer proposal was similar?
> [...]

I merely pointed out that adding hooks without any particular value
for the Postgres users was criticized before, see for instance:

https://www.postgresql.org/message-id/20230206104917.sipa7nzue5lw2e6z%40alvherre.pgsql

One could argue - but wait, isn't TAM for instance just a bunch of
hooks in a nutshell? How do we distinguish a well-documented and more
or less stable API for the extension authors from a random hook put in
a convenient place? That's a good question. I don't have an answer to
it. This being said, the proposed patch doesn't strike me as a good or
documented API, or the one that is going to be stable in the long run.

> [...]
>
> Overall, I see that new hooks allow new [sometimes] open-source projects
> and startups to emerge - not sure about enterprises' benefits.
> Therefore, I'm not convinced by your current justification. Are there
> any technical objections?

There is no point in debating about good and evil or right and wrong.
The only important question is whether there will be a committer
willing to accept the proposed change considering its controversy.

-- 
Best regards,
Aleksander Alekseev



Re: Hook for Selectivity Estimation in Query Planning

From
Andrei Lepikhov
Date:
On 5/3/2025 19:50, Aleksander Alekseev wrote:
> Andrei, Matthias,
> 
>> Could you explain why you think the Pluggable TOASTer proposal was similar?
>> [...]
> 
> I merely pointed out that adding hooks without any particular value
> for the Postgres users was criticized before, see for instance:
Thank you for your feedback. Rational criticism is always welcome. Let’s 
aim to clarify the actual objectives:
> 
> https://www.postgresql.org/message-id/20230206104917.sipa7nzue5lw2e6z%40alvherre.pgsql
> 
> One could argue - but wait, isn't TAM for instance just a bunch of
> hooks in a nutshell? How do we distinguish a well-documented and more
> or less stable API for the extension authors from a random hook put in
> a convenient place? That's a good question. I don't have an answer to
> it. This being said, the proposed patch doesn't strike me as a good or
> documented API, or the one that is going to be stable in the long run.
1. **Documentation** - Agreed. I think it's feasible to create 
documentation based on the examples. However, we should first decide on 
the main subject, don't you think?

2. **'Good API'** - I wouldn't say that makes sense. Could you clarify 
what you mean by "good API"? What qualifies as a good API, why do you 
feel that the current changes are bad, and how can we improve it?

3. **'Stable'** - Why do you believe it is unstable? As I mentioned, 
this is the first hook that allows us to influence the optimiser's 
behaviour. Current path hooks only allow us to provide the planner with 
alternative decisions and force us to think it knows better how to 
proceed. I suggest we enable developers to enhance prediction quality 
without having to create a new planner. The rationale behind this is 
quite clear — specific workloads may require more sophisticated 
estimation algorithms, which would be excessive for a general-purpose 
planner.

As you can imagine, I would like to hook into cardinality predictions or 
tweak cost functions (see Apache Calcite), but that approach is invasive 
and unstable since each node, whether existing or newly introduced, 
would require such a call. In contrast, the selectivity estimation 
function serves as a central point for estimations, necessitating only 
one call. I believe we could consider adding a reference to `RelOptInfo` 
in the future, as has been briefly mentioned in discussions among 
developers before. For now, though, this seems sufficient for the 
purpose of database statistics.
> 
>> [...]
>>
>> Overall, I see that new hooks allow new [sometimes] open-source projects
>> and startups to emerge - not sure about enterprises' benefits.
>> Therefore, I'm not convinced by your current justification. Are there
>> any technical objections?
> 
> There is no point in debating about good and evil or right and wrong.
> The only important question is whether there will be a committer
> willing to accept the proposed change considering its controversy.
It would be interesting to see what type of controversy you see here. I 
think it will be clearer after you answer the previous questions.

-- 
regards, Andrei Lepikhov