Thread: Hook for Selectivity Estimation in Query Planning
Hi, I would like to discuss the introduction of a hook for evaluating the selectivity of an expression when searching for an optimal query plan. This topic has been brought up in various discussions, for example, in [1]. Currently, extensions that interact with the optimiser can only add their paths without the ability to influence the optimiser's decisions. As a result, when developing an extension that implements a new type of statistics (such as a histogram for composite types), utilises knowledge from previously executed queries, or implements some system of selectivity hints, we find ourselves writing a considerable amount of code. To ensure the reliable operation of the extension, this may end up in developing a separate optimiser or, at the very least, creating a custom join search (refer to core.c in the pg_hint_plan extension for an estimation of the amount of code required). A hook for evaluating selectivity could streamline the development of methods to improve selectivity evaluation, making it easier to create new types of statistics and estimation methods (I would like to deal with join clauses estimation). Considering the limited amount of code involved and the upcoming code freeze, I propose adding such a hook to PostgreSQL 18 to assess how it simplifies extension development. This proposed hook would complement the existing path hooks without overlapping in functionality. In my experience with implementing adaptive features in enterprise solutions, I believe that additional hooks could also be beneficial for estimating the number of groups and the amount of memory allocated, which is currently based solely on work_mem. However, these suggestions do not interfere with the current proposal and could be considered later. Critique: In general, a hook for evaluating the number of rows appears to be a more promising approach. It would allow the extension to access specific RelOptInfo data, thus providing insights into where the evaluation takes place within the plan. Consequently, this would enable a deeper influence on the query plan choice. However, implementing such a hook might be more invasive, requiring modifications to each cost function. Additionally, it addresses a slightly different issue and can be considered separately. Attached is a patch containing the proposed hook code. -- regards, Andrei Lepikhov
Attachment
Hi, > I would like to discuss the introduction of a hook for evaluating the > selectivity of an expression when searching for an optimal query plan. > This topic has been brought up in various discussions, for example, in [1]. > > [...] As I vaguely recall recent proposals like this ("Pluggable TOASTer" to name one) this approach was criticised. Hooks per se don't add value for the end user. They only put the burden of maintaining them on the community while all the real features are implemented in proprietar extensions. If you believe something is missing in Postgres, contribute it to the upstream so that anyone will benefit from it. Personally I agree with the sentiment, thus -1. -- Best regards, Aleksander Alekseev
On Wed, 5 Mar 2025 at 14:30, Aleksander Alekseev <aleksander@timescale.com> wrote: > > Hi, > > > I would like to discuss the introduction of a hook for evaluating the > > selectivity of an expression when searching for an optimal query plan. > > This topic has been brought up in various discussions, for example, in [1]. > > > > [...] > > As I vaguely recall recent proposals like this ("Pluggable TOASTer" to > name one) this approach was criticised. Could you explain why you think the Pluggable TOASTer proposal was similar? IIRC, the Pluggable TOASTer patch added hooks in specific types with the intent to allow specific well-defined actions that were supposed to be supported by each type to be implemented by an extension; where "well-defined" does some heavy lifting for "practically, there's only one way to implement this for each type". That was why my final response for that patch was that the extensibility part didn't make sense - if there's practically only a single implementation based on the API, then why is there an API at all; let alone so deeply ingrained in the type-specific functions? The patch proposed here, however, does not look like that to me. As a start, it doesn't require catalog changes for the hook to be used, and isn't specific to a single type. Kind regards, Matthias van de Meent
On 5/3/2025 14:29, Aleksander Alekseev wrote: > Hi, > >> I would like to discuss the introduction of a hook for evaluating the >> selectivity of an expression when searching for an optimal query plan. >> This topic has been brought up in various discussions, for example, in [1]. >> >> [...] > > As I vaguely recall recent proposals like this ("Pluggable TOASTer" to > name one) this approach was criticised. Hooks per se don't add value > for the end user. They only put the burden of maintaining them on the > community while all the real features are implemented in proprietar > extensions. If you believe something is missing in Postgres, > contribute it to the upstream so that anyone will benefit from it. At first, I didn't find the reason for hooks' current existence in the core. However, it's clear that hooks speed up the development of extensions, which in turn enhances usability and popularity of the project. This leads to a greater number of use cases and tests, fostering community growth. I'm not sure what the purpose of the project is except curiosity, but even then, extensions speed up the idea validation process, don't they? It's important to remember that not all extensions are proprietary. Does TimescaleDB not provide value to both end users and the community? Furthermore, extensions are necessary to address gaps that the community may not work on by definition; for example, consider pg_hint_plan. As I mentioned, the primary purpose of the hook is clear: to advance the development of alternative statistics and estimation methods. For instance, I've already come across proposals for multidimensional histograms. Personally, I want to use this hook to implement zonal ndistinct statistic extension to address the intra-column data skew issue. Overall, I see that new hooks allow new [sometimes] open-source projects and startups to emerge - not sure about enterprises' benefits. Therefore, I'm not convinced by your current justification. Are there any technical objections? -- regards, Andrei Lepikhov
Andrei, Matthias, > Could you explain why you think the Pluggable TOASTer proposal was similar? > [...] I merely pointed out that adding hooks without any particular value for the Postgres users was criticized before, see for instance: https://www.postgresql.org/message-id/20230206104917.sipa7nzue5lw2e6z%40alvherre.pgsql One could argue - but wait, isn't TAM for instance just a bunch of hooks in a nutshell? How do we distinguish a well-documented and more or less stable API for the extension authors from a random hook put in a convenient place? That's a good question. I don't have an answer to it. This being said, the proposed patch doesn't strike me as a good or documented API, or the one that is going to be stable in the long run. > [...] > > Overall, I see that new hooks allow new [sometimes] open-source projects > and startups to emerge - not sure about enterprises' benefits. > Therefore, I'm not convinced by your current justification. Are there > any technical objections? There is no point in debating about good and evil or right and wrong. The only important question is whether there will be a committer willing to accept the proposed change considering its controversy. -- Best regards, Aleksander Alekseev
On 5/3/2025 19:50, Aleksander Alekseev wrote: > Andrei, Matthias, > >> Could you explain why you think the Pluggable TOASTer proposal was similar? >> [...] > > I merely pointed out that adding hooks without any particular value > for the Postgres users was criticized before, see for instance: Thank you for your feedback. Rational criticism is always welcome. Let’s aim to clarify the actual objectives: > > https://www.postgresql.org/message-id/20230206104917.sipa7nzue5lw2e6z%40alvherre.pgsql > > One could argue - but wait, isn't TAM for instance just a bunch of > hooks in a nutshell? How do we distinguish a well-documented and more > or less stable API for the extension authors from a random hook put in > a convenient place? That's a good question. I don't have an answer to > it. This being said, the proposed patch doesn't strike me as a good or > documented API, or the one that is going to be stable in the long run. 1. **Documentation** - Agreed. I think it's feasible to create documentation based on the examples. However, we should first decide on the main subject, don't you think? 2. **'Good API'** - I wouldn't say that makes sense. Could you clarify what you mean by "good API"? What qualifies as a good API, why do you feel that the current changes are bad, and how can we improve it? 3. **'Stable'** - Why do you believe it is unstable? As I mentioned, this is the first hook that allows us to influence the optimiser's behaviour. Current path hooks only allow us to provide the planner with alternative decisions and force us to think it knows better how to proceed. I suggest we enable developers to enhance prediction quality without having to create a new planner. The rationale behind this is quite clear — specific workloads may require more sophisticated estimation algorithms, which would be excessive for a general-purpose planner. As you can imagine, I would like to hook into cardinality predictions or tweak cost functions (see Apache Calcite), but that approach is invasive and unstable since each node, whether existing or newly introduced, would require such a call. In contrast, the selectivity estimation function serves as a central point for estimations, necessitating only one call. I believe we could consider adding a reference to `RelOptInfo` in the future, as has been briefly mentioned in discussions among developers before. For now, though, this seems sufficient for the purpose of database statistics. > >> [...] >> >> Overall, I see that new hooks allow new [sometimes] open-source projects >> and startups to emerge - not sure about enterprises' benefits. >> Therefore, I'm not convinced by your current justification. Are there >> any technical objections? > > There is no point in debating about good and evil or right and wrong. > The only important question is whether there will be a committer > willing to accept the proposed change considering its controversy. It would be interesting to see what type of controversy you see here. I think it will be clearer after you answer the previous questions. -- regards, Andrei Lepikhov