Re: allowing extensions to control planner behavior - Mailing list pgsql-hackers

From Andrei Lepikhov
Subject Re: allowing extensions to control planner behavior
Date
Msg-id aaf4250a-77db-43e3-9720-f60e2e4f5eba@gmail.com
Whole thread Raw
Responses Re: allowing extensions to control planner behavior
List pgsql-hackers
On 26/8/2024 18:32, Robert Haas wrote:
> I'm somewhat expecting to be flamed to a well-done crisp for saying
> this, but I think we need better ways for extensions to control the
> behavior of PostgreSQL's query planner. I know of two major reasons
It is the change I have been waiting for a long time. Remember how many 
kludge codes in pg_hint_plan, aqo, citus, timescale, etc., are written 
for only the reason of a small number of hooks - I guess many other 
people could cheer such work.

> why somebody might want to do this. First, you might want to do
> something like what pg_hint_plan does, where it essentially implements
> Oracle-style hints that can be either inline or stored in a side table
> and automatically applied to queries.[1] In addition to supporting
> Oracle-style hints, it also supports some other kinds of hints so that
> you can, for example, try to fix broken cardinality estimates. Second,
My personal most wanted list:
- Selectivity list estimation hook
- Groups number estimation hook
- hooks on memory estimations, involving work_mem
- add_path() hook
- Hook on final RelOptInfo pathlist
- a custom list of nodes in RelOptinfo, PlannerStmt, Plan and Query 
structures
- Extensibility of extended and plain statistics
- Hook on portal error processing
- Canonicalise expressions hook

> you might want to convince the planner to keep producing the same kind
> of plan that it produced previously. I believe this is what Amazon's
> query plan management feature[2] does, although since it is closed
> source and I don't work at Amazon maybe it's actually implemented
> completely differently. Regardless of what Amazon did in this case,
> plan stability is a feature people want. Just trying to keep using the
> same plan data structure forever doesn't seem like a good strategy,
> because for example it would be fragile in the case of any DDL
> changes, like dropping and recreating an index, or dropping or adding
As a designer of plan freezing feature [1] I can say it utilises 
plancache and, being under its invalidation callbacks it doesn't afraid 
DDL or any other stuff altering database objects.

> Unfortunately, the part about the hook having the freedom to delete
> paths isn't really true. Perhaps technically you can delete a path
> that you don't want to be chosen, but any paths that were dominated by
> the path you deleted have already been thrown away and it's too late
> to get them back. You can modify paths if you don't want to change
> their costs, but if you change their costs then you have the same
> problem: the contents of the pathlist at the time that you see it are
> determined by the costs that each path had when it was initially
> added, and it's really too late to editorialize on that. So all you
> can really do here in practice is add new paths.
 From my standpoint, it is enough to export routines creating paths and 
calculating costs.

> set_join_pathlist_hook, which applies to joinrels, is similarly
> limited. appendrels don't even have an equivalent of this hook.
> 
> So, how could we do better?
> 
> I think there are two basic approaches that are possible here. If
> someone sees a third option, let me know. First, we could allow users
> to hook add_path() and add_partial_path(). That certainly provides the
> flexibility on paper to accept or reject whatever paths you do or do
+1

> The attached patch, briefly mentioned above, essentially converts the
> enable_* GUCs into RelOptInfo properties where the defaults are set by
> the corresponding GUCs. The idea is that a hook could then change this
> on a per-RelOptInfo basis before path generation happens. For
IMO, it is better not to switch on/off algorithms, but allow extensions 
to change their cost multipliers, modifying costs balance. 10E9 looks 
like a disable, but multiplier == 10 for a cost node just provide more 
freedom for hashing strategies.

[1] https://postgrespro.com/docs/enterprise/16/sr-plan

-- 
regards, Andrei Lepikhov




pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: thread-safety: getpwuid_r()
Next
From: Ayush Vatsa
Date:
Subject: Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch