Re: allowing extensions to control planner behavior - Mailing list pgsql-hackers
From | Andrei Lepikhov |
---|---|
Subject | Re: allowing extensions to control planner behavior |
Date | |
Msg-id | aaf4250a-77db-43e3-9720-f60e2e4f5eba@gmail.com Whole thread Raw |
Responses |
Re: allowing extensions to control planner behavior
|
List | pgsql-hackers |
On 26/8/2024 18:32, Robert Haas wrote: > I'm somewhat expecting to be flamed to a well-done crisp for saying > this, but I think we need better ways for extensions to control the > behavior of PostgreSQL's query planner. I know of two major reasons It is the change I have been waiting for a long time. Remember how many kludge codes in pg_hint_plan, aqo, citus, timescale, etc., are written for only the reason of a small number of hooks - I guess many other people could cheer such work. > why somebody might want to do this. First, you might want to do > something like what pg_hint_plan does, where it essentially implements > Oracle-style hints that can be either inline or stored in a side table > and automatically applied to queries.[1] In addition to supporting > Oracle-style hints, it also supports some other kinds of hints so that > you can, for example, try to fix broken cardinality estimates. Second, My personal most wanted list: - Selectivity list estimation hook - Groups number estimation hook - hooks on memory estimations, involving work_mem - add_path() hook - Hook on final RelOptInfo pathlist - a custom list of nodes in RelOptinfo, PlannerStmt, Plan and Query structures - Extensibility of extended and plain statistics - Hook on portal error processing - Canonicalise expressions hook > you might want to convince the planner to keep producing the same kind > of plan that it produced previously. I believe this is what Amazon's > query plan management feature[2] does, although since it is closed > source and I don't work at Amazon maybe it's actually implemented > completely differently. Regardless of what Amazon did in this case, > plan stability is a feature people want. Just trying to keep using the > same plan data structure forever doesn't seem like a good strategy, > because for example it would be fragile in the case of any DDL > changes, like dropping and recreating an index, or dropping or adding As a designer of plan freezing feature [1] I can say it utilises plancache and, being under its invalidation callbacks it doesn't afraid DDL or any other stuff altering database objects. > Unfortunately, the part about the hook having the freedom to delete > paths isn't really true. Perhaps technically you can delete a path > that you don't want to be chosen, but any paths that were dominated by > the path you deleted have already been thrown away and it's too late > to get them back. You can modify paths if you don't want to change > their costs, but if you change their costs then you have the same > problem: the contents of the pathlist at the time that you see it are > determined by the costs that each path had when it was initially > added, and it's really too late to editorialize on that. So all you > can really do here in practice is add new paths. From my standpoint, it is enough to export routines creating paths and calculating costs. > set_join_pathlist_hook, which applies to joinrels, is similarly > limited. appendrels don't even have an equivalent of this hook. > > So, how could we do better? > > I think there are two basic approaches that are possible here. If > someone sees a third option, let me know. First, we could allow users > to hook add_path() and add_partial_path(). That certainly provides the > flexibility on paper to accept or reject whatever paths you do or do +1 > The attached patch, briefly mentioned above, essentially converts the > enable_* GUCs into RelOptInfo properties where the defaults are set by > the corresponding GUCs. The idea is that a hook could then change this > on a per-RelOptInfo basis before path generation happens. For IMO, it is better not to switch on/off algorithms, but allow extensions to change their cost multipliers, modifying costs balance. 10E9 looks like a disable, but multiplier == 10 for a cost node just provide more freedom for hashing strategies. [1] https://postgrespro.com/docs/enterprise/16/sr-plan -- regards, Andrei Lepikhov
pgsql-hackers by date: