Thread: Re: allowing extensions to control planner behavior
On 26/8/2024 18:32, Robert Haas wrote: > I'm somewhat expecting to be flamed to a well-done crisp for saying > this, but I think we need better ways for extensions to control the > behavior of PostgreSQL's query planner. I know of two major reasons It is the change I have been waiting for a long time. Remember how many kludge codes in pg_hint_plan, aqo, citus, timescale, etc., are written for only the reason of a small number of hooks - I guess many other people could cheer such work. > why somebody might want to do this. First, you might want to do > something like what pg_hint_plan does, where it essentially implements > Oracle-style hints that can be either inline or stored in a side table > and automatically applied to queries.[1] In addition to supporting > Oracle-style hints, it also supports some other kinds of hints so that > you can, for example, try to fix broken cardinality estimates. Second, My personal most wanted list: - Selectivity list estimation hook - Groups number estimation hook - hooks on memory estimations, involving work_mem - add_path() hook - Hook on final RelOptInfo pathlist - a custom list of nodes in RelOptinfo, PlannerStmt, Plan and Query structures - Extensibility of extended and plain statistics - Hook on portal error processing - Canonicalise expressions hook > you might want to convince the planner to keep producing the same kind > of plan that it produced previously. I believe this is what Amazon's > query plan management feature[2] does, although since it is closed > source and I don't work at Amazon maybe it's actually implemented > completely differently. Regardless of what Amazon did in this case, > plan stability is a feature people want. Just trying to keep using the > same plan data structure forever doesn't seem like a good strategy, > because for example it would be fragile in the case of any DDL > changes, like dropping and recreating an index, or dropping or adding As a designer of plan freezing feature [1] I can say it utilises plancache and, being under its invalidation callbacks it doesn't afraid DDL or any other stuff altering database objects. > Unfortunately, the part about the hook having the freedom to delete > paths isn't really true. Perhaps technically you can delete a path > that you don't want to be chosen, but any paths that were dominated by > the path you deleted have already been thrown away and it's too late > to get them back. You can modify paths if you don't want to change > their costs, but if you change their costs then you have the same > problem: the contents of the pathlist at the time that you see it are > determined by the costs that each path had when it was initially > added, and it's really too late to editorialize on that. So all you > can really do here in practice is add new paths. From my standpoint, it is enough to export routines creating paths and calculating costs. > set_join_pathlist_hook, which applies to joinrels, is similarly > limited. appendrels don't even have an equivalent of this hook. > > So, how could we do better? > > I think there are two basic approaches that are possible here. If > someone sees a third option, let me know. First, we could allow users > to hook add_path() and add_partial_path(). That certainly provides the > flexibility on paper to accept or reject whatever paths you do or do +1 > The attached patch, briefly mentioned above, essentially converts the > enable_* GUCs into RelOptInfo properties where the defaults are set by > the corresponding GUCs. The idea is that a hook could then change this > on a per-RelOptInfo basis before path generation happens. For IMO, it is better not to switch on/off algorithms, but allow extensions to change their cost multipliers, modifying costs balance. 10E9 looks like a disable, but multiplier == 10 for a cost node just provide more freedom for hashing strategies. [1] https://postgrespro.com/docs/enterprise/16/sr-plan -- regards, Andrei Lepikhov
On Mon, Aug 26, 2024 at 2:00 PM Andrei Lepikhov <lepihov@gmail.com> wrote: > It is the change I have been waiting for a long time. Remember how many > kludge codes in pg_hint_plan, aqo, citus, timescale, etc., are written > for only the reason of a small number of hooks - I guess many other > people could cheer such work. I think so, too. I know there are going to be people who hate this, but I think the cat is already out of the bag. It's not a question any more of whether it will happen, it's just a question of whether we want to collaborate with extension developers or try to make their life difficult. > My personal most wanted list: > - Selectivity list estimation hook > - Groups number estimation hook > - hooks on memory estimations, involving work_mem > - add_path() hook > - Hook on final RelOptInfo pathlist > - a custom list of nodes in RelOptinfo, PlannerStmt, Plan and Query > structures > - Extensibility of extended and plain statistics > - Hook on portal error processing > - Canonicalise expressions hook One of my chronic complaints about hooks is that people propose hooks that are just in any random spot in the code where they happen to want to change something. If we accept that, we end up with a lot of hooks where nobody can say how the hook can be used usefully and maybe it can't actually be used usefully even by the original author, or only them and nobody else. So these kinds of proposals need detailed, case-by-case scrutiny. It's unacceptable for the planner to get filled up with a bunch of poorly-designed hooks just as it is for any other part of the system, but well-designed hooks whose usefulness can clearly be seen should be just as welcome here as anywhere else. > IMO, it is better not to switch on/off algorithms, but allow extensions > to change their cost multipliers, modifying costs balance. 10E9 looks > like a disable, but multiplier == 10 for a cost node just provide more > freedom for hashing strategies. That may be a valid use case, but I do not think it is a typical use case. In my experience, when people want to force the planner to do something, they really mean it. They don't mean "please do it this way unless you really, really don't feel like it." They mean "please do it this way, period." And that is also what other systems provide. Oracle could provide a hint MERGE_COST(foo,10) meaning make merge joins look ten times as expensive but in fact they only provide MERGE and NO_MERGE. And a "reproduce this previous plan" feature really demands infrastructure that truly forces the planner to do what it's told, rather than just nicely suggesting that it might want to do as it's told. I wouldn't be sad at all if we happen to end up with a system that's powerful enough for an extension to implement "make merge joins ten times as expensive"; in fact, I think that would be pretty cool. But I don't think it should be the design center for what we implement, because it looks nothing like what existing PG or non-PG systems do, at least in my experience. -- Robert Haas EDB: http://www.enterprisedb.com
On 26/8/2024 21:44, Robert Haas wrote: > On Mon, Aug 26, 2024 at 2:00 PM Andrei Lepikhov <lepihov@gmail.com> wrote: >> My personal most wanted list: >> - Selectivity list estimation hook >> - Groups number estimation hook >> - hooks on memory estimations, involving work_mem >> - add_path() hook >> - Hook on final RelOptInfo pathlist >> - a custom list of nodes in RelOptinfo, PlannerStmt, Plan and Query >> structures >> - Extensibility of extended and plain statistics >> - Hook on portal error processing >> - Canonicalise expressions hook > > One of my chronic complaints about hooks is that people propose hooks > that are just in any random spot in the code where they happen to want > to change something. If we accept that, we end up with a lot of hooks > where nobody can say how the hook can be used usefully and maybe it > can't actually be used usefully even by the original author, or only > them and nobody else. So these kinds of proposals need detailed, > case-by-case scrutiny. It's unacceptable for the planner to get filled > up with a bunch of poorly-designed hooks just as it is for any other > part of the system, but well-designed hooks whose usefulness can > clearly be seen should be just as welcome here as anywhere else. Definitely so. Think about that as a sketch proposal on the roadmap. Right now, I know about only one hook - selectivity hook - which we already discussed and have Tomas Vondra's patch on the table. But even this is a big deal, because multi-clause estimations are a huge pain for users that can't be resolved with extensions for now without core patches. >> IMO, it is better not to switch on/off algorithms, but allow extensions >> to change their cost multipliers, modifying costs balance. 10E9 looks >> like a disable, but multiplier == 10 for a cost node just provide more >> freedom for hashing strategies. > > That may be a valid use case, but I do not think it is a typical use > case. In my experience, when people want to force the planner to do > something, they really mean it. They don't mean "please do it this way > unless you really, really don't feel like it." They mean "please do it > this way, period." And that is also what other systems provide. Oracle > could provide a hint MERGE_COST(foo,10) meaning make merge joins look > ten times as expensive but in fact they only provide MERGE and > NO_MERGE. And a "reproduce this previous plan" feature really demands > infrastructure that truly forces the planner to do what it's told, > rather than just nicely suggesting that it might want to do as it's > told. I wouldn't be sad at all if we happen to end up with a system > that's powerful enough for an extension to implement "make merge joins > ten times as expensive"; in fact, I think that would be pretty cool. > But I don't think it should be the design center for what we > implement, because it looks nothing like what existing PG or non-PG > systems do, at least in my experience. Heh, I meant not manual usage, but automatical one, provided by extensions. -- regards, Andrei Lepikhov