At 2024-08-27 00:32:53, "Robert Haas" <robertmhaas@gmail.com> wrote:
>I'm somewhat expecting to be flamed to a well-done crisp for saying
>this, but I think we need better ways for extensions to control the
>behavior of PostgreSQL's query planner. I know of two major reasons
>why somebody might want to do this. First, you might want to do
>something like what pg_hint_plan does, where it essentially implements
>Oracle-style hints that can be either inline or stored in a side table
>and automatically applied to queries.[1] In addition to supporting
>Oracle-style hints, it also supports some other kinds of hints so that
>you can, for example, try to fix broken cardinality estimates. Second,
>you might want to convince the planner to keep producing the same kind
>of plan that it produced previously. I believe this is what Amazon's
>query plan management feature[2] does, although since it is closed
>source and I don't work at Amazon maybe it's actually implemented
>completely differently. Regardless of what Amazon did in this case,
>plan stability is a feature people want. Just trying to keep using the
>same plan data structure forever doesn't seem like a good strategy,
>because for example it would be fragile in the case of any DDL
>changes, like dropping and recreating an index, or dropping or adding
>a column. But you might want conceptually the same plan. Although it's
>not frequently admitted on this mailing list, unexpected plan changes
>are a frequent cause of sudden database outages, and wanting to
>prevent that is a legitimate thing for a user to try to do. Naturally,
>there is a risk that you might in so doing also prevent plan changes
>that would have dramatically improved performance, or stick with a
>plan long after you've outgrown it, but that doesn't stop people from
>wanting it, or other databases (or proprietary forks of this database)
>from offering it, and I don't think it should.
>
>We have some hooks right now that offer a few options in this area,
>but there are problems. The hook that I believe to be closest to the
>right thing is this one:
>
> /*
> * Allow a plugin to editorialize on the set of Paths for this base
> * relation. It could add new paths (such as CustomPaths) by calling
> * add_path(), or add_partial_path() if parallel aware. It could also
> * delete or modify paths added by the core code.
> */
> if (set_rel_pathlist_hook)
> (*set_rel_pathlist_hook) (root, rel, rti, rte);
>
>Unfortunately, the part about the hook having the freedom to delete
>paths isn't really true. Perhaps technically you can delete a path
>that you don't want to be chosen, but any paths that were dominated by
>the path you deleted have already been thrown away and it's too late
>to get them back. You can modify paths if you don't want to change
>their costs, but if you change their costs then you have the same
>problem: the contents of the pathlist at the time that you see it are
>determined by the costs that each path had when it was initially
>added, and it's really too late to editorialize on that. So all you
>can really do here in practice is add new paths.
>set_join_pathlist_hook, which applies to joinrels, is similarly
>limited. appendrels don't even have an equivalent of this hook.
>
>So, how could we do better?
>
>I think there are two basic approaches that are possible here. If
>someone sees a third option, let me know. First, we could allow users
>to hook add_path() and add_partial_path(). That certainly provides the
>flexibility on paper to accept or reject whatever paths you do or do
>not want. However, I don't find this approach very appealing. One
>problem is that it's likely to be fairly expensive, because add_path()
>gets called A LOT. A second problem is that you don't actually get an
>awful lot of context: I think anybody writing a hook would have to
>write code to basically analyze each proposed path and figure out why
>it was getting added and then decide what to do. In some cases that
>might be fine, because for example accepting or rejecting paths based
>on path type seems fairly straightforward with this approach, but as
>soon as you want to do anything more complicated than that it starts
>to seem difficult. If, for example, you want relation R1 to be the
>driving table for the whole query plan, you're going to have to
>determine whether or not that is the case for every single candidate
>(partial) path that someone hands you, so you're going to end up
>making that decision a whole lot of times. It doesn't sound
>particularly fun. Third, even if you are doing something really simple
>like trying to reject mergejoins, you've already lost the opportunity
>to skip a bunch of work. If you had known when you started planning
>the joinrel that you didn't care about mergejoins, you could have
>skipped looking for merge-joinable clauses. Overall, while I wouldn't
>be completely against further exploration of this option, I suspect
>it's pretty hard to do anything useful with it.
>
>The other possible approach is to allow extensions to feed some
>information into the planner before path generation and let that
>influence which paths are generated. This is essentially what
>pg_hint_plan is doing: it implements plan type hints by arranging to
>flip the various enable_* GUCs on and off during the planning of
>various rels. That's clever but ugly, and it ends up duplicating
>substantial chunks of planner code due to the inadequacy of the
>existing hooks. With some refactoring and some additional hooks, we
>could make this much less ugly. But that gets at what I believe to be
>the core difficulty of this approach, which is that the core planner
>code needs to be somewhat aware of and on board with what the user or
>the extension is trying to do. If an extension wants to force the join
>order, that is likely to require different scaffolding than if it
>wants to force the join methods which is again different from if a
>hook wants to bias the query planner towards or against particular
>indexes. Putting in hooks or other infrastructure that allows an
>extension to control a particular aspect of planner behavior is to
>some extent an endorsement of controlling the planner behavior in that
>particular way. Since any amount of allowing the user to control the
>planner tends to be controversial around here, that opens up the
>spectre of putting a whole lot of effort into arguing about which
>things extensions should be allowed to do, getting most of the patches
>rejected, and ending up with nothing that's actually useful.
>
>But on the other hand, it's not like we have to design everything in a
>greenfield. Other database systems have provided in-core, user-facing
>features to control the planner for decades, and we can look at those
>offerings -- and existing offerings in the PG space -- as we try to
>judge whether a particular use case is totally insane. I am not here
>to argue that everything that every system has done is completely
>perfect and without notable flaws, but our own system has its own
>share of flaws, and the fact that you can do very little when a
>previously unproblematic query starts suddenly producing a bad plan is
>definitely one of them. I believe we are long past the point where we
>can simply hold our breath and pretend like there's no issue here. At
>the very least, allowing extensions to control scan methods (including
>choice of indexes), join methods, and join order (including which
>table ends up on which side of a given join) and similar things for
>aggregates and appendrels seems to me like it ought to be table
>stakes. And those extensions shouldn't have to duplicate large chunks
>of code or resort to gross hacks to do it. Eventually, maybe we'll
>even want to have directly user-facing features to do some of this
>stuff (in query hints, out of query hints, or whatever) but I think
>opening the door up to extensions doing it is a good first step,
>because (1) that allows different extensions to do different things
>without taking a position on what the One Right Thing To Do is and (2)
>if it becomes clear that something improvident has been done, it is a
>lot easier to back out a hook or some C API change than it is to
>back-out a user-visible feature. Or maybe we'll never want to expose a
>user-visible feature here, but it can still be useful to enable
>extensions.
>
>The attached patch, briefly mentioned above, essentially converts the
>enable_* GUCs into RelOptInfo properties where the defaults are set by
>the corresponding GUCs. The idea is that a hook could then change this
>on a per-RelOptInfo basis before path generation happens. For
>baserels, I believe that could be done from get_relation_info_hook for
>baserels, and we could introduce something similar for other kinds of
>rels. I don't think this is in any way the perfect approach. On the
>one hand, it doesn't give you all the kinds of control over path
>generation that you might want. On the other hand, the more I look at
>what our enable_* GUCs actually do, the less impressed I am. IMHO,
>things like enable_hashjoin make a lot of sense, but enable_sort seems
>like it just controls an absolutely random smattering of behaviors in
>a way that seems to me to have very little to recommend it, and I've
>complained elsewhere about how enable_indexscan and
>enable_indexonlyscan are really quite odd when you look at how they're
>implemented. Still, this seemed like a fairly easy thing to do as a
>way of demonstrating the kind of thing that we could do to provide
>extensions with more control over planner behavior, and I believe it
>would be concretely useful to pg_hint_plan in particular. But all that
>said, as much as anything, I want to get some feedback on what
>approaches and trade-offs people think might be acceptable here,
>because there's not much point in me spending a bunch of time writing
>code that everyone (or a critical mass of people) are going to hate.
>
>Thanks,
>
>--
>Robert Haas
>EDB: http://www.enterprisedb.com
>
>[1] https://github.com/ossc-db/pg_hint_plan
I really admire this idea. here is my confusion: Isn't the core of this idea whether to turn the planner into a framework? Personally, I think that under PostgreSQL's heap table storage, the optimizer might be better off focusing on optimizing the generation of execution plans. It’s possible that in some specific scenarios, developers might want to intervene in the generation of execution plans by extensions. I'm not sure if these scenarios usually occur when the storage structure is also extended by developers. If so, could existing solutions like "planner_hook" potentially solve the problem?