Thread: Re: allowing extensions to control planner behavior
On Mon, 2024-08-26 at 12:32 -0400, Robert Haas wrote: > I think there are two basic approaches that are possible here. If > someone sees a third option, let me know. First, we could allow users > to hook add_path() and add_partial_path(). ... > The other possible approach is to allow extensions to feed some > information into the planner before path generation and let that > influence which paths are generated. Preserving a path for the right amount of time seems like the primary challenge for most of the use cases you raised (removing paths is easier than resurrecting one that was pruned too early). If we try to keep a path around, that implies that we need to keep parent paths around too, which leads to an explosion if we aren't careful. But we already solved all of that for pathkeys. We keep the paths around if there's a reason to (a useful pathkey) and there's not some other cheaper path that also satisfies the same reason. Idea: generalize the idea of "pathkeys" to work for other reasons to preserve a path. Mechanically, a hint to use an index could work very similarly: come up with a custom reason to keep a path around, such as "a hint suggests we use index foo_idx for table foo", and assign it a unique number. If there's another hint that says we should also use index bar_idx for table bar, then that reason would get a different unique reason number. (In other words, the number of reasons would not be fixed; there could be one reason for each hint specified in the query, kind of like there could be many interesting pathkeys for a query.) Each Path would have a "preserve_for_these_reasons" bitmapset holding all of the non-cost reasons we are preserving that path. If two paths have exactly the same set of reasons, then add_path() would only keep the cheaper one. We could get fancy and have a compare_reasons_hook that would allow you to take two paths with the same reason and see if there are other factors to consider that would cause both to still be preserved (similar to pathkey length). I suspect that we might see interesting applications of this mechanism in core as well: for instance, track partition keys or other properties relevant to parallelism. That could allow us to keep parallel-friendly paths around and then decide later in the planning process whether to actually parallelize them or not. Once we've generalized the "reasons" mechnism, it would be easy enough to have a hook to add reasons to a path as it's being generated to be sure it's not lost. These hooks should probably be called in the individual create_*_path() functions where there's enough context to know what's happening. There could be many such hooks, but I suspect only a handful of important ones. This idea allows the extension author to preserve the right paths long enough to use set_rel_pathlist_hook/set_join_pathlist_hook, which can editorialize on costs or do its own pruning. Regards, Jeff Davis
On Wed, Aug 28, 2024 at 4:29 PM Jeff Davis <pgsql@j-davis.com> wrote: > Preserving a path for the right amount of time seems like the primary > challenge for most of the use cases you raised (removing paths is > easier than resurrecting one that was pruned too early). If we try to > keep a path around, that implies that we need to keep parent paths > around too, which leads to an explosion if we aren't careful. > > But we already solved all of that for pathkeys. We keep the paths > around if there's a reason to (a useful pathkey) and there's not some > other cheaper path that also satisfies the same reason. But we've already solved it for this case, too. This is exactly what incrementing disabled_nodes does. This very recently replaced what we did previously, which was adding disable_cost to the cost of every path. Either way, you just need a hook that lets you disable the paths that you don't prefer. Once you do that, add_path() takes care of the rest: disabled paths lose to non-disabled paths, and disabled paths lose to more expensive disabled paths. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, 2024-08-28 at 16:35 -0400, Robert Haas wrote: > On Wed, Aug 28, 2024 at 4:29 PM Jeff Davis <pgsql@j-davis.com> wrote: > > Preserving a path for the right amount of time seems like the > > primary > > challenge for most of the use cases you raised (removing paths is > > easier than resurrecting one that was pruned too early). If we try > > to > > keep a path around, that implies that we need to keep parent paths > > around too, which leads to an explosion if we aren't careful. > > > > But we already solved all of that for pathkeys. We keep the paths > > around if there's a reason to (a useful pathkey) and there's not > > some > > other cheaper path that also satisfies the same reason. > > But we've already solved it for this case, too. This is exactly what > incrementing disabled_nodes does. Hints are often described as something positive: use this index, use a hash join here, etc. Trying to force a positive thing by adding negative attributes to everything else is awkward. We've all had the experience where we disable one plan type hoping for a good plan, and we end up getting a different crazy plan that we didn't expect, and need to disable a few more plan types. Beyond awkwardness, one case where it matters is the interaction between an extension that provides hints and an extension that offers a CustomScan. How is the hints extension supposed to disable a path it doesn't know about? Regards, Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > Beyond awkwardness, one case where it matters is the interaction > between an extension that provides hints and an extension that offers a > CustomScan. How is the hints extension supposed to disable a path it > doesn't know about? This does not seem remarkably problematic to me, given Robert's proposal of a bitmask of allowed plan types per RelOptInfo. You just do something like rel->allowed_plan_types = DESIRED_PLAN_TYPE; The names of the bits you aren't setting are irrelevant to you. regards, tom lane
On Wed, 2024-08-28 at 19:25 -0400, Tom Lane wrote: > This does not seem remarkably problematic to me, given Robert's > proposal of a bitmask of allowed plan types per RelOptInfo. > You just do something like > > rel->allowed_plan_types = DESIRED_PLAN_TYPE; > > The names of the bits you aren't setting are irrelevant to you. I don't see that in the code yet, so I assume you are referring to the comment at [1]? I still like my idea to generalize the pathkey infrastructure, and Robert asked for other approaches to consider. It would allow us to hold onto multiple paths for longer, similar to pathkeys, which might offer some benefits or simplifications. Regards, Jeff Davis [1] https://www.postgresql.org/message-id/CA+TgmoZQyVxnRU--4g2bJonJ8RyJqNi2CHpy-=nwwBTNpAj71A@mail.gmail.com
On Thu, Aug 29, 2024 at 6:49 PM Jeff Davis <pgsql@j-davis.com> wrote: > I don't see that in the code yet, so I assume you are referring to the > comment at [1]? FYI, I'm hacking on a revised approach but it's not ready to show to other people yet. > I still like my idea to generalize the pathkey infrastructure, and > Robert asked for other approaches to consider. It would allow us to > hold onto multiple paths for longer, similar to pathkeys, which might > offer some benefits or simplifications. This is a fair point. I dislike the fact that add_path() is a thicket of if-statements that's actually quite hard to understand and easy to screw up when you're making modifications. But I feel like it would be difficult to generalize the infrastructure without making it substantially slower, which would probably cause too much of an increase in planning time to be acceptable. So my guess is that this is a dead end, unless there's a clever idea that I'm not seeing. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, 2024-08-30 at 07:33 -0400, Robert Haas wrote: > This is a fair point. I dislike the fact that add_path() is a thicket > of if-statements that's actually quite hard to understand and easy to > screw up when you're making modifications. But I feel like it would > be > difficult to generalize the infrastructure without making it > substantially slower, which would probably cause too much of an > increase in planning time to be acceptable. So my guess is that this > is a dead end, unless there's a clever idea that I'm not seeing. As far as performance goes, I'm only looking at branch in add_path() that calls compare_pathkeys(). Do you have some example queries which would be a worst case for that path? In general if you can post some details about how you are measuring, that would be helpful. Regards, Jeff Davis
On Fri, Aug 30, 2024 at 1:42 PM Jeff Davis <pgsql@j-davis.com> wrote: > As far as performance goes, I'm only looking at branch in add_path() > that calls compare_pathkeys(). Do you have some example queries which > would be a worst case for that path? I think we must be talking past each other somehow. It seems to me that your scheme would require replacing that branch with something more complicated or generalized. If it doesn't, then I don't understand the proposal. If it does, then that seems like it could be a problem. > In general if you can post some details about how you are measuring, > that would be helpful. I'm not really measuring anything at this point, just recalling the many previous times when add_path() has been discussed as a pain point. -- Robert Haas EDB: http://www.enterprisedb.com