Thread: Re: allowing extensions to control planner behavior

Re: allowing extensions to control planner behavior

From
Jeff Davis
Date:
On Mon, 2024-08-26 at 12:32 -0400, Robert Haas wrote:
> I think there are two basic approaches that are possible here. If
> someone sees a third option, let me know. First, we could allow users
> to hook add_path() and add_partial_path().

...

> The other possible approach is to allow extensions to feed some
> information into the planner before path generation and let that
> influence which paths are generated.

Preserving a path for the right amount of time seems like the primary
challenge for most of the use cases you raised (removing paths is
easier than resurrecting one that was pruned too early). If we try to
keep a path around, that implies that we need to keep parent paths
around too, which leads to an explosion if we aren't careful.

But we already solved all of that for pathkeys. We keep the paths
around if there's a reason to (a useful pathkey) and there's not some
other cheaper path that also satisfies the same reason.

Idea: generalize the idea of "pathkeys" to work for other reasons to
preserve a path.

Mechanically, a hint to use an index could work very similarly: come up
with a custom reason to keep a path around, such as "a hint suggests we
use index foo_idx for table foo", and assign it a unique number. If
there's another hint that says we should also use index bar_idx for
table bar, then that reason would get a different unique reason number.
(In other words, the number of reasons would not be fixed; there could
be one reason for each hint specified in the query, kind of like there
could be many interesting pathkeys for a query.)

Each Path would have a "preserve_for_these_reasons" bitmapset holding
all of the non-cost reasons we are preserving that path. If two paths
have exactly the same set of reasons, then add_path() would only keep
the cheaper one.

We could get fancy and have a compare_reasons_hook that would allow you
to take two paths with the same reason and see if there are other
factors to consider that would cause both to still be preserved
(similar to pathkey length).

I suspect that we might see interesting applications of this mechanism
in core as well: for instance, track partition keys or other properties
relevant to parallelism. That could allow us to keep parallel-friendly
paths around and then decide later in the planning process whether to
actually parallelize them or not.


Once we've generalized the "reasons" mechnism, it would be easy enough
to have a hook to add reasons to a path as it's being generated to be
sure it's not lost. These hooks should probably be called in the
individual create_*_path() functions where there's enough context to
know what's happening. There could be many such hooks, but I suspect
only a handful of important ones.

This idea allows the extension author to preserve the right paths long
enough to use set_rel_pathlist_hook/set_join_pathlist_hook, which can
editorialize on costs or do its own pruning.

Regards,
    Jeff Davis




Re: allowing extensions to control planner behavior

From
Robert Haas
Date:
On Wed, Aug 28, 2024 at 4:29 PM Jeff Davis <pgsql@j-davis.com> wrote:
> Preserving a path for the right amount of time seems like the primary
> challenge for most of the use cases you raised (removing paths is
> easier than resurrecting one that was pruned too early). If we try to
> keep a path around, that implies that we need to keep parent paths
> around too, which leads to an explosion if we aren't careful.
>
> But we already solved all of that for pathkeys. We keep the paths
> around if there's a reason to (a useful pathkey) and there's not some
> other cheaper path that also satisfies the same reason.

But we've already solved it for this case, too. This is exactly what
incrementing disabled_nodes does. This very recently replaced what we
did previously, which was adding disable_cost to the cost of every
path. Either way, you just need a hook that lets you disable the paths
that you don't prefer. Once you do that, add_path() takes care of the
rest: disabled paths lose to non-disabled paths, and disabled paths
lose to more expensive disabled paths.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: allowing extensions to control planner behavior

From
Jeff Davis
Date:
On Wed, 2024-08-28 at 16:35 -0400, Robert Haas wrote:
> On Wed, Aug 28, 2024 at 4:29 PM Jeff Davis <pgsql@j-davis.com> wrote:
> > Preserving a path for the right amount of time seems like the
> > primary
> > challenge for most of the use cases you raised (removing paths is
> > easier than resurrecting one that was pruned too early). If we try
> > to
> > keep a path around, that implies that we need to keep parent paths
> > around too, which leads to an explosion if we aren't careful.
> >
> > But we already solved all of that for pathkeys. We keep the paths
> > around if there's a reason to (a useful pathkey) and there's not
> > some
> > other cheaper path that also satisfies the same reason.
>
> But we've already solved it for this case, too. This is exactly what
> incrementing disabled_nodes does.

Hints are often described as something positive: use this index, use a
hash join here, etc. Trying to force a positive thing by adding
negative attributes to everything else is awkward. We've all had the
experience where we disable one plan type hoping for a good plan, and
we end up getting a different crazy plan that we didn't expect, and
need to disable a few more plan types.

Beyond awkwardness, one case where it matters is the interaction
between an extension that provides hints and an extension that offers a
CustomScan. How is the hints extension supposed to disable a path it
doesn't know about?

Regards,
    Jeff Davis




Re: allowing extensions to control planner behavior

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> Beyond awkwardness, one case where it matters is the interaction
> between an extension that provides hints and an extension that offers a
> CustomScan. How is the hints extension supposed to disable a path it
> doesn't know about?

This does not seem remarkably problematic to me, given Robert's
proposal of a bitmask of allowed plan types per RelOptInfo.
You just do something like

    rel->allowed_plan_types = DESIRED_PLAN_TYPE;

The names of the bits you aren't setting are irrelevant to you.

            regards, tom lane



Re: allowing extensions to control planner behavior

From
Jeff Davis
Date:
On Wed, 2024-08-28 at 19:25 -0400, Tom Lane wrote:
> This does not seem remarkably problematic to me, given Robert's
> proposal of a bitmask of allowed plan types per RelOptInfo.
> You just do something like
>
>         rel->allowed_plan_types = DESIRED_PLAN_TYPE;
>
> The names of the bits you aren't setting are irrelevant to you.

I don't see that in the code yet, so I assume you are referring to the
comment at [1]?

I still like my idea to generalize the pathkey infrastructure, and
Robert asked for other approaches to consider. It would allow us to
hold onto multiple paths for longer, similar to pathkeys, which might
offer some benefits or simplifications.

Regards,
    Jeff Davis

[1]
https://www.postgresql.org/message-id/CA+TgmoZQyVxnRU--4g2bJonJ8RyJqNi2CHpy-=nwwBTNpAj71A@mail.gmail.com




Re: allowing extensions to control planner behavior

From
Robert Haas
Date:
On Thu, Aug 29, 2024 at 6:49 PM Jeff Davis <pgsql@j-davis.com> wrote:
> I don't see that in the code yet, so I assume you are referring to the
> comment at [1]?

FYI, I'm hacking on a revised approach but it's not ready to show to
other people yet.

> I still like my idea to generalize the pathkey infrastructure, and
> Robert asked for other approaches to consider. It would allow us to
> hold onto multiple paths for longer, similar to pathkeys, which might
> offer some benefits or simplifications.

This is a fair point. I dislike the fact that add_path() is a thicket
of if-statements that's actually quite hard to understand and easy to
screw up when you're making modifications. But I feel like it would be
difficult to generalize the infrastructure without making it
substantially slower, which would probably cause too much of an
increase in planning time to be acceptable. So my guess is that this
is a dead end, unless there's a clever idea that I'm not seeing.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: allowing extensions to control planner behavior

From
Jeff Davis
Date:
On Fri, 2024-08-30 at 07:33 -0400, Robert Haas wrote:
> This is a fair point. I dislike the fact that add_path() is a thicket
> of if-statements that's actually quite hard to understand and easy to
> screw up when you're making modifications. But I feel like it would
> be
> difficult to generalize the infrastructure without making it
> substantially slower, which would probably cause too much of an
> increase in planning time to be acceptable. So my guess is that this
> is a dead end, unless there's a clever idea that I'm not seeing.

As far as performance goes, I'm only looking at branch in add_path()
that calls compare_pathkeys(). Do you have some example queries which
would be a worst case for that path?

In general if you can post some details about how you are measuring,
that would be helpful.

Regards,
    Jeff Davis




Re: allowing extensions to control planner behavior

From
Robert Haas
Date:
On Fri, Aug 30, 2024 at 1:42 PM Jeff Davis <pgsql@j-davis.com> wrote:
> As far as performance goes, I'm only looking at branch in add_path()
> that calls compare_pathkeys(). Do you have some example queries which
> would be a worst case for that path?

I think we must be talking past each other somehow. It seems to me
that your scheme would require replacing that branch with something
more complicated or generalized. If it doesn't, then I don't
understand the proposal. If it does, then that seems like it could be
a problem.

> In general if you can post some details about how you are measuring,
> that would be helpful.

I'm not really measuring anything at this point, just recalling the
many previous times when add_path() has been discussed as a pain
point.

--
Robert Haas
EDB: http://www.enterprisedb.com