Re: allowing extensions to control planner behavior - Mailing list pgsql-hackers

From Robert Haas
Subject Re: allowing extensions to control planner behavior
Date
Msg-id CA+TgmoZQyVxnRU--4g2bJonJ8RyJqNi2CHpy-=nwwBTNpAj71A@mail.gmail.com
Whole thread Raw
In response to Re: allowing extensions to control planner behavior  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Aug 26, 2024 at 1:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> For example, I don't see
> how this gets us any closer to letting an extension fix a poor choice
> of join order.

Thinking more about this particular sub-problem, let's say we're
joining four tables A, B, C, and D. An extension wants to compel join
order A-B-C-D. Let's suppose, however, that it wants to do this in a
way where planning won't fail if that's impossible, so it wants to use
disabled_nodes rather than skipping path generation entirely.

When we're planning the baserels, we don't need to do anything
special. When we plan 2-way joins, we need to mark all paths disabled
except those originating from the A-B join. When we plan 3-way joins,
we need to mark all paths disabled except those originating from an
(A-B)-C join. When we plan the final 4-way join, we don't really need
to do anything extra: the only way to end up with a non-disabled path
at the top level is to pick a path from the (A-B)-C join and a path
from D.

There's a bit of nuance here, though. Suppose that when planning the
A-B join, the planner generates HashJoin(SeqScan(B),Hash(A)). Does
that path need to be disabled? If you think that join order A-B-C-D
means that table A should be the driving table, then the answer is
yes, because that path will lead to a join order beginning with B-A,
not one beginning with A-B. But you might also have a mental model
where it doesn't matter which side of the table is on which side of
the join, and as long as you start by joining A and B in some way,
that's good enough to qualify as an A-B join order. I believe actual
implementations vary in which approach they take.

I think that the beginning of add_paths_to_joinrel() looks like a
useful spot to get control. You could, for example, have a hook there
which returns a bitmask indicating which of merge-join, nested-loop,
and hash join will be allowable for this call; that hook would then
allow for control over the join method and the join order, and the
join order control is strong enough that you can implement either of
the two interpretations above. This idea theorizes that 0001 was wrong
to make the path mask a per-RelOptInfo value, because there could be
many calls to add_paths_to_joinrel() for a single RelOptInfo and, in
this idea, every one of those can enforce a different mask.

Potentially, such a hook could return additional information, either
by using more bits of the bitmask or by returning other information
via some other data type. For instance, I still believe that
distinguishing between parameterized-nestloops and
unparameterized-nestloops would be real darn useful, so we could have
separate bits for each; or you could have a bit to control whether
foreign-join paths get disabled (or are considered at all), or you
could have separate bits for merge joins that involve 0, 1, or 2
sorts. Whether we need or want any or all of that is certainly
debatable, but the point is that if you did want some of that, or
something else, it doesn't look difficult to feed that information
through to the places where you would need it to be available.

Thoughts?

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Bertrand Drouvot
Date:
Subject: Re: Add contrib/pg_logicalsnapinspect
Next
From: Michael Banck
Date:
Subject: Re: allowing extensions to control planner behavior