Thread: Re: allowing extensions to control planner behavior

Re: allowing extensions to control planner behavior

From
Andrei Lepikhov
Date:
On 26/8/2024 18:32, Robert Haas wrote:
> I'm somewhat expecting to be flamed to a well-done crisp for saying
> this, but I think we need better ways for extensions to control the
> behavior of PostgreSQL's query planner. I know of two major reasons
It is the change I have been waiting for a long time. Remember how many 
kludge codes in pg_hint_plan, aqo, citus, timescale, etc., are written 
for only the reason of a small number of hooks - I guess many other 
people could cheer such work.

> why somebody might want to do this. First, you might want to do
> something like what pg_hint_plan does, where it essentially implements
> Oracle-style hints that can be either inline or stored in a side table
> and automatically applied to queries.[1] In addition to supporting
> Oracle-style hints, it also supports some other kinds of hints so that
> you can, for example, try to fix broken cardinality estimates. Second,
My personal most wanted list:
- Selectivity list estimation hook
- Groups number estimation hook
- hooks on memory estimations, involving work_mem
- add_path() hook
- Hook on final RelOptInfo pathlist
- a custom list of nodes in RelOptinfo, PlannerStmt, Plan and Query 
structures
- Extensibility of extended and plain statistics
- Hook on portal error processing
- Canonicalise expressions hook

> you might want to convince the planner to keep producing the same kind
> of plan that it produced previously. I believe this is what Amazon's
> query plan management feature[2] does, although since it is closed
> source and I don't work at Amazon maybe it's actually implemented
> completely differently. Regardless of what Amazon did in this case,
> plan stability is a feature people want. Just trying to keep using the
> same plan data structure forever doesn't seem like a good strategy,
> because for example it would be fragile in the case of any DDL
> changes, like dropping and recreating an index, or dropping or adding
As a designer of plan freezing feature [1] I can say it utilises 
plancache and, being under its invalidation callbacks it doesn't afraid 
DDL or any other stuff altering database objects.

> Unfortunately, the part about the hook having the freedom to delete
> paths isn't really true. Perhaps technically you can delete a path
> that you don't want to be chosen, but any paths that were dominated by
> the path you deleted have already been thrown away and it's too late
> to get them back. You can modify paths if you don't want to change
> their costs, but if you change their costs then you have the same
> problem: the contents of the pathlist at the time that you see it are
> determined by the costs that each path had when it was initially
> added, and it's really too late to editorialize on that. So all you
> can really do here in practice is add new paths.
 From my standpoint, it is enough to export routines creating paths and 
calculating costs.

> set_join_pathlist_hook, which applies to joinrels, is similarly
> limited. appendrels don't even have an equivalent of this hook.
> 
> So, how could we do better?
> 
> I think there are two basic approaches that are possible here. If
> someone sees a third option, let me know. First, we could allow users
> to hook add_path() and add_partial_path(). That certainly provides the
> flexibility on paper to accept or reject whatever paths you do or do
+1

> The attached patch, briefly mentioned above, essentially converts the
> enable_* GUCs into RelOptInfo properties where the defaults are set by
> the corresponding GUCs. The idea is that a hook could then change this
> on a per-RelOptInfo basis before path generation happens. For
IMO, it is better not to switch on/off algorithms, but allow extensions 
to change their cost multipliers, modifying costs balance. 10E9 looks 
like a disable, but multiplier == 10 for a cost node just provide more 
freedom for hashing strategies.

[1] https://postgrespro.com/docs/enterprise/16/sr-plan

-- 
regards, Andrei Lepikhov




Re: allowing extensions to control planner behavior

From
Robert Haas
Date:
On Mon, Aug 26, 2024 at 2:00 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> It is the change I have been waiting for a long time. Remember how many
> kludge codes in pg_hint_plan, aqo, citus, timescale, etc., are written
> for only the reason of a small number of hooks - I guess many other
> people could cheer such work.

I think so, too. I know there are going to be people who hate this,
but I think the cat is already out of the bag. It's not a question any
more of whether it will happen, it's just a question of whether we
want to collaborate with extension developers or try to make their
life difficult.

> My personal most wanted list:
> - Selectivity list estimation hook
> - Groups number estimation hook
> - hooks on memory estimations, involving work_mem
> - add_path() hook
> - Hook on final RelOptInfo pathlist
> - a custom list of nodes in RelOptinfo, PlannerStmt, Plan and Query
> structures
> - Extensibility of extended and plain statistics
> - Hook on portal error processing
> - Canonicalise expressions hook

One of my chronic complaints about hooks is that people propose hooks
that are just in any random spot in the code where they happen to want
to change something. If we accept that, we end up with a lot of hooks
where nobody can say how the hook can be used usefully and maybe it
can't actually be used usefully even by the original author, or only
them and nobody else. So these kinds of proposals need detailed,
case-by-case scrutiny. It's unacceptable for the planner to get filled
up with a bunch of poorly-designed hooks just as it is for any other
part of the system, but well-designed hooks whose usefulness can
clearly be seen should be just as welcome here as anywhere else.

> IMO, it is better not to switch on/off algorithms, but allow extensions
> to change their cost multipliers, modifying costs balance. 10E9 looks
> like a disable, but multiplier == 10 for a cost node just provide more
> freedom for hashing strategies.

That may be a valid use case, but I do not think it is a typical use
case. In my experience, when people want to force the planner to do
something, they really mean it. They don't mean "please do it this way
unless you really, really don't feel like it." They mean "please do it
this way, period." And that is also what other systems provide. Oracle
could provide a hint MERGE_COST(foo,10) meaning make merge joins look
ten times as expensive but in fact they only provide MERGE and
NO_MERGE. And a "reproduce this previous plan" feature really demands
infrastructure that truly forces the planner to do what it's told,
rather than just nicely suggesting that it might want to do as it's
told. I wouldn't be sad at all if we happen to end up with a system
that's powerful enough for an extension to implement "make merge joins
ten times as expensive"; in fact, I think that would be pretty cool.
But I don't think it should be the design center for what we
implement, because it looks nothing like what existing PG or non-PG
systems do, at least in my experience.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: allowing extensions to control planner behavior

From
Andrei Lepikhov
Date:
On 26/8/2024 21:44, Robert Haas wrote:
> On Mon, Aug 26, 2024 at 2:00 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>> My personal most wanted list:
>> - Selectivity list estimation hook
>> - Groups number estimation hook
>> - hooks on memory estimations, involving work_mem
>> - add_path() hook
>> - Hook on final RelOptInfo pathlist
>> - a custom list of nodes in RelOptinfo, PlannerStmt, Plan and Query
>> structures
>> - Extensibility of extended and plain statistics
>> - Hook on portal error processing
>> - Canonicalise expressions hook
> 
> One of my chronic complaints about hooks is that people propose hooks
> that are just in any random spot in the code where they happen to want
> to change something. If we accept that, we end up with a lot of hooks
> where nobody can say how the hook can be used usefully and maybe it
> can't actually be used usefully even by the original author, or only
> them and nobody else. So these kinds of proposals need detailed,
> case-by-case scrutiny. It's unacceptable for the planner to get filled
> up with a bunch of poorly-designed hooks just as it is for any other
> part of the system, but well-designed hooks whose usefulness can
> clearly be seen should be just as welcome here as anywhere else.
Definitely so. Think about that as a sketch proposal on the roadmap. 
Right now, I know about only one hook - selectivity hook - which we 
already discussed and have Tomas Vondra's patch on the table. But even 
this is a big deal, because multi-clause estimations are a huge pain for 
users that can't be resolved with extensions for now without core patches.

>> IMO, it is better not to switch on/off algorithms, but allow extensions
>> to change their cost multipliers, modifying costs balance. 10E9 looks
>> like a disable, but multiplier == 10 for a cost node just provide more
>> freedom for hashing strategies.
> 
> That may be a valid use case, but I do not think it is a typical use
> case. In my experience, when people want to force the planner to do
> something, they really mean it. They don't mean "please do it this way
> unless you really, really don't feel like it." They mean "please do it
> this way, period." And that is also what other systems provide. Oracle
> could provide a hint MERGE_COST(foo,10) meaning make merge joins look
> ten times as expensive but in fact they only provide MERGE and
> NO_MERGE. And a "reproduce this previous plan" feature really demands
> infrastructure that truly forces the planner to do what it's told,
> rather than just nicely suggesting that it might want to do as it's
> told. I wouldn't be sad at all if we happen to end up with a system
> that's powerful enough for an extension to implement "make merge joins
> ten times as expensive"; in fact, I think that would be pretty cool.
> But I don't think it should be the design center for what we
> implement, because it looks nothing like what existing PG or non-PG
> systems do, at least in my experience.
Heh, I meant not manual usage, but automatical one, provided by extensions.

-- 
regards, Andrei Lepikhov