Re: Improving planner variable handling - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Improving planner variable handling |
Date | |
Msg-id | 4805CB12.6040402@enterprisedb.com Whole thread Raw |
In response to | Improving planner variable handling (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Improving planner variable handling
|
List | pgsql-hackers |
I wonder if this would help to clean up the equivalence class hacks in Greg's ordered append patch? Tom Lane wrote: > I've been thinking about how to improve the planner's poor handling of > variables in outer-join situations. Here are some past examples for > motivation: > > http://archives.postgresql.org/pgsql-hackers/2006-02/msg00154.php > http://archives.postgresql.org/pgsql-general/2008-03/msg01440.php > > The reason why the planner acts so stupidly in these examples is that > we're still using a kluge solution for this old bug: > http://archives.postgresql.org/pgsql-bugs/2001-04/msg00223.php > > The root of the problem is that the planner does not worry about computing > output expressions until the top of a plan tree. All lower-level join > nodes are made to output simple lists of Vars referencing columns of the > base relations of the join. We handle outer-join cases by forcing the > values of the Vars of the nullable side to null at the level of the join, > whenever there's no matching row in the nullable side. If one of the base > relations of the join is a sub-SELECT whose output list includes > expressions that don't certainly go to null when the input variables are > forced to null, then we can't flatten that sub-SELECT, because flattening > the sub-SELECT means that the expression evaluations bubble to the top of > the plan tree and can produce non-null results when they shouldn't > (as happened in the above bug, before we realized that we had to prevent > flattening in this case). > > Another problem with this approach is that depending on what level of the > plan tree you are thinking about, a Var nominally referencing tab.col > might really mean the value of tab.col, or it might mean "either tab.col > or NULL depending on what happened at some lower level of outer join". > Since we can't readily tell the difference, we have estimation errors > arising from failure to expect some NULLs (there have been recent > complaints about this), and we need some pretty ugly kluges in places like > EquivalenceClass processing to handle the risk that apparently identical > expressions might not really be equal. > > I think the basic solution for this is that upper levels of the plan tree > should refer to the nullable output columns of an outer join using > "alias Vars" that name the join rel, not the underlying base relation, > even if there is a simple base-relation Var that the alias is tracking. > In the case involving a sub-SELECT, the alias Var would stand for whatever > output expression appears in the sub-SELECT. We already have the concept > of these alias Vars, in fact --- that's exactly the representation emitted > by the parser. But historically the planner has smashed aliases down to > their base Vars as early as possible (see flatten_join_alias_vars). > That has some advantages but I'm thinking it's outweighed by the > disadvantages. I'd like to try leaving alias Vars as aliases all the > way through the planner, in any case where they might be semantically > different from their referent (ie, whenever there's a possible > force-to-null involved). > > To make this work, we'd need to have the constructed plan tree compute the > alias Var from its referent expression at the lowest outer-join that can > null the alias Var. The trick for the executor is to know when to force > the value to null instead of computing the expression. I first thought > about marking entries of the join node's targetlist as to be forced to > null if left or right input row is null. However, that fails if we want > the join node to compute some projection expressions on top of the raw > join output (as would certainly happen if it were the top node of the > tree, for example). That could be handled by inserting another level of > plan node (ie, a Result) to do the projection, but that seems a pretty > ugly and inefficient solution. What I have in mind instead is to insert a > new kind of expression node "ForceToNull" atop the referent expression, > with this node able to look at the EState (in the same way a regular Var > node would) to see if it should return a null instead of computing its > child expression. Then expansion of an alias Var into a ForceToNull and > the underlying expression would work. > > I'm envisioning keeping track of active alias Vars and their expansions in > a new list attached to the PlannerInfo "root" node. This would provide a > place to record important information like which level of the join tree > such a Var needs to be evaluated at. > > This is all pretty handwavy yet, but I don't think I'll be able to fill in > many more details until I try to code it. I thought I'd put up this > summary to see if anyone can shoot holes in it at this level of detail ... > Comments? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
pgsql-hackers by date: