Re: Improving planner variable handling - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Improving planner variable handling
Date
Msg-id 4805CB12.6040402@enterprisedb.com
Whole thread Raw
In response to Improving planner variable handling  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Improving planner variable handling
List pgsql-hackers
I wonder if this would help to clean up the equivalence class hacks in 
Greg's ordered append patch?

Tom Lane wrote:
> I've been thinking about how to improve the planner's poor handling of
> variables in outer-join situations.  Here are some past examples for
> motivation:
> 
> http://archives.postgresql.org/pgsql-hackers/2006-02/msg00154.php
> http://archives.postgresql.org/pgsql-general/2008-03/msg01440.php
> 
> The reason why the planner acts so stupidly in these examples is that
> we're still using a kluge solution for this old bug:
> http://archives.postgresql.org/pgsql-bugs/2001-04/msg00223.php
> 
> The root of the problem is that the planner does not worry about computing
> output expressions until the top of a plan tree.  All lower-level join
> nodes are made to output simple lists of Vars referencing columns of the
> base relations of the join.  We handle outer-join cases by forcing the
> values of the Vars of the nullable side to null at the level of the join,
> whenever there's no matching row in the nullable side.  If one of the base
> relations of the join is a sub-SELECT whose output list includes
> expressions that don't certainly go to null when the input variables are
> forced to null, then we can't flatten that sub-SELECT, because flattening
> the sub-SELECT means that the expression evaluations bubble to the top of
> the plan tree and can produce non-null results when they shouldn't
> (as happened in the above bug, before we realized that we had to prevent
> flattening in this case).
> 
> Another problem with this approach is that depending on what level of the
> plan tree you are thinking about, a Var nominally referencing tab.col
> might really mean the value of tab.col, or it might mean "either tab.col
> or NULL depending on what happened at some lower level of outer join".
> Since we can't readily tell the difference, we have estimation errors
> arising from failure to expect some NULLs (there have been recent
> complaints about this), and we need some pretty ugly kluges in places like
> EquivalenceClass processing to handle the risk that apparently identical
> expressions might not really be equal.
> 
> I think the basic solution for this is that upper levels of the plan tree
> should refer to the nullable output columns of an outer join using
> "alias Vars" that name the join rel, not the underlying base relation,
> even if there is a simple base-relation Var that the alias is tracking.
> In the case involving a sub-SELECT, the alias Var would stand for whatever
> output expression appears in the sub-SELECT.  We already have the concept
> of these alias Vars, in fact --- that's exactly the representation emitted
> by the parser.  But historically the planner has smashed aliases down to
> their base Vars as early as possible (see flatten_join_alias_vars).
> That has some advantages but I'm thinking it's outweighed by the
> disadvantages.  I'd like to try leaving alias Vars as aliases all the
> way through the planner, in any case where they might be semantically
> different from their referent (ie, whenever there's a possible
> force-to-null involved).
> 
> To make this work, we'd need to have the constructed plan tree compute the
> alias Var from its referent expression at the lowest outer-join that can
> null the alias Var.  The trick for the executor is to know when to force
> the value to null instead of computing the expression.  I first thought
> about marking entries of the join node's targetlist as to be forced to
> null if left or right input row is null.  However, that fails if we want
> the join node to compute some projection expressions on top of the raw
> join output (as would certainly happen if it were the top node of the
> tree, for example).  That could be handled by inserting another level of
> plan node (ie, a Result) to do the projection, but that seems a pretty
> ugly and inefficient solution.  What I have in mind instead is to insert a
> new kind of expression node "ForceToNull" atop the referent expression,
> with this node able to look at the EState (in the same way a regular Var
> node would) to see if it should return a null instead of computing its
> child expression.  Then expansion of an alias Var into a ForceToNull and
> the underlying expression would work.
> 
> I'm envisioning keeping track of active alias Vars and their expansions in
> a new list attached to the PlannerInfo "root" node.  This would provide a
> place to record important information like which level of the join tree
> such a Var needs to be evaluated at.
> 
> This is all pretty handwavy yet, but I don't think I'll be able to fill in
> many more details until I try to code it.  I thought I'd put up this
> summary to see if anyone can shoot holes in it at this level of detail ...
> Comments?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Problem with site doc search
Next
From: Simon Riggs
Date:
Subject: Re: SET TRANSACTION not compliant with SQL:2003