Re: Clarifying/rationalizing Vars' varno/varattno/varnoold/varoattno - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Clarifying/rationalizing Vars' varno/varattno/varnoold/varoattno
Date
Msg-id 25142.1580847861@sss.pgh.pa.us
Whole thread Raw
In response to Re: Clarifying/rationalizing Vars' varno/varattno/varnoold/varoattno  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Dec 20, 2019 at 11:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The alternatives that seem plausible at this point are
>> (1) Create some sort of wrapper node indicating "the contents of this
>> expression might be replaced by NULL".  This is basically what the
>> planner's PlaceHolderVars do, so maybe we'd just be talking about
>> introducing those at some earlier stage.
>> ...

> I'm not sure which is better, either, although I would like to note in
> passing that the name PlaceHolderVar seems to me to be confusing and
> terrible. It took me years to understand it, and I've never been
> totally sure that I actually do. Why is it not called
> MightBeNullWrapper or something?

Here's a data dump about my further thoughts in this area.  I've concluded
that the "wrapper" approach is the right way to proceed, and that rather
than having the planner introduce the wrappers as happens now, we should
indeed have the parser introduce the wrappers from the get-go.  There are
a few arguments for that:

* Arguably, this is a question of decorating the parse tree with
information about query semantics.  I've always held that parse analysis
is what should introduce knowledge of semantics; the planner ought not be
reverse-engineering that.

* AFAICS, we would need an additional pass over the query tree in order to
do this in the planner.  There is no existing recursive tree-modification
pass that happens at an appropriate time.

* We can use the same type of wrapper node to solve the problems with
grouping-set expressions that were discussed in
https://www.postgresql.org/message-id/flat/7dbdcf5c-b5a6-ef89-4958-da212fe10176%40iki.fi
although I'd leave that for a follow-on patch rather than try to fix
it immediately.  Here again, it'd be better to introduce the wrappers
at parse time --- check_ungrouped_columns() is already detecting the
presence of grouping-expression references, so we could make it inject
wrappers around them at relatively little extra cost.

Per Robert's complaint above, these wrappers need better documentation,
and they should be called something other than PlaceHolderVar, even though
they're basically that (and hopefully will replace those completely).
I'm tentatively thinking of calling them "NullableVar", but am open to
better ideas.  And here is a proposed addition to optimizer/README to
explain why they exist.  I'm not quite satisfied with the explanation yet
--- in particular, if we don't need them at runtime, why do we need them
at parse time?  Any thoughts about how to explain this more solidly are
welcome.

----------

To simplify handling of some issues with outer joins, we use NullableVars,
which are produced by the parser and used by the planner, but do not
appear in finished plans.  A NullableVar is a wrapper around another
expression, decorated with a set of outer-join relids, and notionally
having the semantics

    CASE WHEN any-of-these-outer-joins-produced-a-null-extended-row
    THEN NULL
    ELSE contained-expression
    END

It's only notional, because no such calculation is ever done explicitly.
In a finished plan, the NullableVar construct is replaced by a plain Var
referencing an output column of the topmost mentioned outer join, while
the "contained expression" is the corresponding input to the bottommost
join.  Any forcing to null happens in the course of calculating the
outer join results.  (Because we don't ever have to do the calculation
explicitly, it's not necessary to distinguish which side of an outer join
got null-extended, which'd otherwise be essential information for FULL
JOIN cases.)

A NullableVar wrapper is placed around a Var referencing a column of the
nullable side of an outer join when that reference appears syntactically
above (outside) the outer join, but not when the reference is below the
outer join, such as within its ON clause.  References to the non-nullable
side of an outer join are never wrapped.  NullableVars mentioning multiple
join nodes arise from cases with nested outer joins.

It might seem that the NullableVar construct is unnecessary (and indeed,
we got by without it for many years).  In a join row that's null-extended
for lack of a matching nullable-side row, the only reasonable value to
impute to a Var of that side is NULL, no matter where you look in the
parse tree.  However there are pressing reasons to use NullableVars
anyway:

* NullableVars simplify reasoning about where to evaluate qual clauses.
Consider
    SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE foo(t2.z)
(Assume foo() is not strict, so that we can't reduce the left join to
a plain join.)  A naive implementation might try to push the foo(t2.z)
call down to the scan of t2, but that is not correct because (a) what
foo() should actually see for a null-extended join row is NULL, and
(b) if foo() returns false, we should suppress the t1 row from the join
altogether, not emit it with a null-extended t2 row.  On the other hand,
it *would* be correct (and desirable) to push the call down if the query
were
    SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y AND foo(t2.z))
If the upper WHERE clause is represented as foo(NullableVar(t2.z)), then
we can recognize that the NullableVar construct must be evaluated above
the join, since it references the join's relid.  Meanwhile, a t2.z
reference within the ON clause receives no such decoration, so in the
second case foo(t2.z) can be seen to be safe to push down to the scan
level.  Thus we can solve the qual-placement problem in a simple and
general fashion.

* NullableVars simplify reasoning around EquivalenceClasses.  Given say
    SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE t1.x = 42
we would like to put t1.x and t2.y and 42 into the same EquivalenceClass
and then derive "t2.y = 42" to use as a restriction clause for the scan
of t2.  However, it'd be wrong to conclude that t2.y will always have
the value 42, or that it's equal to t1.x in every joined row.  The use
of NullableVar wrappers sidesteps this problem: we can put t2.y in the
EquivalenceClass, and we can derive all the equalities we want about it,
but they will not lead to conclusions that NullableVar(t2.y) is equal to
anything.

* NullableVars are necessary to avoid wrong results when flattening
sub-selects.  If t2 in the above example is a sub-select or view in which
the y output column is a constant, and we want to pull up that sub-select,
we cannot simply substitute that constant for every use of t2.y in the
outer query: a Const node will not produce "NULL" when that's needed.
But it does work if the t2.y Vars are wrapped in NullableVars.  The
NullableVar shows that the contained value might be replaced by a NULL,
and it carries enough information so that we can generate a plan tree in
which that replacement does happen when necessary (by evaluating the
Const below the outer join and making upper references to it be Vars).
Moreover, when pulling up the constant into portions of the parse tree
that are below the outer join, the right things also happen: those
references can validly become plain Consts.

In essence, these examples show that it's useful to treat references to
a column of the nullable side of an outer join as being semantically
distinct depending on whether they are "above" or "below" the outer join,
even though no distinction exists once the calculation of a particular
join output row is complete.

----------

As you might gather from that, I'm thinking of changing the planner
so that (at least for outer joins) the relid set for a join includes
the RTE number of the join node itself.  I haven't decided yet if
that should happen across-the-board or just in the areas where we
use relid sets to decide which qual expressions get evaluated where.

Some other exciting things that will happen:

* RestrictInfo.is_pushed_down will go away; as sketched above, the
presence of the outer join's relid in the qual's required_relids
(due to NullableVars' outer join relid sets getting added into that
by pull_varnos) will tell us whether the qual must be treated as
a join or filter qual for the current join level.

* I think a lot of hackery in distribute_qual_to_rels can go away,
such as the below_outer_join flag, and maybe check_outerjoin_delay.
All of that is basically trying to reverse-engineer the qual
placement semantics that the wrappers will make explicit.

* As sketched above, equivalence classes will no longer need to
treat outer-join equalities with suspicion, and I think the
reconsider_outer_join_clauses stuff goes away too.

* There's probably other hackery that can be simplified; I've not
gone looking in detail yet.

I've not written any actual code, but am close to being ready to.
One thing I'm still struggling with is how to continue to support
outer join "identity 3":

        3.      (A leftjoin B on (Pab)) leftjoin C on (Pbc)
                = A leftjoin (B leftjoin C on (Pbc)) on (Pab)

        Identity 3 only holds if predicate Pbc must fail for all-null B
        rows (that is, Pbc is strict for at least one column of B).

Per this sketch, if the query is initially written the first way, Pbc's
references to B Vars would have NullableVars indicating a dependence on
the A/B join, seemingly preventing Pbc from being pushed into the RHS of
that join per the identity.  But if the query is initially written the
second way, there will be no NullableVar wrappers in either predicate.
Maybe it's sufficient to strip the NullableVar wrappers once we've
detected the applicability of the identity.  (We'll need code for that
anyway, since outer-join strength reduction will create cases where
NullableVar wrappers need to go away.)

            regards, tom lane



pgsql-hackers by date:

Previous
From: Dmitry Dolgov
Date:
Subject: Re: Index Skip Scan
Next
From: Floris Van Nee
Date:
Subject: RE: Index Skip Scan