Re: LATERAL quals revisited - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: LATERAL quals revisited |
Date | |
Msg-id | CAFjFpRenMeu4V_xGxyWD2PsQj+Tk2d0qGAkqXJJkT7qZac1Mmw@mail.gmail.com Whole thread Raw |
In response to | LATERAL quals revisited (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: LATERAL quals revisited
|
List | pgsql-hackers |
I have couple of questions.
--
On Wed, Jun 26, 2013 at 1:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I've been studying the bug reported at
http://www.postgresql.org/message-id/20130617235236.GA1636@jeremyevans.local
that the planner can do the wrong thing with queries like
SELECT * FROM
i LEFT JOIN LATERAL (SELECT * FROM j WHERE i.n = j.n) j ON true;
I think the fundamental problem is that, because the "i.n = j.n" clause
appears syntactically in WHERE, the planner is treating it as if it were
an inner-join clause; but really it ought to be considered a clause of
the upper LEFT JOIN. That is, semantically this query ought to be
equivalent to
SELECT * FROM
i LEFT JOIN LATERAL (SELECT * FROM j) j ON i.n = j.n;
However, because distribute_qual_to_rels doesn't see the clause as being
attached to the outer join, it's not marked with the correct properties
and ends up getting evaluated in the wrong place (as a "filter" clause
not a "join filter" clause). The bug is masked in the test cases we've
used so far because those cases are designed to let the clause get
pushed down into the scan of the inner relation --- but if it doesn't
get pushed down, it's evaluated the wrong way.
After some contemplation, I think that the most practical way to fix
this is for deconstruct_recurse and distribute_qual_to_rels to
effectively move such a qual to the place where it logically belongs;
that is, rather than processing it when we look at the lower WHERE
clause, set it aside for a moment and then add it back when looking at
the ON clause of the appropriate outer join. This should be reasonably
easy to do by keeping a list of "postponed lateral clauses" while we're
scanning the join tree.
For there to *be* a unique "appropriate outer join", we need to require
that a LATERAL-using qual clause that's under an outer join contain
lateral references only to the outer side of the nearest enclosing outer
join. There's no such restriction in the spec of course, but we can
make it so by refusing to flatten a sub-select if pulling it up would
result in having a clause in the outer query that violates this rule.
There's already some code in prepjointree.c (around line 1300) that
attempts to enforce this, though now that I look at it again I'm not
sure it's covering all the bases. We may need to extend that check.
Why do we need this restriction? Wouldn't a place (specifically join qual at such a place) in join tree where all the participating relations are present, serve as a place where the clause can be applied. E.g. in the query
select * from tab1 left join tab2 t2 using (val) left join lateral (select val from tab2 where val2 = tab1.val * t2.val) t3 using (val);
select * from tab1 left join tab2 t2 using (val) left join lateral (select val from tab2 where val2 = tab1.val * t2.val) t3 using (val);
Can't we apply (as a join qual) the qual val2 = tab1.val * t2.val at a place where we are computing join between tab1, t2 and t3?
I'm inclined to process all LATERAL-using qual clauses this way, ie
postpone them till we recurse back up to a place where they can
logically be evaluated. That won't make any real difference when no
outer joins are present, but it will eliminate the ugliness that right
now distribute_qual_to_rels is prevented from sanity-checking the scope
of the references in a qual when LATERAL is present. If we do it like
this, we can resurrect full enforcement of that sanity check, and then
throw an error if any "postponed" quals are left over when we're done
recursing.
Parameterized nested loop join would always be able to evaluate a LATERAL query. Instead of throwing error, why can't we choose that as the default strategy whenever we fail to flatten subquery?
Can we put the clause with lateral references at its appropriate place while flattening the subquery? IMO, that will be cleaner and lesser work than first pulling the clause and then putting it back again? Right, now, we do not have that capability in pull_up_subqueries() but given its recursive structure, it might be easier to do it there.
Thoughts, better ideas?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company
pgsql-hackers by date: