David Rowley <david.rowley@2ndquadrant.com> wrote:
> On Wed, 6 Mar 2019 at 12:54, David Rowley <david.rowley@2ndquadrant.com> wrote:
> > The latest patch is attached.
>
> Rebased version after pgindent run.
I've spent some time looking into this.
One problem I see is that SubLink can be in the JOIN/ON clause and thus it's
not necessarily at the top of the join tree. Consider this example:
CREATE TABLE a(i int);
CREATE TABLE b(j int);
CREATE TABLE c(k int NOT NULL);
CREATE TABLE d(l int);
SELECT *
FROM
a
JOIN b ON b.j NOT IN
( SELECT
c.k
FROM
c)
JOIN d ON b.j = d.l;
Here the b.j=d.l condition makes the planner think that the "b.j NOT IN
(SELECT c.k FROM c)" sublink cannot receive NULL values of b.j, but that's not
true: it's possible that ((a JOIN b) ANTI JOIN c) is evaluated before "d" is
joined to the other tables, so the NULL values of b.j are not filtered out
early enough.
I thought it would help if find_innerjoined_rels(), when called from
expressions_are_not_nullable(), only collected rels (and quals) from the
subtree below the sublink, but that does not seem to help:
CREATE TABLE e(m int);
SELECT *
FROM
a
JOIN e ON a.i = e.m
JOIN b ON a.i NOT IN
( SELECT
c.k
FROM
c)
JOIN d ON COALESCE(a.i, 0) = COALESCE(d.l, 0);
Here it might seem that the a.i=e.m condition eliminates NULL values from the
ANTI JOIN input, but it's probably hard to prove at query preparation time
that
(((a JOIN e) JOIN b) ANTI JOIN c) JOIN d
won't eventually be optimized to
(((a JOIN d) JOIN b) ANTI JOIN c) JOIN e
Since the join condition between "a" and "d" is not strict in this case, the
ANTI JOIN will receive the NULL values of a.i.
It seems tricky, I've got no idea of an alternative approach right now.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com