Re: Converting NOT IN to anti-joins during planning - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: Converting NOT IN to anti-joins during planning
Date
Msg-id 2759.1558966949@localhost
Whole thread Raw
In response to Re: Converting NOT IN to anti-joins during planning  (Antonin Houska <ah@cybertec.at>)
List pgsql-hackers
Antonin Houska <ah@cybertec.at> wrote:

> One problem I see is that SubLink can be in the JOIN/ON clause and thus it's
> not necessarily at the top of the join tree. Consider this example:
>
> CREATE TABLE a(i int);
> CREATE TABLE b(j int);
> CREATE TABLE c(k int NOT NULL);
> CREATE TABLE d(l int);
>
>   SELECT *
>     FROM
>         a
>         JOIN b ON b.j NOT IN
>                 ( SELECT
>                         c.k
>                     FROM
>                         c)
>         JOIN d ON b.j = d.l;
>
> Here the b.j=d.l condition makes the planner think that the "b.j NOT IN
> (SELECT c.k FROM c)" sublink cannot receive NULL values of b.j, but that's not
> true: it's possible that ((a JOIN b) ANTI JOIN c) is evaluated before "d" is
> joined to the other tables, so the NULL values of b.j are not filtered out
> early enough.
>
> I thought it would help if find_innerjoined_rels(), when called from
> expressions_are_not_nullable(), only collected rels (and quals) from the
> subtree below the sublink, but that does not seem to help:
>
> CREATE TABLE e(m int);
>
>   SELECT *
>     FROM
>         a
>         JOIN e ON a.i = e.m
>         JOIN b ON a.i NOT IN
>                 ( SELECT
>                         c.k
>                     FROM
>                         c)
>         JOIN d ON COALESCE(a.i, 0) = COALESCE(d.l, 0);
>
> Here it might seem that the a.i=e.m condition eliminates NULL values from the
> ANTI JOIN input, but it's probably hard to prove at query preparation time
> that
>
>      (((a JOIN e) JOIN b) ANTI JOIN c) JOIN d
>
> won't eventually be optimized to
>
>      (((a JOIN d) JOIN b) ANTI JOIN c) JOIN e
>
> Since the join condition between "a" and "d" is not strict in this case, the
> ANTI JOIN will receive the NULL values of a.i.
>
> It seems tricky, I've got no idea of an alternative approach right now.

Just one idea: perhaps we could use something like PlaceHolderVar to enforce
evaluation of the inner join expression ("a.i=e.m" in the example above) at
certain level of the join tree (in particular, below the ANTI JOIN) -
something like make_outerjoininfo() does here:

    /* Else, prevent join from being formed before we eval the PHV */
    min_righthand = bms_add_members(min_righthand, phinfo->ph_eval_at);

Unlike the typical use of PHV, we would not have to check whether the
expression is not evaluated too low in the tree because the quals collected by
find_innerjoined_rels() should not reference nullable side of any outer join.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [HACKERS] Runtime Partition Pruning
Next
From: Fabien COELHO
Date:
Subject: Re: Why does pg_checksums -r not have a long option?