Re: Converting NOT IN to anti-joins during planning - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: Converting NOT IN to anti-joins during planning
Date
Msg-id 3793.1565689764@linux-edt6
Whole thread Raw
In response to Re: Converting NOT IN to anti-joins during planning  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Converting NOT IN to anti-joins during planning
List pgsql-hackers
David Rowley <david.rowley@2ndquadrant.com> wrote:

> On Mon, 27 May 2019 at 20:43, Antonin Houska <ah@cybertec.at> wrote:
> > I've spent some time looking into this.
>
> Thank you for having a look at this.
>
> > One problem I see is that SubLink can be in the JOIN/ON clause and thus it's
> > not necessarily at the top of the join tree. Consider this example:
> >
> > CREATE TABLE a(i int);
> > CREATE TABLE b(j int);
> > CREATE TABLE c(k int NOT NULL);
> > CREATE TABLE d(l int);
> >
> >   SELECT *
> >     FROM
> >         a
> >         JOIN b ON b.j NOT IN
> >                 ( SELECT
> >                         c.k
> >                     FROM
> >                         c)
> >         JOIN d ON b.j = d.l;
>
> hmm yeah. Since the proofs that are being used in
> expressions_are_not_nullable assume the join has already taken place,
> then we'll either need to not use the join conditions are proofs in
> that case, or just disable the optimisation instead.   I think it's
> fine to just disable the optimisation since it seem rather unlikely
> that someone would write a join condition like that.  Plus it seems
> quite a bit more complex to validate that the optimisation would even
> be correct if NULLs were not possible.
>
> I've attached a patch which restricts the pullups to FromExpr quals.
> Anything below a JoinExpr disables the optimisation now.

ok. The planner pulls-up other sublinks located in the ON clause, but it'd be
quite tricky to do the same for the NOT IN case.

Now that we only consider the WHERE clause, I wonder if the code can be
simplified a bit more. In particular, pull_up_sublinks_jointree_recurse()
passes valid pointer for notnull_proofs to pull_up_sublinks_qual_recurse(),
while it also passes under_joinexpr=true. The latter should imply that NOT IN
won't be converted to ANTI JOIN anyway, so no notnull_proofs should be needed.

BTW, I'm not sure if notnull_proofs=j->quals is correct in cases like this:

    case JOIN_LEFT:
        j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
             &j->rarg,
             rightrelids,
             NULL, NULL, j->quals,
             true);

Even if j->quals evaluates to NULL or FALSE (due to NULL value on its input),
it does not remove any rows (possibly containing NULL values) from the input
of the SubLink's expression.

I'm not even sure that expressions_are_not_nullable() needs the notnull_proofs
argument. Now that we only consider SubLinks in the WHERE clause, it seems to
me that nonnullable_vars is always a subset of nonnullable_inner_vars, isn't
it?

A few more minor findings:

@@ -225,10 +227,13 @@ pull_up_sublinks(PlannerInfo *root)
  *
  * In addition to returning the possibly-modified jointree node, we return
  * a relids set of the contained rels into *relids.
+ *
+ * under_joinexpr must be passed as true if 'jtnode' is or is under a
+ * JoinExpr.
  */
 static Node *
 pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
-                                                                 Relids *relids)
+                                                                 Relids *relids, bool under_joinexpr)
 {
        if (jtnode == NULL)
        {


The comment "if 'jtnode' is or is under ..." is unclear.


* is_NOTANY_compatible_with_antijoin()

  **  "'outerquery' is the parse of the query" -> "'outerquery' is the parse tree of the query"

  ** "2.  We assume that each join qual is an OpExpr" -> "2.  We assume that
     each sublink expression is an OpExpr" ?

  ** (OpExpr *) lfirst(lc) -> lfirst_node(OpExpr, lc)

  ** The kind of bool expression (AND_EXPR) might need to be tested here too:

+       /* Extract exprs from multiple expressions ANDed together */
+       else if (IsA(testexpr, BoolExpr))


* find_innerjoined_rels()

 "we locate all WHERE and JOIN/ON quals that constrain these rels add them to"
 ->
 " ... and add them ..."


* get_attnotnull()

  The comment says that FALSE is returned if the attribute is dropped, however
  the function it does not test att_tup->attisdropped. (This patch should not
  call the function for a dropped attribute, so I'm only saying that the
  function code is not consistent with the comment.)

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



pgsql-hackers by date:

Previous
From: amul sul
Date:
Subject: Re: Some memory not freed at the exit of RelationBuildPartitionDesc()
Next
From: Feike Steenbergen
Date:
Subject: Feature: Use DNS SRV records for connecting