Re: Allowing NOT IN to use ANTI joins - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Allowing NOT IN to use ANTI joins
Date
Msg-id CAMkU=1zPVbez_HWao781L8PzFk+d1J8VaJuhyjUHaRifk6OcUA@mail.gmail.com
Whole thread Raw
In response to Allowing NOT IN to use ANTI joins  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Allowing NOT IN to use ANTI joins  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, Jun 8, 2014 at 5:36 AM, David Rowley <dgrowleyml@gmail.com> wrote:
Currently pull_up_sublinks_qual_recurse only changes the plan for NOT EXISTS queries and leaves NOT IN alone. The reason for this is because the values returned by a subquery in the IN clause could have NULLs.

A simple example of this (without a subquery) is:

select 1 where 3 not in (1, 2, null); returns 0 rows because 3 <> NULL is unknown.

The attached patch allows an ANTI-join plan to be generated in cases like:

CREATE TABLE a (id INT PRIMARY KEY, b_id INT NOT NULL);
CREATE TABLE b (id INT NOT NULL);

SELECT * FROM a WHERE b_id NOT IN(SELECT id FROM b);

To generate a plan like:
                           QUERY PLAN
-----------------------------------------------------------------
 Hash Anti Join  (cost=64.00..137.13 rows=1070 width=8)
   Hash Cond: (a.b_id = b.id)
   ->  Seq Scan on a  (cost=0.00..31.40 rows=2140 width=8)
   ->  Hash  (cost=34.00..34.00 rows=2400 width=4)
         ->  Seq Scan on b  (cost=0.00..34.00 rows=2400 width=4)


I think this will be great, I've run into this problem often from applications I have no control over.  I thought a more complete, but probably much harder, solution would be to add some metadata to the hash anti-join infrastructure that tells it "If you find any nulls in the outer scan, stop running without returning any rows".  I think that should work because the outer rel already has to run completely before any rows can be returned.

But what I can't figure out is, would that change obviate the need for your change?  Once we can correctly deal with nulls in a NOT IN list through a hash anti join, is there a cost estimation advantage to being able to prove that the that null can't occur?  (And of course if you have code that works, while I have vague notions of what might be, then my notion probably does not block your code.)
 
Cheers,

Jeff

pgsql-hackers by date:

Previous
From: Jim Nasby
Date:
Subject: Re: "RETURNING PRIMARY KEY" syntax extension
Next
From: Tom Lane
Date:
Subject: Re: Allowing NOT IN to use ANTI joins