Allowing NOT IN to use ANTI joins - Mailing list pgsql-hackers

Currently pull_up_sublinks_qual_recurse only changes the plan for NOT EXISTS queries and leaves NOT IN alone. The reason for this is because the values returned by a subquery in the IN clause could have NULLs.

A simple example of this (without a subquery) is:

select 1 where 3 not in (1, 2, null); returns 0 rows because 3 <> NULL is unknown.

The attached patch allows an ANTI-join plan to be generated in cases like:

CREATE TABLE a (id INT PRIMARY KEY, b_id INT NOT NULL);
CREATE TABLE b (id INT NOT NULL);

SELECT * FROM a WHERE b_id NOT IN(SELECT id FROM b);

To generate a plan like:
                           QUERY PLAN
-----------------------------------------------------------------
 Hash Anti Join  (cost=64.00..137.13 rows=1070 width=8)
   Hash Cond: (a.b_id = b.id)
   ->  Seq Scan on a  (cost=0.00..31.40 rows=2140 width=8)
   ->  Hash  (cost=34.00..34.00 rows=2400 width=4)
         ->  Seq Scan on b  (cost=0.00..34.00 rows=2400 width=4)

But if we then do:
ALTER TABLE b ALTER COLUMN id DROP NOT NULL;

The plan will go back to the current behaviour of:

                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on a  (cost=40.00..76.75 rows=1070 width=8)
   Filter: (NOT (hashed SubPlan 1))
   SubPlan 1
     ->  Seq Scan on b  (cost=0.00..34.00 rows=2400 width=4)

Comments are welcome

Regards

David Rowley

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Proposing pg_hibernate
Next
From: Kevin Grittner
Date:
Subject: Re: Scaling shared buffer eviction