Re: Allowing NOT IN to use ANTI joins - Mailing list pgsql-hackers

From David Rowley
Subject Re: Allowing NOT IN to use ANTI joins
Date
Msg-id CAApHDvqpvCg7BzR-b15bhA9JJDm5QOMjK3_Z-x=F21b1nOhG2g@mail.gmail.com
Whole thread Raw
In response to Re: Allowing NOT IN to use ANTI joins  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Allowing NOT IN to use ANTI joins
Re: Allowing NOT IN to use ANTI joins
List pgsql-hackers
On Fri, Jul 11, 2014 at 1:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> We could no doubt fix this by also insisting that the left-side vars
> be provably not null, but that's going to make the patch even slower
> and even less often applicable.  I'm feeling discouraged about whether
> this is worth doing in this form.

Hm ... actually, there might be a better answer: what about transforming

   WHERE (x,y) NOT IN (SELECT provably-not-null-values FROM ...)

to

   WHERE <antijoin condition> AND x IS NOT NULL AND y IS NOT NULL

?


I had another look at this and it appears you were right the first time, we need to ensure there's no NULLs on both sides of the join condition.

The reason for this is that there's a special case with "WHERE col NOT IN(SELECT id from empty_relation)", this is effectively the same as "WHERE true", so we should see *all* rows, even ones where col is null. Adding a col IS NOT NULL cannot be done as it would filter out the NULLs in this special case.

The only other way I could imagine fixing this would be to have some other sort of join type that always met the join condition if the right side of the join had no tuples... Of course I'm not suggesting it gets implemented this way, I'm just otherwise out of ideas.

 Regards

David Rowley

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: tweaking NTUP_PER_BUCKET
Next
From: Andres Freund
Date:
Subject: Re: Allowing NOT IN to use ANTI joins