Re: NOT IN vs. NOT EXISTS performance - Mailing list pgsql-performance

From David Rowley
Subject Re: NOT IN vs. NOT EXISTS performance
Date
Msg-id CAKJS1f9GW13y9Kkdu85Ax5gNpvmzO3xNO5ks5SWL8+TcKc=Qiw@mail.gmail.com
Whole thread Raw
In response to NOT IN vs. NOT EXISTS performance  (Lincoln Swaine-Moore <lswainemoore@gmail.com>)
Responses Re: NOT IN vs. NOT EXISTS performance
List pgsql-performance
On 9 November 2018 at 08:35, Lincoln Swaine-Moore
<lswainemoore@gmail.com> wrote:
> My primary question is: why is this approach only possible (for data too
> large for memory) when using NOT EXISTS, and not when using NOT IN?
>
> I understand that there is a slight difference in the meaning of the two
> expressions, in that NOT IN will produce NULL if there are any NULL values
> in the right hand side (in this case there are none, and the queries should
> return the same COUNT). But if anything, I would expect that to improve
> performance of the NOT IN operation, since a single pass through that data
> should reveal if there are any NULL values, at which point that information
> could be used to short-circuit. So I am a bit baffled.

The problem is that the planner makes the plan and would have to know
beforehand that no NULLs could exist on either side of the join. For
more simple cases it could make use of NOT NULL constaints, but more
complex cases exist, such as:

SELECT * FROM t1 LEFT JOIN t2 ON t1.x = t2.y WHERE t2.y NOT IN(SELECT
z FROM t3);

There's a bit more reading about the complexity of this in [1]

[1]
https://www.postgresql.org/message-id/flat/CAMkU%3D1zPVbez_HWao781L8PzFk%2Bd1J8VaJuhyjUHaRifk6OcUA%40mail.gmail.com#7c6d3178c18103d8508f3ec5982b1b8e

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-performance by date:

Previous
From: Lincoln Swaine-Moore
Date:
Subject: NOT IN vs. NOT EXISTS performance
Next
From: Merlin Moncure
Date:
Subject: Re: NOT IN vs. NOT EXISTS performance