Home > mailing lists

Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets - Mailing list pgsql-bugs

From	Tom Molesworth
Subject	Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets
Date	July 28, 2009 20:00:23
Msg-id	4A6F78C5.9070607@audioboundary.com Whole thread Raw
In response to	Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets (Ole Tange <ole@tange.dk>)
Responses	Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets
List	pgsql-bugs

Tree view

Ole Tange wrote:
> On Tue, Jul 28, 2009 at 3:47 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>
>> "Ole Tange" <postgresql.org@tange.dk> writes:
>>
>>> (modulo NULLs which seem to always cause problems in NOT INs).
>>>
>>> Because it can be rewritten, NOT IN should never be much slower than the
>>> rewritten solution, as PostgreSQL should simply rewrite NOT IN to the
>>> above.
>>>
>> Let's see, you understand that the rewrite violates the SQL standard
>> semantics of NOT IN, but you think we should do it anyway?
>>
>
> Thanks for your kind reply.
>
> Apparently my bug report was not quite clear. My rewrite example was
> simply to show a way that could do a marvelous speedup on medium to
> large sets. The correct dealing with NULL I am sure can be handled
> just as efficiently.
>

Just an observation - it seems that you're using NOT IN() and expecting
it to do the same as this:

SELECT foo FROM a LEFT JOIN b ON a.key = b.key WHERE b.key IS NULL

I find it's comparatively rare to actually want the NOT IN()
null-handling semantics, especially given your comment about nulls
'causing problems' (although I guess you could get the same behaviour in
that query as NOT IN, by adding 'AND a.key IS NOT NULL' to the where
clause - I have no idea whether this is something the optimiser could or
should be able to do).

Although it refers to TSQL, this page might help explain more about NOT IN:

http://stackoverflow.com/questions/129077/sql-not-in-constraint-and-null-values

Not sure if constructions such as 'not exists (select 1 from b where
b.key = a.key)' also have the same performance issues you describe -
should be faster due to the key? - but I find the left join approach
works well enough up to several million rows in the two tables. An
unindexed subquery like 'select key from a' just seems like a guaranteed
way to invoking a full (slow) table scan.

Tom

pgsql-bugs by date:

From: "Jim Michaels"
Date: 28 July 2009, 19:57:53
Subject: BUG #4951: installation dir wrong for libpq compilation

From: Robert Haas
Date: 28 July 2009, 22:20:32
Subject: Re: BUG #4945: Parallel update(s) gone wild

Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets - Mailing list pgsql-bugs

Previous

Next