Re: NOT IN >2hrs vs EXCEPT < 2 sec. - Mailing list pgsql-performance

From Tom Lane
Subject Re: NOT IN >2hrs vs EXCEPT < 2 sec.
Date
Msg-id 28345.1233241001@sss.pgh.pa.us
In response to NOT IN >2hrs vs EXCEPT < 2 sec.  (Kevin Traster <kevin@mffais.com>)
List pgsql-performance
Kevin Traster <kevin@mffais.com> writes:
>  Unique  (cost=3506.21..303375872.86 rows=71946 width=8)
>    ->  Index Scan using cik_ciknum_idx on cik  (cost=3506.21..303375616.75
> rows=102444 width=8)
>          Filter: (NOT (subplan))
>          SubPlan
>            ->  Materialize  (cost=3506.21..6002.40 rows=186019 width=4)
>                  ->  Seq Scan on owner_cik_master  (cost=0.00..2684.19
> rows=186019 width=4)

It will help some if you raise work_mem enough so you get a "hashed
subplan" there, assuming the NOT IN is on a hashable datatype.

But as was already noted, more work has been put into optimizing
EXCEPT and NOT EXISTS than NOT IN, because the latter is substantially
less useful due to its unintuitive but spec-mandated handling of NULLs.
(And this disparity will be even larger in 8.4.)  We're not going to
apologize for that, and we're not going to regard it as a bug.
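
To illustrate the NULL handling mentioned above, here is a minimal sketch, not
the original poster's query. The column names (ciknum, owner_cik) and the
sample rows are assumptions made up for the example; only the table names come
from the quoted plan. It uses Python's built-in sqlite3 module rather than
PostgreSQL so that it runs anywhere; SQLite follows the same three-valued
logic for NOT IN, but it says nothing about PostgreSQL's planner or timings.

```python
import sqlite3

# In-memory database; the schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cik (ciknum INTEGER);
    CREATE TABLE owner_cik_master (owner_cik INTEGER);
    INSERT INTO cik VALUES (1), (2), (3);
    -- A single NULL in the subquery's output is enough to change NOT IN.
    INSERT INTO owner_cik_master VALUES (1), (NULL);
""")


def run(sql):
    """Return the first column of every result row, sorted."""
    return sorted(row[0] for row in conn.execute(sql))


# "2 NOT IN (1, NULL)" evaluates to NULL, not true, so the row is filtered out.
not_in_rows = run("""
    SELECT ciknum FROM cik
    WHERE ciknum NOT IN (SELECT owner_cik FROM owner_cik_master)
""")

# NOT EXISTS only asks whether a matching row exists; the NULL never matches.
not_exists_rows = run("""
    SELECT ciknum FROM cik c
    WHERE NOT EXISTS (
        SELECT 1 FROM owner_cik_master o WHERE o.owner_cik = c.ciknum
    )
""")

# EXCEPT removes rows that appear in the second query's output.
except_rows = run("""
    SELECT ciknum FROM cik
    EXCEPT
    SELECT owner_cik FROM owner_cik_master
""")

print("NOT IN:    ", not_in_rows)
print("NOT EXISTS:", not_exists_rows)
print("EXCEPT:    ", except_rows)
```

With this sample data the NOT IN query returns no rows at all, while the
NOT EXISTS and EXCEPT forms both return 2 and 3. The forms are therefore only
interchangeable when the subquery's column cannot contain NULLs.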

            regards, tom lane
