Re: IN() Optimization issue in 8.0rc5 - Mailing list pgsql-performance

From Tom Lane
Subject Re: IN() Optimization issue in 8.0rc5
Date
Msg-id 11485.1105822417@sss.pgh.pa.us
In response to IN() Optimization issue in 8.0rc5  (Josh Berkus <josh@agliodbs.com>)
Responses Re: IN() Optimization issue in 8.0rc5
List pgsql-performance
Josh Berkus <josh@agliodbs.com> writes:
> dm=# explain
> dm-# SELECT personid FROM mr.person_attributes_old
> dm-#   WHERE personid NOT IN (SELECT personid FROM mr.person_attributes);
>                                     QUERY PLAN
> -----------------------------------------------------------------------------------
>  Seq Scan on person_attributes_old  (cost=0.00..3226144059.85 rows=235732 width=4)
>    Filter: (NOT (subplan))
>    SubPlan
>      ->  Seq Scan on person_attributes  (cost=0.00..12671.07 rows=405807 width=4)
> (4 rows)

Hmm.  What you want for a NOT IN is for the plan to say
   Filter: (NOT (hashed subplan))
which you are not getting.  What are the datatypes of the two personid
columns?  Is the 400k-row estimate for person_attributes reasonable?
Maybe you need to increase work_mem (née sort_mem) to allow a
400k-row hash table?
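
The checks suggested above could be run in a psql session roughly as
follows.  This is only a sketch: the work_mem value of 65536 (kB, i.e.
64 MB) is an assumed figure rather than a recommendation from the thread,
and the right setting depends on the memory available on the server.

```sql
-- 1. Compare the declared datatypes of the two personid columns.
SELECT table_name, column_name, data_type
  FROM information_schema.columns
 WHERE table_schema = 'mr'
   AND table_name IN ('person_attributes', 'person_attributes_old')
   AND column_name = 'personid';

-- 2. Refresh statistics, then compare the actual row count with the
--    planner's estimate (rows=405807 in the plan quoted above).
ANALYZE mr.person_attributes;
SELECT count(*) FROM mr.person_attributes;

-- 3. Raise work_mem for this session only.  A bare integer is read as
--    kilobytes; 65536 is an assumed value chosen for illustration.
SHOW work_mem;
SET work_mem = 65536;

-- 4. Re-check the plan and look for "Filter: (NOT (hashed subplan))".
EXPLAIN
SELECT personid FROM mr.person_attributes_old
 WHERE personid NOT IN (SELECT personid FROM mr.person_attributes);
```

If the filter line changes to the hashed form after step 3, the memory
limit was most likely what kept the planner from hashing the subplan.
RESET work_mem; restores the session to the server default.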

            regards, tom lane
