IN() Optimization issue in 8.0rc5 - Mailing list pgsql-performance

From Josh Berkus
Subject IN() Optimization issue in 8.0rc5
Date
Msg-id 200501151223.10372.josh@agliodbs.com
Whole thread Raw
Responses Re: IN() Optimization issue in 8.0rc5
List pgsql-performance
Tom,

Hmmm ... I'm seeing an issue with IN() optimization -- or rather the lack of
it -- in 8.0rc5.    It seems to me that this worked better in 7.4, although
I've not been able to load this particular database and test

dm=# explain
dm-# SELECT personid FROM mr.person_attributes_old
dm-#                                        WHERE personid NOT IN (SELECT
personid FROM mr.person_attributes);
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Seq Scan on person_attributes_old  (cost=0.00..3226144059.85 rows=235732
width=4)
   Filter: (NOT (subplan))
   SubPlan
     ->  Seq Scan on person_attributes  (cost=0.00..12671.07 rows=405807
width=4)
(4 rows)

dm=# explain select pao.personid from mr.person_attributes_old pao
dm-# left outer join mr.person_attributes p on pao.personid = p.personid
dm-# where p.personid is null;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=0.00..34281.83 rows=471464 width=4)
   Merge Cond: ("outer".personid = "inner".personid)
   Filter: ("inner".personid IS NULL)
   ->  Index Scan using idx_opa_person on person_attributes_old pao
(cost=0.00..13789.29 rows=471464 width=4)
   ->  Index Scan using idx_pa_person on person_attributes p
(cost=0.00..14968.25 rows=405807 width=4)
(5 rows)

It seems like the planner ought to recognize that the first form of the query
is optimizable into the 2nd form, and that I've seen it do so in 7.4.
However, *no* amount of manipulation of query parameters I did on the 1st
form of the query were successful in getting the planner to recognize that it
could use indexes for the IN() form of the query.

Thoughts?

--
Josh Berkus
Aglio Database Solutions
San Francisco

pgsql-performance by date:

Previous
From: Jan Dittmer
Date:
Subject: Re: which dual-CPU hardware/OS is fastest for PostgreSQL?
Next
From: Tom Lane
Date:
Subject: Re: IN() Optimization issue in 8.0rc5