Re: SQL 'in' vs join. - Mailing list pgsql-hackers

From mlw
Subject Re: SQL 'in' vs join.
Date
Msg-id 3A265826.B7AF2DBE@mohawksoft.com
Whole thread Raw
In response to SQL 'in' vs join.  (mlw <markw@mohawksoft.com>)
Responses Re: SQL 'in' vs join.
RE: SQL 'in' vs join.
List pgsql-hackers
Hannu Krosing wrote:
> 
> mlw wrote:
> >
> > Why is a "select * from table1 where field in (select field from table2
> > where condition )"
> >
> > is so dramatically bad compared to:
> >
> > "select * from table1, table2 where table1.field = table2.field and
> > condition"
> >
> > I can't understand why the first query isn't optimized better than the
> > second one. The 'in' query forces a full table scan (it shouldn't) and
> > the second one uses the indexes. Does anyone know why?
> 
> Its not done yet, and probably hsomewhat difficult to do in a general
> fashion
> 
> > I know I am no SQL guru, but my gut tells me that the 'in' operator
> > should be far more efficient than a join.
> >
> > Here are the actual queries:
> >
> > cdinfo=# explain select trackid from zsong where muzenbr in (select
> > muzenbr from ztitles where title = 'Mulan') ;
> 
> try
> 
> explain
>  select trackid
>    from zsong
>   where muzenbr in (
>          select muzenbr
>            from ztitles
>           where title = 'Mulan'
>             and ztitles.muzenbr=zsong.muzenbr
>   );
> 
> this should hint the current optimizer to do the right thing;
> 
> -----------------
> Hannu

Nope:

cdinfo=# explain
cdinfo-#  select trackid
cdinfo-#    from zsong
cdinfo-#   where muzenbr in (
cdinfo(#          select muzenbr
cdinfo(#            from ztitles
cdinfo(#           where title = 'Mulan'
cdinfo(#             and ztitles.muzenbr=zsong.muzenbr
cdinfo(#   );
NOTICE:  QUERY PLAN:
Seq Scan on zsong  (cost=100000000.00..104474515.18 rows=2193213
width=4) SubPlan   ->  Index Scan using ztitles_pkey on ztitles  (cost=0.00..4.05
rows=1 width=4)  


But what I also find odd is, look at the components:

cdinfo=# explain select muzenbr from ztitles where title = 'Mulan' ;
NOTICE:  QUERY PLAN:
Index Scan using ztitles_title_ndx on ztitles  (cost=0.00..7.08 rows=1
width=4)  

cdinfo=# explain select trackid from zsong where muzenbr in ( 1,2,3,4,5)
;
NOTICE:  QUERY PLAN:
Index Scan using zsong_muzenbr_ndx, zsong_muzenbr_ndx,
zsong_muzenbr_ndx, zsong_muzenbr_ndx, zsong_muzenbr_ndx on zsong 
(cost=0.00..392.66 rows=102 width=4) 


Now, given the two components, each with very low costs, it chooses to
do a sequential scan on the table. I don't get it. I have have been
having no end of problems with Postgres' optimizer. It just seems to be
brain dead at times. It is a huge point of frustration to me. I am tied
to postgres in my current project, and I fear that I will not be able to
implement certain features because of this sort of behavior.


-- 
http://www.mohawksoft.com


pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: SQL 'in' vs join.
Next
From: Michael Fork
Date:
Subject: Re: