Re: SQL 'in' vs join. - Mailing list pgsql-hackers
From | mlw |
---|---|
Subject | Re: SQL 'in' vs join. |
Date | |
Msg-id | 3A265826.B7AF2DBE@mohawksoft.com Whole thread Raw |
In response to | SQL 'in' vs join. (mlw <markw@mohawksoft.com>) |
Responses |
Re: SQL 'in' vs join.
RE: SQL 'in' vs join. |
List | pgsql-hackers |
Hannu Krosing wrote: > > mlw wrote: > > > > Why is a "select * from table1 where field in (select field from table2 > > where condition )" > > > > is so dramatically bad compared to: > > > > "select * from table1, table2 where table1.field = table2.field and > > condition" > > > > I can't understand why the first query isn't optimized better than the > > second one. The 'in' query forces a full table scan (it shouldn't) and > > the second one uses the indexes. Does anyone know why? > > Its not done yet, and probably hsomewhat difficult to do in a general > fashion > > > I know I am no SQL guru, but my gut tells me that the 'in' operator > > should be far more efficient than a join. > > > > Here are the actual queries: > > > > cdinfo=# explain select trackid from zsong where muzenbr in (select > > muzenbr from ztitles where title = 'Mulan') ; > > try > > explain > select trackid > from zsong > where muzenbr in ( > select muzenbr > from ztitles > where title = 'Mulan' > and ztitles.muzenbr=zsong.muzenbr > ); > > this should hint the current optimizer to do the right thing; > > ----------------- > Hannu Nope: cdinfo=# explain cdinfo-# select trackid cdinfo-# from zsong cdinfo-# where muzenbr in ( cdinfo(# select muzenbr cdinfo(# from ztitles cdinfo(# where title = 'Mulan' cdinfo(# and ztitles.muzenbr=zsong.muzenbr cdinfo(# ); NOTICE: QUERY PLAN: Seq Scan on zsong (cost=100000000.00..104474515.18 rows=2193213 width=4) SubPlan -> Index Scan using ztitles_pkey on ztitles (cost=0.00..4.05 rows=1 width=4) But what I also find odd is, look at the components: cdinfo=# explain select muzenbr from ztitles where title = 'Mulan' ; NOTICE: QUERY PLAN: Index Scan using ztitles_title_ndx on ztitles (cost=0.00..7.08 rows=1 width=4) cdinfo=# explain select trackid from zsong where muzenbr in ( 1,2,3,4,5) ; NOTICE: QUERY PLAN: Index Scan using zsong_muzenbr_ndx, zsong_muzenbr_ndx, zsong_muzenbr_ndx, zsong_muzenbr_ndx, zsong_muzenbr_ndx on zsong (cost=0.00..392.66 rows=102 width=4) Now, given the two components, each with very low costs, it chooses to do a sequential scan on the table. I don't get it. I have have been having no end of problems with Postgres' optimizer. It just seems to be brain dead at times. It is a huge point of frustration to me. I am tied to postgres in my current project, and I fear that I will not be able to implement certain features because of this sort of behavior. -- http://www.mohawksoft.com
pgsql-hackers by date: