bad planner pick... but why? - Mailing list pgsql-novice

From
Subject bad planner pick... but why?
Date
Msg-id 03b301c6f13b$caaf7a20$6501a8c0@iwing
Whole thread Raw
Responses Re: bad planner pick... but why?
List pgsql-novice
hi list

please have a look at this two queries:

----------------------------------------

SELECT mov_id FROM oldtables.movies LEFT JOIN oldtables.content ON
movies.mov_id = content.c_m_id
WHERE mov_id IN (SELECT DISTINCT rel_movieid FROM infos.rel_persons WHERE
rel_personid = 40544)

plan:

Hash IN Join  (cost=205.04..14030.40 rows=1 width=4) (actual
time=105.568..277.782 rows=1 loops=1)
  Hash Cond: ("outer".mov_id = "inner".rel_movieid)
  ->  Hash Left Join  (cost=184.00..13824.49 rows=36973 width=4) (actual
time=7.563..260.627 rows=36997 loops=1)
        Hash Cond: ("outer".mov_id = "inner".c_m_id)
        ->  Seq Scan on movies  (cost=0.00..13140.73 rows=36973 width=4)
(actual time=0.070..206.254 rows=36997 loops=1)
        ->  Hash  (cost=174.60..174.60 rows=3760 width=4) (actual
time=7.467..7.467 rows=3760 loops=1)
              ->  Seq Scan on content  (cost=0.00..174.60 rows=3760 width=4)
(actual time=0.015..4.729 rows=3760 loops=1)
  ->  Hash  (cost=21.03..21.03 rows=1 width=4) (actual time=0.113..0.113
rows=1 loops=1)
        ->  Unique  (cost=20.99..21.02 rows=1 width=4) (actual
time=0.108..0.109 rows=1 loops=1)
              ->  Sort  (cost=20.99..21.01 rows=6 width=4) (actual
time=0.106..0.107 rows=1 loops=1)
                    Sort Key: rel_persons.rel_movieid
                    ->  Index Scan using rel_persons_personid_idx on
rel_persons  (cost=0.00..20.92 rows=6 width=4) (actual time=0.094..0.098
rows=1 loops=1)
                          Index Cond: (rel_personid = 40544)
Total runtime: 277.901 ms

----------------------------------------


SELECT mov_id FROM oldtables.movies LEFT JOIN oldtables.content ON
movies.mov_id = content.c_m_id
WHERE c_m_id IN (SELECT DISTINCT rel_movieid FROM infos.rel_persons WHERE
rel_personid = 40544)

plan:

Nested Loop  (cost=20.99..32.69 rows=1 width=4) (actual time=0.169..0.175
rows=1 loops=1)
  ->  Nested Loop  (cost=20.99..27.05 rows=1 width=8) (actual
time=0.158..0.162 rows=1 loops=1)
        ->  Unique  (cost=20.99..21.02 rows=1 width=4) (actual
time=0.114..0.116 rows=1 loops=1)
              ->  Sort  (cost=20.99..21.01 rows=6 width=4) (actual
time=0.113..0.113 rows=1 loops=1)
                    Sort Key: rel_persons.rel_movieid
                    ->  Index Scan using rel_persons_personid_idx on
rel_persons  (cost=0.00..20.92 rows=6 width=4) (actual time=0.098..0.102
rows=1 loops=1)
                          Index Cond: (rel_personid = 40544)
        ->  Index Scan using movies_mov_id_idx on movies  (cost=0.00..6.00
rows=1 width=4) (actual time=0.039..0.040 rows=1 loops=1)
              Index Cond: ("outer".rel_movieid = movies.mov_id)
  ->  Index Scan using content_pkey on content  (cost=0.00..5.63 rows=1
width=4) (actual time=0.008..0.009 rows=1 loops=1)
        Index Cond: ("outer".mov_id = content.c_m_id)
Total runtime: 0.295 ms

----------------------------------------

query #1 is factor 1000 slower, because the two tables "movies" (~40k
entries) and "content" (~30k entries) seem to be joined prior to filtering
by the IN (....). any ideas why the planer decides not to first evaluate the
IN (...) statement in the first case?

here's the plan for the IN (...) subselect:

----------------------------------------

SELECT DISTINCT rel_movieid FROM infos.rel_persons WHERE rel_personid =
40544

plan:
Unique  (cost=20.99..21.02 rows=1 width=4) (actual time=0.112..0.113 rows=1
loops=1)
  ->  Sort  (cost=20.99..21.01 rows=6 width=4) (actual time=0.110..0.110
rows=1 loops=1)
        Sort Key: rel_movieid
        ->  Index Scan using rel_persons_personid_idx on rel_persons
(cost=0.00..20.92 rows=6 width=4) (actual time=0.095..0.098 rows=1 loops=1)
              Index Cond: (rel_personid = 40544)
Total runtime: 0.155 ms

----------------------------------------

its pgsql 8.1

thanks,
thomas




pgsql-novice by date:

Previous
From: Yadnyesh Joshi
Date:
Subject: Re: Using host variables -- segmentation fault
Next
From: Tom Lane
Date:
Subject: Re: bad planner pick... but why?