hash semi join caused by "IN (select ...)" - Mailing list pgsql-performance

From Clemens Eisserer
Subject hash semi join caused by "IN (select ...)"
Date
Msg-id BANLkTikR6ngJS9=e1e1yA+E0FyxuJY_i3A@mail.gmail.com
Whole thread Raw
Responses Re: hash semi join caused by "IN (select ...)"
Re: hash semi join caused by "IN (select ...)"
List pgsql-performance
Hi,

I have a quite complex, performance sensitive query in a system with a
few (7) joins:
select .... from t1 left join t2 .... WHERE id IN (select ....)

For this query the planner evaluates the IN with a hash semi join last,
and all the joining is done by hash joins for all rows contained in t1.

However when I specify the ids manually (IN (1, 2, 3, 4, 5) the
planner first does an index lookup on the primary key column id,
and subsequently does nested loop joins using an index on t2 - which
gives way better results.

Is there any way to guide the planner to evaluate the IN condition
first, instead of last?
Why is the planner behaving this way? (postgresql 8.4.??)

Thank you in advance, Clemens


Query plan with IN(select):

Sort  (cost=165.77..165.77 rows=2 width=16974) (actual
time=13.459..13.460 rows=2 loops=1)
   Sort Key: this_.id
   Sort Method:  quicksort  Memory: 26kB
   ->  Hash Semi Join  (cost=123.09..165.76 rows=2 width=16974)
(actual time=12.741..13.432 rows=2 loops=1)
         Hash Cond: (this_.id = kladdenent0_.id)
         ->  Hash Left Join  (cost=119.17..160.90 rows=348
width=16974) (actual time=8.765..13.104 rows=342 loops=1)
               Hash Cond: (flugzeug2_.flugzeugtyp_id = flugzeugty3_.id)
               ->  Hash Left Join  (cost=118.10..155.08 rows=348
width=16454) (actual time=8.724..12.412 rows=342 loops=1)
                     Hash Cond: (flugzeug2_.zaehlertyp_id = bmintype4_.id)
                     ->  Hash Left Join  (cost=117.06..152.71 rows=348
width=15934) (actual time=8.660..11.786 rows=342 loops=1)
                           Hash Cond: (this_.lehrerid = pilot5_.id)
                           ->  Hash Left Join  (cost=96.66..130.46
rows=348 width=8912) (actual time=6.395..8.899 rows=342 loops=1)
                                 Hash Cond: (this_.nachid = flugplatz6_.id)
                                 ->  Hash Left Join
(cost=93.89..122.90 rows=348 width=8370) (actual time=6.354..8.429
rows=342 loops=1)
                                       Hash Cond: (this_.flugzeugid =
flugzeug2_.id)
                                       ->  Hash Left Join
(cost=23.17..47.04 rows=348 width=7681) (actual time=1.992..3.374
rows=342 loops=1)
                                             Hash Cond: (this_.pilotid
= pilot7_.id)
                                             ->  Hash Left Join
(cost=2.78..22.04 rows=348 width=659) (actual time=0.044..0.548
rows=342 loops=1)
                                                   Hash Cond:
(this_.vonid = flugplatz8_.id)
                                                   ->  Seq Scan on
startkladde this_  (cost=0.00..14.48 rows=348 width=117) (actual
time=0.004..0.074 rows=342 loops=1)
                                                   ->  Hash
(cost=1.79..1.79 rows=79 width=542) (actual time=0.032..0.032 rows=79
loops=1)
                                                         ->  Seq Scan
on flugplatz flugplatz8_  (cost=0.00..1.79 rows=79 width=542) (actual
time=0.003..0.010 rows=79 loops=1)
                                             ->  Hash
(cost=15.73..15.73 rows=373 width=7022) (actual time=1.938..1.938
rows=375 loops=1)
                                                   ->  Seq Scan on
pilot pilot7_  (cost=0.00..15.73 rows=373 width=7022) (actual
time=0.006..0.769 rows=375 loops=1)
                                       ->  Hash  (cost=51.43..51.43
rows=1543 width=689) (actual time=4.351..4.351 rows=1543 loops=1)
                                             ->  Seq Scan on flugzeug
flugzeug2_  (cost=0.00..51.43 rows=1543 width=689) (actual
time=0.006..1.615 rows=1543 loops=1)
                                 ->  Hash  (cost=1.79..1.79 rows=79
width=542) (actual time=0.031..0.031 rows=79 loops=1)
                                       ->  Seq Scan on flugplatz
flugplatz6_  (cost=0.00..1.79 rows=79 width=542) (actual
time=0.003..0.011 rows=79 loops=1)
                           ->  Hash  (cost=15.73..15.73 rows=373
width=7022) (actual time=2.236..2.236 rows=375 loops=1)
                                 ->  Seq Scan on pilot pilot5_
(cost=0.00..15.73 rows=373 width=7022) (actual time=0.005..0.781
rows=375 loops=1)
                     ->  Hash  (cost=1.02..1.02 rows=2 width=520)
(actual time=0.005..0.005 rows=2 loops=1)
                           ->  Seq Scan on bmintype bmintype4_
(cost=0.00..1.02 rows=2 width=520) (actual time=0.003..0.004 rows=2
loops=1)
               ->  Hash  (cost=1.03..1.03 rows=3 width=520) (actual
time=0.004..0.004 rows=3 loops=1)
                     ->  Seq Scan on flugzeugtype flugzeugty3_
(cost=0.00..1.03 rows=3 width=520) (actual time=0.002..0.002 rows=3
loops=1)
         ->  Hash  (cost=3.90..3.90 rows=2 width=4) (actual
time=0.239..0.239 rows=2 loops=1)
               ->  Limit  (cost=0.00..3.88 rows=2 width=4) (actual
time=0.202..0.236 rows=2 loops=1)
                     ->  Index Scan using startkladde_pkey on
startkladde kladdenent0_  (cost=0.00..56.24 rows=29 width=4) (actual
time=0.200..0.233 rows=2 loops=1)
                           Filter: ((status > 0) OR (id = (-1)))

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: reducing random_page_cost from 4 to 2 to force index scan
Next
From: Tom Lane
Date:
Subject: Re: hash semi join caused by "IN (select ...)"