Thread: Why do my hash joins turn to nested loops?

Why do my hash joins turn to nested loops?

From
pgsql-performance@ian.org
Date:
I only have a few days of experience with postgres and it is working
great, but when I start building up test queries I ran into a problem I
don't understand.

One query works fast, returning results in under a second.  If I insert
one more join into the table however, it switches to nested-loops and
takes minutes.  It does this randomly.. using integer compares in the
query lines is fast for instance, but using LIKE operators causes it to
use loops again.

I have about 200,000 items per table, so nested loops cause lots of pain.
The database has objects with lots (100 to 1000) 'specs' for each object
in another table, so I have to pull them out to do queries and sorting on
them.

Here are the two queries.  They are the same except the first has
two 'find' joins and the other has three.

I assume that I am hitting some limit somewhere that is causing postgres
to change it's idea on how to run the query.  Can I force it to use hash
joins?

Thanks!

So this is fast...

EXPLAIN ANALYZE SELECT * FROM logical
LEFT JOIN model ON logical.uid = model.logical_uid
LEFT JOIN company ON model.company_uid = company.uid
LEFT JOIN type ON logical.type::INT = type.uid
JOIN specs spec_find1 ON spec_find1 .spec_uid='8' AND spec_find1 .text LIKE '%' AND spec_find1
.logical_uid=model.logical_uid 
JOIN specs spec_find2 ON spec_find2.spec_uid='8' AND spec_find2.text LIKE '%' AND
spec_find2.logical_uid=model.logical_uid 
LEFT JOIN specs specs_sort ON specs_sort.spec_uid='4' AND specs_sort.logical_uid=logical.uid
ORDER BY specs_sort.num;

                                                                               QUERY PLAN
                                                 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=5398.43..5398.44 rows=6 width=247) (actual time=331.546..333.303 rows=3555 loops=1)
   Sort Key: specs_sort.num
   Sort Method:  quicksort  Memory: 981kB
   ->  Hash Left Join  (cost=1087.68..5398.35 rows=6 width=247) (actual time=37.309..315.451 rows=3555 loops=1)
         Hash Cond: (model.company_uid = company.uid)
         ->  Hash Left Join  (cost=1086.28..5396.86 rows=6 width=217) (actual time=37.235..308.787 rows=3555 loops=1)
               Hash Cond: (logical.uid = specs_sort.logical_uid)
               ->  Hash Left Join  (cost=694.84..5004.96 rows=6 width=180) (actual time=22.433..284.832 rows=3555
loops=1)
                     Hash Cond: ((logical.type)::integer = type.uid)
                     ->  Nested Loop  (cost=693.55..5003.62 rows=6 width=168) (actual time=22.361..273.502 rows=3555
loops=1)
                           ->  Hash Join  (cost=693.55..4953.84 rows=6 width=110) (actual time=22.330..237.717
rows=3555loops=1) 
                                 Hash Cond: (model.logical_uid = spec_find1.logical_uid)
                                 ->  Seq Scan on model  (cost=0.00..3337.82 rows=184182 width=36) (actual
time=0.017..99.289rows=184182 loops=1) 
                                 ->  Hash  (cost=691.60..691.60 rows=156 width=74) (actual time=21.795..21.795
rows=2196loops=1) 
                                       ->  Hash Join  (cost=339.84..691.60 rows=156 width=74) (actual
time=8.558..19.060rows=2196 loops=1) 
                                             Hash Cond: (spec_find1.logical_uid = spec_find2.logical_uid)
                                             ->  Seq Scan on specs spec_find1  (cost=0.00..326.89 rows=1036 width=37)
(actualtime=0.023..6.765 rows=2196 loops=1) 
                                                   Filter: (((text)::text ~~ '%'::text) AND (spec_uid = 8))
                                             ->  Hash  (cost=326.89..326.89 rows=1036 width=37) (actual
time=8.508..8.508rows=2196 loops=1) 
                                                   ->  Seq Scan on specs spec_find2  (cost=0.00..326.89 rows=1036
width=37)(actual time=0.010..6.667 rows=2196 loops=1) 
                                                         Filter: (((text)::text ~~ '%'::text) AND (spec_uid = 8))
                           ->  Index Scan using logical_pkey on logical  (cost=0.00..8.28 rows=1 width=58) (actual
time=0.006..0.007rows=1 loops=3555) 
                                 Index Cond: (logical.uid = model.logical_uid)
                     ->  Hash  (cost=1.13..1.13 rows=13 width=12) (actual time=0.024..0.024 rows=13 loops=1)
                           ->  Seq Scan on type  (cost=0.00..1.13 rows=13 width=12) (actual time=0.004..0.011 rows=13
loops=1)
               ->  Hash  (cost=287.57..287.57 rows=8309 width=37) (actual time=14.773..14.773 rows=8172 loops=1)
                     ->  Seq Scan on specs specs_sort  (cost=0.00..287.57 rows=8309 width=37) (actual time=0.012..8.206
rows=8172loops=1) 
                           Filter: (spec_uid = 4)
         ->  Hash  (cost=1.18..1.18 rows=18 width=30) (actual time=0.032..0.032 rows=18 loops=1)
               ->  Seq Scan on company  (cost=0.00..1.18 rows=18 width=30) (actual time=0.005..0.013 rows=18 loops=1)
 Total runtime: 338.389 ms
(31 rows)


Adding a third 'spec_find' join causes the outer joins to be looped.

EXPLAIN ANALYZE SELECT logical.uid FROM logical
LEFT JOIN model ON logical.uid = model.logical_uid
LEFT JOIN company ON model.company_uid = company.uid
LEFT JOIN type ON logical.type::INT = type.uid
JOIN specs spec_find1 ON spec_find1 .spec_uid='8' AND spec_find1 .text LIKE '%' AND spec_find1
.logical_uid=model.logical_uid 
JOIN specs spec_find2 ON spec_find2.spec_uid='8' AND spec_find2.text LIKE '%' AND
spec_find2.logical_uid=model.logical_uid 
JOIN specs spec_find3 ON spec_find3 .spec_uid='8' AND spec_find3 .text LIKE '%' AND spec_find3
.logical_uid=model.logical_uid 
LEFT JOIN specs specs_sort ON specs_sort.spec_uid='4' AND specs_sort.logical_uid=logical.uid
ORDER BY specs_sort.num;

                                                                                  QUERY PLAN
                                                      

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=5464.51..5464.52 rows=1 width=8) (actual time=41770.603..41772.111 rows=3555 loops=1)
   Sort Key: specs_sort.num
   Sort Method:  quicksort  Memory: 203kB
   ->  Nested Loop Left Join  (cost=1035.47..5464.50 rows=1 width=8) (actual time=40.324..41756.946 rows=3555 loops=1)
         ->  Nested Loop Left Join  (cost=1035.47..5464.22 rows=1 width=12) (actual time=40.313..41716.150 rows=3555
loops=1)
               ->  Nested Loop Left Join  (cost=1035.47..5463.93 rows=1 width=21) (actual time=40.272..41639.961
rows=3555loops=1) 
                     Join Filter: (specs_sort.logical_uid = logical.uid)
                     ->  Nested Loop  (cost=1035.47..5072.50 rows=1 width=17) (actual time=32.400..340.250 rows=3555
loops=1)
                           ->  Hash Join  (cost=1035.47..5064.20 rows=1 width=20) (actual time=32.370..251.601
rows=3555loops=1) 
                                 Hash Cond: (model.logical_uid = spec_find1.logical_uid)
                                 ->  Seq Scan on model  (cost=0.00..3337.82 rows=184182 width=8) (actual
time=0.015..95.970rows=184182 loops=1) 
                                 ->  Hash  (cost=1035.18..1035.18 rows=23 width=12) (actual time=31.890..31.890
rows=2196loops=1) 
                                       ->  Hash Join  (cost=679.68..1035.18 rows=23 width=12) (actual
time=17.178..30.063rows=2196 loops=1) 
                                             Hash Cond: (spec_find1.logical_uid = spec_find2.logical_uid)
                                             ->  Hash Join  (cost=339.84..691.60 rows=156 width=8) (actual
time=8.568..18.308rows=2196 loops=1) 
                                                   Hash Cond: (spec_find1.logical_uid = spec_find3.logical_uid)
                                                   ->  Seq Scan on specs spec_find1  (cost=0.00..326.89 rows=1036
width=4)(actual time=0.015..6.603 rows=2196 loops=1) 
                                                         Filter: (((text)::text ~~ '%'::text) AND (spec_uid = 8))
                                                   ->  Hash  (cost=326.89..326.89 rows=1036 width=4) (actual
time=8.513..8.513rows=2196 loops=1) 
                                                         ->  Seq Scan on specs spec_find3  (cost=0.00..326.89 rows=1036
width=4)(actual time=0.011..6.851 rows=2196 loops=1) 
                                                               Filter: (((text)::text ~~ '%'::text) AND (spec_uid = 8))
                                             ->  Hash  (cost=326.89..326.89 rows=1036 width=4) (actual
time=8.588..8.588rows=2196 loops=1) 
                                                   ->  Seq Scan on specs spec_find2  (cost=0.00..326.89 rows=1036
width=4)(actual time=0.022..6.939 rows=2196 loops=1) 
                                                         Filter: (((text)::text ~~ '%'::text) AND (spec_uid = 8))
                           ->  Index Scan using logical_pkey on logical  (cost=0.00..8.28 rows=1 width=13) (actual
time=0.015..0.019rows=1 loops=3555) 
                                 Index Cond: (logical.uid = model.logical_uid)
                     ->  Seq Scan on specs specs_sort  (cost=0.00..287.57 rows=8309 width=8) (actual time=0.009..7.996
rows=8172loops=3555) 
                           Filter: (specs_sort.spec_uid = 4)
               ->  Index Scan using type_pkey on type  (cost=0.00..0.27 rows=1 width=4) (actual time=0.010..0.011
rows=1loops=3555) 
                     Index Cond: ((logical.type)::integer = type.uid)
         ->  Index Scan using company_pkey on company  (cost=0.00..0.27 rows=1 width=4) (actual time=0.006..0.007
rows=1loops=3555) 
               Index Cond: (model.company_uid = company.uid)
 Total runtime: 41773.972 ms
(33 rows)

--
Ian Smith
www.ian.org

Re: Why do my hash joins turn to nested loops?

From
Tom Lane
Date:
pgsql-performance@ian.org writes:
> One query works fast, returning results in under a second.  If I insert
> one more join into the table however, it switches to nested-loops and
> takes minutes.

I think you need to raise from_collapse_limit and/or
join_collapse_limit.

            regards, tom lane

Re: Why do my hash joins turn to nested loops?

From
pgsql-performance@ian.org
Date:
On Thu, 21 Aug 2008, Tom Lane wrote:
> I think you need to raise from_collapse_limit and/or
> join_collapse_limit.

Ahah, that was it.. a much simpler solution than I was fearing.

I had already re-written the queries to get around it, but ran into
another snag with that method, so this was good timing.

Thanks!

--
Ian Smith
www.ian.org