Left Outer Join much faster than non-outer Join? - Mailing list pgsql-performance

From rm_pg@cheapcomplexdevices.com
Subject Left Outer Join much faster than non-outer Join?
Date
Msg-id Pine.LNX.4.58.0503301219540.9713@greenie.cheapcomplexdevices.com
Whole thread Raw
Responses Re: Left Outer Join much faster than non-outer Join?
Re: Left Outer Join much faster than non-outer Join?
List pgsql-performance
Can anyone please help me make my JOIN find the right index to use?

It seems strange to me that in the two queries listed below, the
LEFT OUTER JOIN can find the most efficient index to use, while
the unadorned JOIN can not.   The result is that my query is
orders of magnitude slower than it seems it should be.



The table "tlid_smaller" (\d and explain analyze shown below) is a
large table contining integer IDs just like the fact table of any
traditional star-schema warehouse.

The tables *_lookup are simply tables that map strings to IDs, with
unique IDs associating strings to the IDs.

The table "tlid_smaller" has an index on (streetname_id, city_id) that
is extremely efficient at finding the desired row.  When I use a "LEFT
OUTER JOIN", the optimizer happily sees that it can use this index.
This is shown in the first explain analyze below.  However when I
simply do a "JOIN" the optimizer does not use this index and rather
does a hash join comparing thousands of rows.

Note that the cost estimate using the good index is much better
(16.94 vs 29209.16 thousands of times better).  Any ideas why
the non-outer join didn't use it?






fli=# explain analyze
 select *
     from streetname_lookup as sl
     join city_lookup as cl on (true)
     left outer join tlid_smaller as ts on (sl.geo_streetname_id = ts.geo_streetname_id and
cl.geo_city_id=ts.geo_city_id)
     where  str_name='alamo' and  city='san antonio' and state='TX'
;
fli-# fli-# fli-# fli-# fli-# fli-#                                                                           QUERY
PLAN                                                                     \ 


---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..16.94 rows=1 width=74) (actual time=0.115..0.539 rows=78 loops=1)
   ->  Nested Loop  (cost=0.00..9.03 rows=1 width=42) (actual time=0.077..0.084 rows=1 loops=1)
         ->  Index Scan using streetname_lookup__str_name on streetname_lookup sl  (cost=0.00..3.01 rows=1 width=19)
(actualtime=0.042..0.044 rows=1 loops=1) 
               Index Cond: (str_name = 'alamo'::text)
         ->  Index Scan using city_lookup__name on city_lookup cl  (cost=0.00..6.01 rows=1 width=23) (actual
time=0.026..0.028rows=1 loops=1) 
               Index Cond: ((city = 'san antonio'::text) AND (state = 'TX'::text))
   ->  Index Scan using tlid_smaller__street_city on tlid_smaller ts  (cost=0.00..7.86 rows=3 width=32) (actual
time=0.029..0.176rows=78 loops=1) 
         Index Cond: (("outer".geo_streetname_id = ts.geo_streetname_id) AND ("outer".geo_city_id = ts.geo_city_id))
 Total runtime: 0.788 ms
(9 rows)


fli=#
fli=# explain analyze
 select *
     from streetname_lookup as sl
     join city_lookup as cl on (true)
     join tlid_smaller as ts on (sl.geo_streetname_id = ts.geo_streetname_id and cl.geo_city_id=ts.geo_city_id)
     where  str_name='alamo' and  city='san antonio' and state='TX'
;
fli-# fli-# fli-# fli-# fli-# fli-#                                                                              QUERY
PLAN                                                                  \ 


---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=6.01..29209.16 rows=1 width=74) (actual time=9.421..28.154 rows=78 loops=1)
   Hash Cond: ("outer".geo_city_id = "inner".geo_city_id)
   ->  Nested Loop  (cost=0.00..29202.88 rows=52 width=51) (actual time=0.064..23.296 rows=4151 loops=1)
         ->  Index Scan using streetname_lookup__str_name on streetname_lookup sl  (cost=0.00..3.01 rows=1 width=19)
(actualtime=0.025..0.032 rows=1 loops=1) 
               Index Cond: (str_name = 'alamo'::text)
         ->  Index Scan using tlid_smaller__street_zipint on tlid_smaller ts  (cost=0.00..28994.70 rows=16413 width=32)
(actualtime=0.028..8.153 rows=4151 loops=1) 
               Index Cond: ("outer".geo_streetname_id = ts.geo_streetname_id)
   ->  Hash  (cost=6.01..6.01 rows=1 width=23) (actual time=0.073..0.073 rows=0 loops=1)
         ->  Index Scan using city_lookup__name on city_lookup cl  (cost=0.00..6.01 rows=1 width=23) (actual
time=0.065..0.067rows=1 loops=1) 
               Index Cond: ((city = 'san antonio'::text) AND (state = 'TX'::text))
 Total runtime: 28.367 ms
(11 rows)

fli=#

fli=#



fli=# \d tlid_smaller
        Table "geo.tlid_smaller"
      Column       |  Type   | Modifiers
-------------------+---------+-----------
 tlid              | integer |
 geo_streetname_id | integer |
 geo_streettype_id | integer |
 geo_city_id       | integer |
 zipint            | integer |
 tigerfile         | integer |
 low               | integer |
 high              | integer |
Indexes:
    "tlid_smaller__city" btree (geo_city_id)
    "tlid_smaller__street_city" btree (geo_streetname_id, geo_city_id)
    "tlid_smaller__street_zipint" btree (geo_streetname_id, zipint)
    "tlid_smaller__tlid" btree (tlid)


pgsql-performance by date:

Previous
From: Richard_D_Levine@raytheon.com
Date:
Subject: Re: Reading recommendations
Next
From: Ron Mayer
Date:
Subject: Re: Left Outer Join much faster than non-outer Join?