Re: Left Outer Join much faster than non-outer Join? - Mailing list pgsql-performance

From Ron Mayer
Subject Re: Left Outer Join much faster than non-outer Join?
Date
Msg-id 424BB1BB.90304@cheapcomplexdevices.com
Whole thread Raw
In response to Re: Left Outer Join much faster than non-outer Join?  (Ron Mayer <rm_pg@cheapcomplexdevices.com>)
Responses Re: Left Outer Join much faster than non-outer Join?
List pgsql-performance
Ron Mayer wrote:
> Tom Lane wrote:
>> rm_pg@cheapcomplexdevices.com writes:
>>> select *
>>>     from streetname_lookup as sl
>>>     join city_lookup as cl on (true)
>>>     left outer join tlid_smaller as ts on (sl.geo_streetname_id =
>>>          ts.geo_streetname_id and cl.geo_city_id=ts.geo_city_id)
>>>     where  str_name='alamo' and  city='san antonio' and state='TX'
>>> ;
>> That's a fairly odd query;
>
>
> I think it's a very common type of query in data warehousing.
>
> It's reasonably typical of a traditional star schema where
> "streetname_lookup" and "city_lookup" are dimension tables
> and "tlid_smaller" is the central fact table.

Although looking again I must admit the query was
written unconventionally.  Perhaps those queries are
remnants dating back to a version when you could
force join orders this way?

Perhaps a more common way of writing it would have been:

   select * from tlid_smaller
    where geo_streetname_id in (select geo_streetname_id from streetname_lookup where str_name='$str_name')
      and geo_city_id       in (select geo_city_id from city_lookup where city='$city' and state='$state');

However this query also fails to use the multi-column
index on (geo_streetname_id,geo_city_id).  Explain
analyze shown below.


In cases where I can be sure only one result will come
from each of the lookup queries I guess I can do this:

   select * from tlid_smaller
    where geo_streetname_id = (select geo_streetname_id from streetname_lookup where str_name='$str_name')
      and geo_city_id       = (select geo_city_id from city_lookup where city='$city' and state='$state');

which has the nicest plan of them all (explain analyze
also shown below).

 > With the tables I have (shown below), how else could one
 > efficiently fetch the data for "Main St" "San Francisco"?

I guess I just answered that question myself.  Where possible,
I'll write my queries this way.

      Thanks,
      Ron


fli=# fli=# explain analyze  select * from tlid_smaller
         where geo_streetname_id in (select geo_streetname_id from streetname_lookup where str_name='alamo')
           and geo_city_id       in (select geo_city_id from city_lookup where city='san antonio' and state='TX');
fli-# fli-#                                                                              QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Hash IN Join  (cost=9.03..29209.16 rows=1 width=32) (actual time=76.576..96.605 rows=78 loops=1)
    Hash Cond: ("outer".geo_city_id = "inner".geo_city_id)
    ->  Nested Loop  (cost=3.02..29202.88 rows=52 width=32) (actual time=65.877..91.789 rows=4151 loops=1)
          ->  HashAggregate  (cost=3.02..3.02 rows=1 width=4) (actual time=0.039..0.042 rows=1 loops=1)
                ->  Index Scan using streetname_lookup__str_name on streetname_lookup  (cost=0.00..3.01 rows=1 width=4)
(actualtime=0.025..0.028 rows=1 loops=1) 
                      Index Cond: (str_name = 'alamo'::text)
          ->  Index Scan using tlid_smaller__street_zipint on tlid_smaller  (cost=0.00..28994.70 rows=16413 width=32)
(actualtime=65.820..81.309 rows=4151 loops=1) 
                Index Cond: (tlid_smaller.geo_streetname_id = "outer".geo_streetname_id)
    ->  Hash  (cost=6.01..6.01 rows=1 width=4) (actual time=0.054..0.054 rows=0 loops=1)
          ->  Index Scan using city_lookup__name on city_lookup  (cost=0.00..6.01 rows=1 width=4) (actual
time=0.039..0.041rows=1 loops=1) 
                Index Cond: ((city = 'san antonio'::text) AND (state = 'TX'::text))
  Total runtime: 97.577 ms
(12 rows)

fli=#

fli=# explain analyze  select * from tlid_smaller
         where geo_streetname_id = (select geo_streetname_id from streetname_lookup where str_name='alamo')
           and geo_city_id       = (select geo_city_id from city_lookup where city='san antonio' and state='TX');

fli-# fli-#                                                                       QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using tlid_smaller__street_city on tlid_smaller  (cost=9.02..16.88 rows=3 width=32) (actual
time=0.115..0.255rows=78 loops=1) 
    Index Cond: ((geo_streetname_id = $0) AND (geo_city_id = $1))
    InitPlan
      ->  Index Scan using streetname_lookup__str_name on streetname_lookup  (cost=0.00..3.01 rows=1 width=4) (actual
time=0.044..0.047rows=1 loops=1) 
            Index Cond: (str_name = 'alamo'::text)
      ->  Index Scan using city_lookup__name on city_lookup  (cost=0.00..6.01 rows=1 width=4) (actual time=0.028..0.030
rows=1loops=1) 
            Index Cond: ((city = 'san antonio'::text) AND (state = 'TX'::text))
  Total runtime: 0.474 ms
(8 rows)

pgsql-performance by date:

Previous
From: Ron Mayer
Date:
Subject: Re: Left Outer Join much faster than non-outer Join?
Next
From: "Patrick Vedrines"
Date:
Subject: Re: Left Outer Join much faster than non-outer Join?