Re: Left Outer Join much faster than non-outer Join? - Mailing list pgsql-performance
From | Ron Mayer |
---|---|
Subject | Re: Left Outer Join much faster than non-outer Join? |
Date | |
Msg-id | 424BB1BB.90304@cheapcomplexdevices.com Whole thread Raw |
In response to | Re: Left Outer Join much faster than non-outer Join? (Ron Mayer <rm_pg@cheapcomplexdevices.com>) |
Responses |
Re: Left Outer Join much faster than non-outer Join?
|
List | pgsql-performance |
Ron Mayer wrote: > Tom Lane wrote: >> rm_pg@cheapcomplexdevices.com writes: >>> select * >>> from streetname_lookup as sl >>> join city_lookup as cl on (true) >>> left outer join tlid_smaller as ts on (sl.geo_streetname_id = >>> ts.geo_streetname_id and cl.geo_city_id=ts.geo_city_id) >>> where str_name='alamo' and city='san antonio' and state='TX' >>> ; >> That's a fairly odd query; > > > I think it's a very common type of query in data warehousing. > > It's reasonably typical of a traditional star schema where > "streetname_lookup" and "city_lookup" are dimension tables > and "tlid_smaller" is the central fact table. Although looking again I must admit the query was written unconventionally. Perhaps those queries are remnants dating back to a version when you could force join orders this way? Perhaps a more common way of writing it would have been: select * from tlid_smaller where geo_streetname_id in (select geo_streetname_id from streetname_lookup where str_name='$str_name') and geo_city_id in (select geo_city_id from city_lookup where city='$city' and state='$state'); However this query also fails to use the multi-column index on (geo_streetname_id,geo_city_id). Explain analyze shown below. In cases where I can be sure only one result will come from each of the lookup queries I guess I can do this: select * from tlid_smaller where geo_streetname_id = (select geo_streetname_id from streetname_lookup where str_name='$str_name') and geo_city_id = (select geo_city_id from city_lookup where city='$city' and state='$state'); which has the nicest plan of them all (explain analyze also shown below). > With the tables I have (shown below), how else could one > efficiently fetch the data for "Main St" "San Francisco"? I guess I just answered that question myself. Where possible, I'll write my queries this way. Thanks, Ron fli=# fli=# explain analyze select * from tlid_smaller where geo_streetname_id in (select geo_streetname_id from streetname_lookup where str_name='alamo') and geo_city_id in (select geo_city_id from city_lookup where city='san antonio' and state='TX'); fli-# fli-# QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------- Hash IN Join (cost=9.03..29209.16 rows=1 width=32) (actual time=76.576..96.605 rows=78 loops=1) Hash Cond: ("outer".geo_city_id = "inner".geo_city_id) -> Nested Loop (cost=3.02..29202.88 rows=52 width=32) (actual time=65.877..91.789 rows=4151 loops=1) -> HashAggregate (cost=3.02..3.02 rows=1 width=4) (actual time=0.039..0.042 rows=1 loops=1) -> Index Scan using streetname_lookup__str_name on streetname_lookup (cost=0.00..3.01 rows=1 width=4) (actualtime=0.025..0.028 rows=1 loops=1) Index Cond: (str_name = 'alamo'::text) -> Index Scan using tlid_smaller__street_zipint on tlid_smaller (cost=0.00..28994.70 rows=16413 width=32) (actualtime=65.820..81.309 rows=4151 loops=1) Index Cond: (tlid_smaller.geo_streetname_id = "outer".geo_streetname_id) -> Hash (cost=6.01..6.01 rows=1 width=4) (actual time=0.054..0.054 rows=0 loops=1) -> Index Scan using city_lookup__name on city_lookup (cost=0.00..6.01 rows=1 width=4) (actual time=0.039..0.041rows=1 loops=1) Index Cond: ((city = 'san antonio'::text) AND (state = 'TX'::text)) Total runtime: 97.577 ms (12 rows) fli=# fli=# explain analyze select * from tlid_smaller where geo_streetname_id = (select geo_streetname_id from streetname_lookup where str_name='alamo') and geo_city_id = (select geo_city_id from city_lookup where city='san antonio' and state='TX'); fli-# fli-# QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using tlid_smaller__street_city on tlid_smaller (cost=9.02..16.88 rows=3 width=32) (actual time=0.115..0.255rows=78 loops=1) Index Cond: ((geo_streetname_id = $0) AND (geo_city_id = $1)) InitPlan -> Index Scan using streetname_lookup__str_name on streetname_lookup (cost=0.00..3.01 rows=1 width=4) (actual time=0.044..0.047rows=1 loops=1) Index Cond: (str_name = 'alamo'::text) -> Index Scan using city_lookup__name on city_lookup (cost=0.00..6.01 rows=1 width=4) (actual time=0.028..0.030 rows=1loops=1) Index Cond: ((city = 'san antonio'::text) AND (state = 'TX'::text)) Total runtime: 0.474 ms (8 rows)
pgsql-performance by date: