Thread: JOIN to a VIEW makes a real slow query

JOIN to a VIEW makes a real slow query

From
"Chuck D."
Date:
Hi folks,

I don't know if this is an SQL or PERFORMANCE list problem but I wanted to
check here first.  I've seen this discussed on the list before but I'm still
not sure of the solution.  Maybe my query is just structured wrong.

I recently visited an old project of mine that has a 'city', 'state,'
and 'country' tables.  The city data comes from multiple sources and totals
about 3 million rows.  I decided to split the city table up based on the
source (world_city, us_city).  This makes easier updating because the
assigned feature id's from the two sources overlap in some cases making it
impossible to update as a single merged table.

However, I decided to create a view to behave like the old 'city' table.  The
view is just a simple:

SELECT [columns]
FROM world_city
UNION
SELECT [columns]
FROM us_city
;

Selecting from the view is very quick, but JOINing to the view is slow.  About
65 seconds to select a city.  It doesn't matter wether it is joined to one
table or 6 like it is in my user_detail query - it is still slow.  It has
indexes on the city_id, state_id, country_id of each table in the view too.
Everything has been 'VACUUM ANALYZE' ed.

When using explain analyze from the view I get this:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM   m_user AS mu
cmi-# left JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
                                                                    QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=650146.58..751018.45 rows=10618 width=55)
(actual time=53078.261..61269.190 rows=1 loops=1)
   Join Filter: ("outer".city_id = "inner"."?column1?")
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual
time=0.010..0.022 rows=1 loops=1)
   ->  Unique  (cost=650146.58..703236.51 rows=2123597 width=62) (actual
time=49458.007..59635.140 rows=2122712 loops=1)
         ->  Sort  (cost=650146.58..655455.58 rows=2123597 width=62) (actual
time=49458.003..55405.965 rows=2122712 loops=1)
               Sort Key: city_id, state_id, country_id, cc1, rc, adm1, lat,
lon, city_name
               ->  Append  (cost=0.00..73741.94 rows=2123597 width=62) (actual
time=18.835..13706.395 rows=2122712 loops=1)
                     ->  Seq Scan on us_city  (cost=0.00..4873.09 rows=169409
width=62) (actual time=18.832..620.553 rows=169398 loops=1)
                     ->  Seq Scan on world_city  (cost=0.00..47632.88
rows=1954188 width=61) (actual time=23.513..11193.341 rows=1953314 loops=1)
 Total runtime: 61455.471 ms
(10 rows)

Time: 61512.377 ms

So, a sequence scan on the tables in the view, won't use the index.

Then do the same query by replacing the view with the real table:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM   m_user AS mu
cmi-# left JOIN geo.world_city AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
                                                              QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..4.04 rows=1 width=36) (actual
time=53.854..53.871 rows=1 loops=1)
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual
time=0.010..0.016 rows=1 loops=1)
   ->  Index Scan using world_city_pk on world_city ci  (cost=0.00..3.01
rows=1 width=17) (actual time=53.825..53.833 rows=1 loops=1)
         Index Cond: ("outer".city_id = ci.city_id)
 Total runtime: 53.989 ms
(5 rows)

Time: 56.234 ms


I'm not sure that a view on a UNION is the best idea but I don't know how to
go about keeping the tables from the data sources with the view (other than
modifying them with a source_id column).  Any ideas on what is causing the
performance lag?



Re: JOIN to a VIEW makes a real slow query

From
"Merlin Moncure"
Date:
On 2/13/07, Chuck D. <pgsql-performance@nullmx.com> wrote:
> Hi folks,
>
> I don't know if this is an SQL or PERFORMANCE list problem but I wanted to
> check here first.  I've seen this discussed on the list before but I'm still
> not sure of the solution.  Maybe my query is just structured wrong.
>
> I recently visited an old project of mine that has a 'city', 'state,'
> and 'country' tables.  The city data comes from multiple sources and totals
> about 3 million rows.  I decided to split the city table up based on the
> source (world_city, us_city).  This makes easier updating because the
> assigned feature id's from the two sources overlap in some cases making it
> impossible to update as a single merged table.
>
> However, I decided to create a view to behave like the old 'city' table.  The
> view is just a simple:
>
> SELECT [columns]
> FROM world_city
> UNION
> SELECT [columns]
> FROM us_city
> ;
>
> Selecting from the view is very quick, but JOINing to the view is slow.  About
> 65 seconds to select a city.  It doesn't matter wether it is joined to one
> table or 6 like it is in my user_detail query - it is still slow.  It has
> indexes on the city_id, state_id, country_id of each table in the view too.
> Everything has been 'VACUUM ANALYZE' ed.

use 'union all' instead of union.  union without all has an implied
sort and duplicate removal step that has to be resolved, materializing
the view, before you can join to it.

merlin

Re: JOIN to a VIEW makes a real slow query

From
"Chuck D."
Date:
On Tuesday 13 February 2007 13:16, Merlin Moncure wrote:
>
> use 'union all' instead of union.  union without all has an implied
> sort and duplicate removal step that has to be resolved, materializing
> the view, before you can join to it.
>

Thanks for that Merlin, I forgot about using ALL.  That does eliminate the
UNIQUE, SORT and SORT lines from the EXPLAIN query.  It also brings the query
time down from a whopping 65 seconds to 11 seconds.  The two tables contain
unique rows already so ALL would be required.

It is still using that sequence scan on the view after the APPEND for the
us_city and world_city table.  Any reason why the view won't use the indexes
when it is JOINed to another table but it will when the view is queried
without a JOIN?  I should have mentioned this is v8.1.4.

Also, does anyone know why this line:
Join Filter: ("outer".city_id = "inner"."?column1?")
... contains "?column1?" instead of the actual column name?

This is the result after UNION ALL on the view

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM   m_user AS mu
cmi-# LEFT JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
                                                             QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..121523.88 rows=10618 width=55) (actual
time=2392.376..11061.117 rows=1 loops=1)
   Join Filter: ("outer".city_id = "inner"."?column1?")
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual
time=0.025..0.028 rows=1 loops=1)
   ->  Append  (cost=0.00..73741.94 rows=2123597 width=62) (actual
time=16.120..9644.315 rows=2122712 loops=1)
         ->  Seq Scan on us_city  (cost=0.00..4873.09 rows=169409 width=62)
(actual time=16.119..899.802 rows=169398 loops=1)
         ->  Seq Scan on world_city  (cost=0.00..47632.88 rows=1954188
width=61) (actual time=10.585..6949.946 rows=1953314 loops=1)
 Total runtime: 11061.441 ms
(7 rows)


Re: JOIN to a VIEW makes a real slow query

From
Tom Lane
Date:
"Chuck D." <pgsql-performance@nullmx.com> writes:
> It is still using that sequence scan on the view after the APPEND for the
> us_city and world_city table.  Any reason why the view won't use the indexes
> when it is JOINed to another table but it will when the view is queried
> without a JOIN?  I should have mentioned this is v8.1.4.

8.1 isn't bright enough for that.  Should work in 8.2 though.

> Also, does anyone know why this line:
> Join Filter: ("outer".city_id = "inner"."?column1?")
> ... contains "?column1?" instead of the actual column name?

EXPLAIN can't conveniently get access to the column name.  That could
probably be improved if someone wanted to put enough effort into it.

            regards, tom lane

Re: JOIN to a VIEW makes a real slow query

From
"Chuck D."
Date:
On Tuesday 13 February 2007 14:51, Tom Lane wrote:
> "Chuck D." <pgsql-performance@nullmx.com> writes:
> > It is still using that sequence scan on the view after the APPEND for the
> > us_city and world_city table.  Any reason why the view won't use the
> > indexes when it is JOINed to another table but it will when the view is
> > queried without a JOIN?  I should have mentioned this is v8.1.4.
>
> 8.1 isn't bright enough for that.  Should work in 8.2 though.

>
>             regards, tom lane

Upgraded to 8.2.3 in my spare time here - went from the packaged binary that
came with Ubuntu to compiling from source.  Haven't tuned it yet, but what do
you think about this join on the view?


cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM   m_user AS mu
cmi-# LEFT JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
                                                               QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..17.76 rows=10614 width=486) (actual
time=0.109..0.113 rows=1 loops=1)
   Join Filter: (mu.city_id = ci.city_id)
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=72) (actual
time=0.015..0.017 rows=1 loops=1)
   ->  Append  (cost=0.00..16.72 rows=2 width=422) (actual time=0.073..0.075
rows=1 loops=1)
         ->  Index Scan using pk_us_city on us_city  (cost=0.00..8.28 rows=1
width=222) (actual time=0.032..0.032 rows=0 loops=1)
               Index Cond: (mu.city_id = us_city.city_id)
         ->  Index Scan using world_city_pk on world_city  (cost=0.00..8.44
rows=1 width=422) (actual time=0.040..0.042 rows=1 loops=1)
               Index Cond: (mu.city_id = world_city.city_id)
 Total runtime: 0.359 ms
(9 rows)


From 65 seconds down to less than 1 ms.  Pretty good huh?  Nice call Tom.

Now I'll have to find some time to do the production server before this app
goes up.