Thread: performance hit when joining with a view?

performance hit when joining with a view?

From
Palle Girgensohn
Date:
Hi!

A performance question:

I have some tables:

                   Tabell "public.person"
      Kolumn      |           Typ            |  Modifierare
------------------+--------------------------+---------------
 userid           | text                     | not null
 giver            | text                     |
 first_name       | text                     |
 last_name        | text                     |
 email            | text                     |
 default_language | text                     | default 'sv'
 created          | timestamp with time zone | default now()
 created_by       | text                     |
Index: person_pkey primärnyckel btree (userid),
       person_index unik btree (userid),
       person_giver_idx btree (giver)
Främmande nyckel-villkor: pp_fk9 FOREIGN KEY (giver) REFERENCES
providers(giver) ON UPDATE CASCADE ON DELETE CASCADE,
                          pp_fk2 FOREIGN KEY (created_by) REFERENCES
person(userid) ON UPDATE CASCADE ON DELETE SET NULL


                     Tabell "public.wiol"
     Kolumn      |             Typ             |  Modifierare
-----------------+-----------------------------+---------------
 userid          | text                        | not null
 course_id       | integer                     |
 login_ts        | timestamp without time zone | default now()
 latest_event_ts | timestamp without time zone | default now()
Främmande nyckel-villkor: pp_fk2 FOREIGN KEY (course_id) REFERENCES
course(id) ON UPDATE CASCADE ON DELETE CASCADE,
                          pp_fk1 FOREIGN KEY (userid) REFERENCES
person(userid) ON UPDATE CASCADE ON DELETE CASCADE

and a view:

               Vy "public.person_wiol_view"
      Kolumn      |           Typ            | Modifierare
------------------+--------------------------+-------------
 userid           | text                     |
 giver            | text                     |
 first_name       | text                     |
 last_name        | text                     |
 email            | text                     |
 default_language | text                     |
 created          | timestamp with time zone |
 created_by       | text                     |
 course_id        | integer                  |
Vydefinition: SELECT p.userid, p.giver, p.first_name, p.last_name, p.email,
p.default_language, p.created, p.created_by, w.course_id FROM (person p
LEFT JOIN wiol w ON ((p.userid = w.userid)));


Now, with about 30000 tuples in person and about 40 in wiol, executing a
left outer join with the view gives horrible performance:

 explain  analyze select p.pim_id, p.recipient, p.sender, p.message, p.ts,
p.type, case when sender.userid is not null then sender.first_name || ' '
|| sender.last_name else null end as sender_name, sender.course_id is not
null as is_online from pim p left outer join person_wiol_view sender on
(sender.userid = p.sender) where p.recipient = 'axto6551' and p.type >= 0
limit 1;
                                                             QUERY PLAN
---------------------------------------------------------------------------
----------------------------------------------------------
 Limit  (cost=0.00..1331.26 rows=1 width=180) (actual time=866.14..1135.65
rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..1331.26 rows=1 width=180) (actual
time=866.13..1135.63 rows=2 loops=1)
         Join Filter: ("inner".userid = "outer".sender)
         ->  Seq Scan on pim p  (cost=0.00..0.00 rows=1 width=112) (actual
time=0.05..0.18 rows=2 loops=1)
               Filter: ((recipient = 'axto6551'::text) AND ("type" >= 0))
         ->  Materialize  (cost=956.15..956.15 rows=30009 width=68) (actual
time=369.33..437.86 rows=22045 loops=2)
               ->  Hash Join  (cost=0.00..956.15 rows=30009 width=68)
(actual time=0.45..605.21 rows=30013 loops=1)
                     Hash Cond: ("outer".userid = "inner".userid)
                     ->  Seq Scan on person p  (cost=0.00..806.09
rows=30009 width=32) (actual time=0.16..279.28 rows=30009 loops=1)
                     ->  Hash  (cost=0.00..0.00 rows=1 width=36) (actual
time=0.13..0.13 rows=0 loops=1)
                           ->  Seq Scan on wiol w  (cost=0.00..0.00 rows=1
width=36) (actual time=0.02..0.09 rows=8 loops=1)
 Total runtime: 1143.93 msec
(12 rader)


but rewriting the question with an explicit join uses the indices, and runs
*much* faster:

explain  analyze select p.pim_id, p.recipient, p.sender, p.message, p.ts,
p.type, case when sender.userid is not null then sender.first_name || ' '
|| sender.last_name else null end as sender_name, w.course_id is not null
as is_online from pim p left outer join person sender on (sender.userid =
p.sender) left join wiol w on (w.userid=sender.userid) where p.recipient =
'axto6551' and p.type >= 0  limit 1;
                                                                 QUERY PLAN
---------------------------------------------------------------------------
-----------------------------------------------------------------
 Limit  (cost=0.00..6.03 rows=1 width=180) (actual time=0.89..1.13 rows=1
loops=1)
   ->  Hash Join  (cost=0.00..6.03 rows=1 width=180) (actual
time=0.88..1.12 rows=2 loops=1)
         Hash Cond: ("outer".userid = "inner".userid)
         ->  Nested Loop  (cost=0.00..6.02 rows=1 width=144) (actual
time=0.48..0.69 rows=2 loops=1)
               ->  Seq Scan on pim p  (cost=0.00..0.00 rows=1 width=112)
(actual time=0.04..0.16 rows=2 loops=1)
                     Filter: ((recipient = 'axto6551'::text) AND ("type" >=
0))
               ->  Index Scan using person_pkey on person sender
(cost=0.00..6.01 rows=1 width=32) (actual time=0.23..0.24 rows=1 loops=2)
                     Index Cond: (sender.userid = "outer".sender)
         ->  Hash  (cost=0.00..0.00 rows=1 width=36) (actual
time=0.22..0.22 rows=0 loops=1)
               ->  Seq Scan on wiol w  (cost=0.00..0.00 rows=1 width=36)
(actual time=0.12..0.17 rows=8 loops=1)
 Total runtime: 1.39 msec
(11 rader)



Tests run on postgresql-7.3.4.

Main question is, is it bad SQL to join with a view, or is it postgresql
that does something not quite optimal? If the latter, is it fixed in 7.4?

Thanks,
Palle




Re: performance hit when joining with a view?

From
Tom Lane
Date:
Palle Girgensohn <girgen@pingpong.net> writes:
> Vydefinition: SELECT p.userid, p.giver, p.first_name, p.last_name, p.email,
> p.default_language, p.created, p.created_by, w.course_id FROM (person p
> LEFT JOIN wiol w ON ((p.userid = w.userid)));

>  explain  analyze select p.pim_id, p.recipient, p.sender, p.message, p.ts,
> p.type, case when sender.userid is not null then sender.first_name || ' '
> || sender.last_name else null end as sender_name, sender.course_id is not
> null as is_online from pim p left outer join person_wiol_view sender on
> (sender.userid = p.sender) where p.recipient = 'axto6551' and p.type >= 0
> limit 1;

> explain  analyze select p.pim_id, p.recipient, p.sender, p.message, p.ts,
> p.type, case when sender.userid is not null then sender.first_name || ' '
> || sender.last_name else null end as sender_name, w.course_id is not null
> as is_online from pim p left outer join person sender on (sender.userid =
> p.sender) left join wiol w on (w.userid=sender.userid) where p.recipient =
> 'axto6551' and p.type >= 0  limit 1;

These are not actually the same query.  In the former case the implicit
parenthesization of the joins is
    pim left join (person left join wiol)
whereas in the latter case the implicit parenthesization is left-to-right:
    (pim left join person) left join wiol
Since the only restriction conditions you have provided are on pim, the
first parenthesization implies forming the entire join of person and
wiol :-(.

If you were using plain joins then the two queries would be logically
equivalent, but outer joins are in general not associative, so the
planner will not consider re-ordering them.

There is some work in 7.4 to make the planner smarter about outer joins,
but offhand I don't think any of it will improve results for this
particular example.

I have seen some academic papers about how to prove that a particular
pair of outer join operators can safely be swapped (as I think is true
in this example).  Some knowledge of that sort may eventually get into
the planner, but it ain't there now.

            regards, tom lane