performance hit when joining with a view? - Mailing list pgsql-general

From Palle Girgensohn
Subject performance hit when joining with a view?
Date
Msg-id 24470000.1064492116@rambutan.pingpong.net
Whole thread Raw
List pgsql-general
Hi!

A performance question:

I have some tables:

                   Tabell "public.person"
      Kolumn      |           Typ            |  Modifierare
------------------+--------------------------+---------------
 userid           | text                     | not null
 giver            | text                     |
 first_name       | text                     |
 last_name        | text                     |
 email            | text                     |
 default_language | text                     | default 'sv'
 created          | timestamp with time zone | default now()
 created_by       | text                     |
Index: person_pkey primärnyckel btree (userid),
       person_index unik btree (userid),
       person_giver_idx btree (giver)
Främmande nyckel-villkor: pp_fk9 FOREIGN KEY (giver) REFERENCES
providers(giver) ON UPDATE CASCADE ON DELETE CASCADE,
                          pp_fk2 FOREIGN KEY (created_by) REFERENCES
person(userid) ON UPDATE CASCADE ON DELETE SET NULL


                     Tabell "public.wiol"
     Kolumn      |             Typ             |  Modifierare
-----------------+-----------------------------+---------------
 userid          | text                        | not null
 course_id       | integer                     |
 login_ts        | timestamp without time zone | default now()
 latest_event_ts | timestamp without time zone | default now()
Främmande nyckel-villkor: pp_fk2 FOREIGN KEY (course_id) REFERENCES
course(id) ON UPDATE CASCADE ON DELETE CASCADE,
                          pp_fk1 FOREIGN KEY (userid) REFERENCES
person(userid) ON UPDATE CASCADE ON DELETE CASCADE

and a view:

               Vy "public.person_wiol_view"
      Kolumn      |           Typ            | Modifierare
------------------+--------------------------+-------------
 userid           | text                     |
 giver            | text                     |
 first_name       | text                     |
 last_name        | text                     |
 email            | text                     |
 default_language | text                     |
 created          | timestamp with time zone |
 created_by       | text                     |
 course_id        | integer                  |
Vydefinition: SELECT p.userid, p.giver, p.first_name, p.last_name, p.email,
p.default_language, p.created, p.created_by, w.course_id FROM (person p
LEFT JOIN wiol w ON ((p.userid = w.userid)));


Now, with about 30000 tuples in person and about 40 in wiol, executing a
left outer join with the view gives horrible performance:

 explain  analyze select p.pim_id, p.recipient, p.sender, p.message, p.ts,
p.type, case when sender.userid is not null then sender.first_name || ' '
|| sender.last_name else null end as sender_name, sender.course_id is not
null as is_online from pim p left outer join person_wiol_view sender on
(sender.userid = p.sender) where p.recipient = 'axto6551' and p.type >= 0
limit 1;
                                                             QUERY PLAN

---------------------------------------------------------------------------
----------------------------------------------------------
 Limit  (cost=0.00..1331.26 rows=1 width=180) (actual time=866.14..1135.65
rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..1331.26 rows=1 width=180) (actual
time=866.13..1135.63 rows=2 loops=1)
         Join Filter: ("inner".userid = "outer".sender)
         ->  Seq Scan on pim p  (cost=0.00..0.00 rows=1 width=112) (actual
time=0.05..0.18 rows=2 loops=1)
               Filter: ((recipient = 'axto6551'::text) AND ("type" >= 0))
         ->  Materialize  (cost=956.15..956.15 rows=30009 width=68) (actual
time=369.33..437.86 rows=22045 loops=2)
               ->  Hash Join  (cost=0.00..956.15 rows=30009 width=68)
(actual time=0.45..605.21 rows=30013 loops=1)
                     Hash Cond: ("outer".userid = "inner".userid)
                     ->  Seq Scan on person p  (cost=0.00..806.09
rows=30009 width=32) (actual time=0.16..279.28 rows=30009 loops=1)
                     ->  Hash  (cost=0.00..0.00 rows=1 width=36) (actual
time=0.13..0.13 rows=0 loops=1)
                           ->  Seq Scan on wiol w  (cost=0.00..0.00 rows=1
width=36) (actual time=0.02..0.09 rows=8 loops=1)
 Total runtime: 1143.93 msec
(12 rader)


but rewriting the question with an explicit join uses the indices, and runs
*much* faster:

explain  analyze select p.pim_id, p.recipient, p.sender, p.message, p.ts,
p.type, case when sender.userid is not null then sender.first_name || ' '
|| sender.last_name else null end as sender_name, w.course_id is not null
as is_online from pim p left outer join person sender on (sender.userid =
p.sender) left join wiol w on (w.userid=sender.userid) where p.recipient =
'axto6551' and p.type >= 0  limit 1;
                                                                 QUERY PLAN

---------------------------------------------------------------------------
-----------------------------------------------------------------
 Limit  (cost=0.00..6.03 rows=1 width=180) (actual time=0.89..1.13 rows=1
loops=1)
   ->  Hash Join  (cost=0.00..6.03 rows=1 width=180) (actual
time=0.88..1.12 rows=2 loops=1)
         Hash Cond: ("outer".userid = "inner".userid)
         ->  Nested Loop  (cost=0.00..6.02 rows=1 width=144) (actual
time=0.48..0.69 rows=2 loops=1)
               ->  Seq Scan on pim p  (cost=0.00..0.00 rows=1 width=112)
(actual time=0.04..0.16 rows=2 loops=1)
                     Filter: ((recipient = 'axto6551'::text) AND ("type" >=
0))
               ->  Index Scan using person_pkey on person sender
(cost=0.00..6.01 rows=1 width=32) (actual time=0.23..0.24 rows=1 loops=2)
                     Index Cond: (sender.userid = "outer".sender)
         ->  Hash  (cost=0.00..0.00 rows=1 width=36) (actual
time=0.22..0.22 rows=0 loops=1)
               ->  Seq Scan on wiol w  (cost=0.00..0.00 rows=1 width=36)
(actual time=0.12..0.17 rows=8 loops=1)
 Total runtime: 1.39 msec
(11 rader)



Tests run on postgresql-7.3.4.

Main question is, is it bad SQL to join with a view, or is it postgresql
that does something not quite optimal? If the latter, is it fixed in 7.4?

Thanks,
Palle


pgsql-general by date:

Previous
From: "Dean Arnold"
Date:
Subject: Re: PostgreSQL at OSCON 2004
Next
From: Curtis Stanford
Date:
Subject: Good way to insert/update when you're not sure of duplicates?