I have thee primary tables: comments, events and users. There's a
join table events_comments tying comments to events. There are B-tree
indexes on every column involved, and the tables are freshly vacuumed
and analyzed;
A simple select to retrieve all comments for a given event, with an
outer join to retrieve creators:
select comments.*, users.*
from comments
left outer join users on
users.id = comments.creator_id
inner join events_comments on
comments.id = events_comments.comment_id
and events_comments.event_id = 9244
...uses the following execution plan:
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
-------------
Hash Join (cost=1138.11..29763.30 rows=9 width=805) (actual
time=1002.348..1493.016 rows=3 loops=1)
Hash Cond: ("outer".id = "inner".comment_id)
-> Hash Left Join (cost=1119.99..28858.50 rows=177318
width=805) (actual time=28.919..1440.155 rows=177448 loops=1)
Hash Cond: ("outer".creator_id = "inner".id)
-> Seq Scan on comments (cost=0.00..9230.18 rows=177318
width=325) (actual time=0.005..117.746 rows=177448 loops=1)
-> Hash (cost=531.19..531.19 rows=9119 width=480) (actual
time=28.883..28.883 rows=9119 loops=1)
-> Seq Scan on users (cost=0.00..531.19 rows=9119
width=480) (actual time=0.003..6.555 rows=9119 loops=1)
-> Hash (cost=18.10..18.10 rows=9 width=4) (actual
time=0.050..0.050 rows=3 loops=1)
-> Index Scan using events_comments_event_id_index on
events_comments (cost=0.00..18.10 rows=9 width=4) (actual
time=0.028..0.038 rows=3 loops=1)
Index Cond: (event_id = 9244)
Total runtime: 1493.565 ms
In this case there are three comments. Every comment has a creator
user here, so an inner join will produce the same results, but the
execution plan is considerably more efficient:
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
-------------
Nested Loop (cost=0.00..104.52 rows=8 width=805) (actual
time=0.059..0.117 rows=3 loops=1)
-> Nested Loop (cost=0.00..50.75 rows=9 width=325) (actual
time=0.045..0.078 rows=3 loops=1)
-> Index Scan using events_comments_event_id_index on
events_comments (cost=0.00..18.10 rows=9 width=4) (actual
time=0.026..0.033 rows=3 loops=1)
Index Cond: (event_id = 9244)
-> Index Scan using comments_pkey on comments
(cost=0.00..3.62 rows=1 width=325) (actual time=0.011..0.012 rows=1
loops=3)
Index Cond: (comments.id = "outer".comment_id)
-> Index Scan using users_pkey on users (cost=0.00..5.96 rows=1
width=480) (actual time=0.009..0.010 rows=1 loops=3)
Index Cond: (users.id = "outer".creator_id)
Total runtime: 0.354 ms
The outer join seems unnecessarily slow. Is there anything I can do
to speed it up?
PostgreSQL 8.1.3 on OS X (MacPorts).
Alexander.