simple left join slows query more than expected - Mailing list pgsql-performance

From Jonathan Knopp
Subject simple left join slows query more than expected
Date
Msg-id 3FC6E405.5010509@delegated.net
Whole thread Raw
Responses Re: simple left join slows query more than expected  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
I'v spent a couple days playing with this problem and searching the mailing lists and
docs etc but come up with nothing. Any help would be much appreciated.

Setup is postgres 7.3.2 on redhat 7.1 on a 1.3GHz Athlon machine with 1G pc133 ram and
SCSI.

Here is the same query with the addition of a left join onto a list of contacts to
grab the last name of each connected contact. I'd think this should be real quick since
it jsut has to grab around 100 names from the list, and if its smart enough to grab just
distinct IDs, then it's just like 10 rows it has to grab using the primary field. But as
far as i can tell (and i may VERY well be reading the explain syntax wrong), it is
grabbing them all and joining them first, rather than doing the operation that limits
the result rows to a mere 100 and THEN doing the join to contacts. It would be faster
if i did a separate query using a big IN(id1,id2,...) condition, which makes no sense to
me. Plus i REALLY want to avoid this as the selected fields and the joins and conditions
are all variable and controlled (indirectly and transparently) by the user.

Point is, why does a simple left join slow things down so much? in my experience
(primarily with mysql but also over a year with postgre) simple left joins are usually
quite quick. I can only guess that a bad plan is being chosen. PLEASE don't tell me i
need to store a copy of the names in the events table to get acceptable speed, cause
this would be plain sacrilegious in terms of DB design. Or is this simply as fast as
these queries can go? Just seems too long for the work that's being done IME.

events table has 12355 rows
contacts has 20064
event_managers has 8502

All fields with conditions (object_ids, contact, event_id, user_id, deleted_on) are indexed with btree.

Here is the query with the left join.

sauce=# explain analyze SELECT top.object_id , top.who, top.datetime, top.priority, top.subject, top.action,
top_contact_.last_name,top.object_id, top_contact_.object_id 
            FROM event_managers AS managers
                JOIN ONLY events AS top ON(managers.event_id=top.object_id)
                LEFT JOIN contacts AS top_contact_ ON(top.contact=top_contact_.object_id and top_contact_.deleted_on IS
NULL)
            WHERE true AND managers.user_id=238;
                                                                 QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
Merge Join  (cost=5569.24..5671.22 rows=100 width=91) (actual time=485.95..526.25 rows=208 loops=1)
   Merge Cond: ("outer".contact = "inner".object_id)
   Join Filter: ("inner".deleted_on IS NULL)
   ->  Sort  (cost=2467.17..2467.42 rows=100 width=60) (actual time=143.67..143.75 rows=208 loops=1)
         Sort Key: top.contact
         ->  Hash Join  (cost=143.63..2463.83 rows=100 width=60) (actual time=0.89..142.64 rows=208 loops=1)
               Hash Cond: ("outer".object_id = "inner".event_id)
               ->  Seq Scan on events top  (cost=0.00..1830.19 rows=12219 width=56) (actual time=0.05..131.33
rows=12219loops=1) 
               ->  Hash  (cost=143.45..143.45 rows=69 width=4) (actual time=0.65..0.65 rows=0 loops=1)
                     ->  Index Scan using event_managers_user_id on event_managers managers  (cost=0.00..143.45 rows=69
width=4)(actual time=0.14..0.50 rows=139 loops=1) 
                           Index Cond: (user_id = 238)
   ->  Sort  (cost=3102.07..3152.23 rows=20064 width=31) (actual time=342.23..360.29 rows=19964 loops=1)
         Sort Key: top_contact_.object_id
         ->  Append  (cost=0.00..1389.64 rows=20064 width=31) (actual time=0.06..115.63 rows=20064 loops=1)
               ->  Seq Scan on contacts top_contact_  (cost=0.00..1383.43 rows=20043 width=31) (actual
time=0.06..101.04rows=20043 loops=1) 
               ->  Seq Scan on users top_contact_  (cost=0.00..6.21 rows=21 width=31) (actual time=0.05..0.29 rows=21
loops=1)
 Total runtime: 527.47 msec
(17 rows)


The same thing but without the left join. Much faster. Anything slower than
this would be unacceptable, especailly given how small the tables are at this
point. They are expected to grow ALOT bigger within a year.

sauce=# explain analyze SELECT top.object_id , top.who, top.datetime, top.priority, top.subject, top.action,
top.object_id
            FROM event_managers AS managers
                JOIN ONLY events AS top ON(managers.event_id=top.object_id)
            WHERE true AND managers.user_id=238;
                                                                          QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=143.63..2463.83 rows=100 width=56) (actual time=1.48..137.74 rows=208 loops=1)
   Hash Cond: ("outer".object_id = "inner".event_id)
   ->  Seq Scan on events top  (cost=0.00..1830.19 rows=12219 width=52) (actual time=0.06..125.80 rows=12219 loops=1)
   ->  Hash  (cost=143.45..143.45 rows=69 width=4) (actual time=1.20..1.20 rows=0 loops=1)
         ->  Index Scan using event_managers_user_id on event_managers managers  (cost=0.00..143.45 rows=69 width=4)
(actualtime=0.21..1.03 rows=139 loops=1) 
               Index Cond: (user_id = 238)
 Total runtime: 137.96 msec
(7 rows)

again, many thanks for any feedback!


pgsql-performance by date:

Previous
From: Christopher Kings-Lynne
Date:
Subject: Re: For full text indexing, which is better, tsearch2 or
Next
From: Tom Lane
Date:
Subject: Re: simple left join slows query more than expected