Re: Planner matching constants across tables in a join - Mailing list pgsql-performance

From Richard Huxton
Subject Re: Planner matching constants across tables in a join
Date
Msg-id 200303051424.12351.dev@archonet.com
Whole thread Raw
In response to Re: Planner matching constants across tables in a join  (Greg Stark <gsstark@mit.edu>)
Responses Re: Planner matching constants across tables in a join
List pgsql-performance
On Wednesday 05 Mar 2003 12:42 pm, Greg Stark wrote:
> Really? They produce virtually the same plan for me.
>
> Why do you think it'll run slower?
> What query are you actually finding slow?

The actual query uses three tables, but isn't very complicated. Apologies for
the wrapping on the explain.

EXPLAIN ANALYSE SELECT a.line_id, a.start_time, a.call_dur, i.cam_id,
i.prod_id, i.chg_per_min, i.rev_per_min
FROM campaign_items i, campaign c, activity a
WHERE
i.cam_id=c.id AND a.line_id=i.line_id
AND a.start_time BETWEEN c.cam_from AND c.cam_to
AND a.line_id='0912345 0004' AND i.line_id='0912345 0004';


QUERY PLAN
----------
 Merge Join  (cost=348.01..348.72 rows=1 width=72) (actual time=115.43..116.27
rows=21 loops=1)
   Merge Cond: ("outer".id = "inner".cam_id)
   Join Filter: (("outer".line_id)::text = ("inner".line_id)::text)
   ->  Sort  (cost=245.45..245.75 rows=118 width=40) (actual time=83.98..84.10
rows=94 loops=1)
         Sort Key: c.id
         ->  Nested Loop  (cost=0.00..241.40 rows=118 width=40) (actual
time=3.83..83.27 rows=94 loops=1)
               Join Filter: (("outer".start_time >=
("inner".cam_from)::timestamp without time zone) AND ("outer".start_time <=
("inner".cam_to)::timestamp without time zone))
               ->  Seq Scan on activity a  (cost=0.00..199.00 rows=11
width=28) (actual time=3.06..54.14 rows=19 loops=1)
                     Filter: ((line_id)::text = '0912345 0004'::text)
               ->  Seq Scan on campaign c  (cost=0.00..2.00 rows=100 width=12)
(actual time=0.02..0.84 rows=100 loops=19)
   ->  Sort  (cost=102.56..102.57 rows=5 width=32) (actual time=31.36..31.39
rows=20 loops=1)
         Sort Key: i.cam_id
         ->  Seq Scan on campaign_items i  (cost=0.00..102.50 rows=5 width=32)
(actual time=17.16..31.11 rows=6 loops=1)
               Filter: ((line_id)::text = '0912345 0004'::text)
 Total runtime: 117.08 msec
(15 rows)


and this is the plan where I just check the one line_id:


EXPLAIN ANALYSE SELECT a.line_id, a.start_time, a.call_dur, i.cam_id,
i.prod_id, i.chg_per_min, i.rev_per_min
FROM campaign_items i, campaign c, activity a
WHERE
i.cam_id=c.id AND a.line_id=i.line_id
AND a.start_time BETWEEN c.cam_from AND c.cam_to
AND i.line_id='0912345 0004';

QUERY PLAN
---------------------------------------
 Hash Join  (cost=2.25..1623.70 rows=6 width=72) (actual time=48.27..974.30
rows=21 loops=1)
   Hash Cond: ("outer".cam_id = "inner".id)
   Join Filter: (("outer".start_time >= ("inner".cam_from)::timestamp without
time zone) AND ("outer".start_time <= ("inner".cam_to)::timestamp without
time zone))
   ->  Nested Loop  (cost=0.00..1619.87 rows=53 width=60) (actual
time=24.49..969.33 rows=114 loops=1)
         Join Filter: (("inner".line_id)::text = ("outer".line_id)::text)
         ->  Seq Scan on campaign_items i  (cost=0.00..102.50 rows=5 width=32)
(actual time=15.72..28.52 rows=6 loops=1)
               Filter: ((line_id)::text = '0912345 0004'::text)
         ->  Seq Scan on activity a  (cost=0.00..174.00 rows=10000 width=28)
(actual time=0.03..101.95 rows=10000 loops=6)
   ->  Hash  (cost=2.00..2.00 rows=100 width=12) (actual time=1.54..1.54
rows=0 loops=1)
         ->  Seq Scan on campaign c  (cost=0.00..2.00 rows=100 width=12)
(actual time=0.06..0.94 rows=100 loops=1)
 Total runtime: 975.13 msec
(11 rows)

Table campaign has 100 rows, campaign_items 5000, activity 10000. My guess is
that the planner starts with "campaign" because of the low number of rows,
but it still looks like filtering on "activity" would help things. Indeed,
testing a.line_id instead of i.line_id does make a difference.


QUERY PLAN
-------------------
 Hash Join  (cost=241.70..457.54 rows=6 width=72) (actual time=161.20..225.68
rows=21 loops=1)
   Hash Cond: ("outer".cam_id = "inner".id)
   Join Filter: (("inner".line_id)::text = ("outer".line_id)::text)
   ->  Seq Scan on campaign_items i  (cost=0.00..90.00 rows=5000 width=32)
(actual time=0.03..72.00 rows=5000 loops=1)
   ->  Hash  (cost=241.40..241.40 rows=118 width=40) (actual time=85.46..85.46
rows=0 loops=1)
         ->  Nested Loop  (cost=0.00..241.40 rows=118 width=40) (actual
time=3.80..84.66 rows=94 loops=1)
               Join Filter: (("outer".start_time >=
("inner".cam_from)::timestamp without time zone) AND ("outer".start_time <=
("inner".cam_to)::timestamp without time zone))
               ->  Seq Scan on activity a  (cost=0.00..199.00 rows=11
width=28) (actual time=3.03..54.48 rows=19 loops=1)
                     Filter: ((line_id)::text = '0912345 0004'::text)
               ->  Seq Scan on campaign c  (cost=0.00..2.00 rows=100 width=12)
(actual time=0.03..0.89 rows=100 loops=19)
 Total runtime: 226.51 msec
(11 rows)

--
  Richard Huxton

pgsql-performance by date:

Previous
From: Neil Conway
Date:
Subject: Re: OIDs as keys
Next
From: Rod Taylor
Date:
Subject: Re: OIDs as keys