Thread: Planner matching constants across tables in a join
I know this has been covered on one of the lists in the past, but I'm damned if I can find the keywords to locate it. If I join two tables with a comparison to a constant on one, why can't the planner see that the comparison applies to both tables: SELECT a.id FROM a JOIN b ON a.id=b.id WHERE a.id=1; runs much slower than SELECT a.id FROM a JOIN b ON a.id=b.id WHERE a.id=1 AND b.id=1; It's not a real problem since it's easy to work around, but I was wondering what the difficulties are for the planner in seeing that query 1 is the same as query 2. Note that it doesn't seem related to JOIN forcing the planner's hand, the same applies just using WHERE a.id=b.id -- Richard Huxton
Richard Huxton <dev@archonet.com> writes: > I know this has been covered on one of the lists in the past, but I'm damned > if I can find the keywords to locate it. > > If I join two tables with a comparison to a constant on one, why can't the > planner see that the comparison applies to both tables: It sure does. Postgres does an impressive job of tracing equality clauses around for just this purpose. > SELECT a.id FROM a JOIN b ON a.id=b.id WHERE a.id=1; > > runs much slower than > > SELECT a.id FROM a JOIN b ON a.id=b.id WHERE a.id=1 AND b.id=1; Really? They produce virtually the same plan for me. Why do you think it'll run slower? What query are you actually finding slow? -- greg
On Wednesday 05 Mar 2003 12:42 pm, Greg Stark wrote: > Really? They produce virtually the same plan for me. > > Why do you think it'll run slower? > What query are you actually finding slow? The actual query uses three tables, but isn't very complicated. Apologies for the wrapping on the explain. EXPLAIN ANALYSE SELECT a.line_id, a.start_time, a.call_dur, i.cam_id, i.prod_id, i.chg_per_min, i.rev_per_min FROM campaign_items i, campaign c, activity a WHERE i.cam_id=c.id AND a.line_id=i.line_id AND a.start_time BETWEEN c.cam_from AND c.cam_to AND a.line_id='0912345 0004' AND i.line_id='0912345 0004'; QUERY PLAN ---------- Merge Join (cost=348.01..348.72 rows=1 width=72) (actual time=115.43..116.27 rows=21 loops=1) Merge Cond: ("outer".id = "inner".cam_id) Join Filter: (("outer".line_id)::text = ("inner".line_id)::text) -> Sort (cost=245.45..245.75 rows=118 width=40) (actual time=83.98..84.10 rows=94 loops=1) Sort Key: c.id -> Nested Loop (cost=0.00..241.40 rows=118 width=40) (actual time=3.83..83.27 rows=94 loops=1) Join Filter: (("outer".start_time >= ("inner".cam_from)::timestamp without time zone) AND ("outer".start_time <= ("inner".cam_to)::timestamp without time zone)) -> Seq Scan on activity a (cost=0.00..199.00 rows=11 width=28) (actual time=3.06..54.14 rows=19 loops=1) Filter: ((line_id)::text = '0912345 0004'::text) -> Seq Scan on campaign c (cost=0.00..2.00 rows=100 width=12) (actual time=0.02..0.84 rows=100 loops=19) -> Sort (cost=102.56..102.57 rows=5 width=32) (actual time=31.36..31.39 rows=20 loops=1) Sort Key: i.cam_id -> Seq Scan on campaign_items i (cost=0.00..102.50 rows=5 width=32) (actual time=17.16..31.11 rows=6 loops=1) Filter: ((line_id)::text = '0912345 0004'::text) Total runtime: 117.08 msec (15 rows) and this is the plan where I just check the one line_id: EXPLAIN ANALYSE SELECT a.line_id, a.start_time, a.call_dur, i.cam_id, i.prod_id, i.chg_per_min, i.rev_per_min FROM campaign_items i, campaign c, activity a WHERE i.cam_id=c.id AND a.line_id=i.line_id AND a.start_time BETWEEN c.cam_from AND c.cam_to AND i.line_id='0912345 0004'; QUERY PLAN --------------------------------------- Hash Join (cost=2.25..1623.70 rows=6 width=72) (actual time=48.27..974.30 rows=21 loops=1) Hash Cond: ("outer".cam_id = "inner".id) Join Filter: (("outer".start_time >= ("inner".cam_from)::timestamp without time zone) AND ("outer".start_time <= ("inner".cam_to)::timestamp without time zone)) -> Nested Loop (cost=0.00..1619.87 rows=53 width=60) (actual time=24.49..969.33 rows=114 loops=1) Join Filter: (("inner".line_id)::text = ("outer".line_id)::text) -> Seq Scan on campaign_items i (cost=0.00..102.50 rows=5 width=32) (actual time=15.72..28.52 rows=6 loops=1) Filter: ((line_id)::text = '0912345 0004'::text) -> Seq Scan on activity a (cost=0.00..174.00 rows=10000 width=28) (actual time=0.03..101.95 rows=10000 loops=6) -> Hash (cost=2.00..2.00 rows=100 width=12) (actual time=1.54..1.54 rows=0 loops=1) -> Seq Scan on campaign c (cost=0.00..2.00 rows=100 width=12) (actual time=0.06..0.94 rows=100 loops=1) Total runtime: 975.13 msec (11 rows) Table campaign has 100 rows, campaign_items 5000, activity 10000. My guess is that the planner starts with "campaign" because of the low number of rows, but it still looks like filtering on "activity" would help things. Indeed, testing a.line_id instead of i.line_id does make a difference. QUERY PLAN ------------------- Hash Join (cost=241.70..457.54 rows=6 width=72) (actual time=161.20..225.68 rows=21 loops=1) Hash Cond: ("outer".cam_id = "inner".id) Join Filter: (("inner".line_id)::text = ("outer".line_id)::text) -> Seq Scan on campaign_items i (cost=0.00..90.00 rows=5000 width=32) (actual time=0.03..72.00 rows=5000 loops=1) -> Hash (cost=241.40..241.40 rows=118 width=40) (actual time=85.46..85.46 rows=0 loops=1) -> Nested Loop (cost=0.00..241.40 rows=118 width=40) (actual time=3.80..84.66 rows=94 loops=1) Join Filter: (("outer".start_time >= ("inner".cam_from)::timestamp without time zone) AND ("outer".start_time <= ("inner".cam_to)::timestamp without time zone)) -> Seq Scan on activity a (cost=0.00..199.00 rows=11 width=28) (actual time=3.03..54.48 rows=19 loops=1) Filter: ((line_id)::text = '0912345 0004'::text) -> Seq Scan on campaign c (cost=0.00..2.00 rows=100 width=12) (actual time=0.03..0.89 rows=100 loops=19) Total runtime: 226.51 msec (11 rows) -- Richard Huxton
Richard Huxton <dev@archonet.com> writes: > Filter: ((line_id)::text = '0912345 0004'::text) So I think this means that line_id is being casted to "text". Though I'm not clear why it would be choosing "text" for the constant if line_id wasn't text to begin with. In any case my plans here look like: > Filter: (aa = 'x'::text) so it looks like there's something extra going on in your plan. what does your table definition look like? -- greg
Greg Stark <gsstark@mit.edu> writes: > Richard Huxton <dev@archonet.com> writes: >> If I join two tables with a comparison to a constant on one, why can't the >> planner see that the comparison applies to both tables: > It sure does. Postgres does an impressive job of tracing equality clauses > around for just this purpose. CVS tip does. Existing releases don't... regards, tom lane
On Wednesday 05 Mar 2003 3:02 pm, Greg Stark wrote: > Richard Huxton <dev@archonet.com> writes: > > Filter: ((line_id)::text = '0912345 0004'::text) > > So I think this means that line_id is being casted to "text". Though I'm > not clear why it would be choosing "text" for the constant if line_id > wasn't text to begin with. A domain defined as varchar() actually - which is why it's not using an index, but that's neither here nor there regarding the constant issue. > In any case my plans here look like: > > Filter: (aa = 'x'::text) > > so it looks like there's something extra going on in your plan. > > what does your table definition look like? rms=> \d campaign Table "rms.campaign" Column | Type | Modifiers ----------+-----------+----------- id | integer | not null title | item_name | cam_from | date | cam_to | date | owner | integer | Indexes: campaign_pkey primary key btree (id), campaign_from_idx btree (cam_from), campaign_to_idx btree (cam_to) rms=> \d campaign_items Table "rms.campaign_items" Column | Type | Modifiers -------------+---------+----------- cam_id | integer | not null line_id | tel_num | not null prod_id | integer | not null chg_per_min | integer | rev_per_min | integer | Indexes: campaign_items_pkey primary key btree (cam_id, line_id, prod_id), cam_item_line_idx btree (line_id) Foreign Key constraints: $1 FOREIGN KEY (cam_id) REFERENCES campaign(id) ON UPDATE NO ACTION ON DELETE NO ACTION, $2 FOREIGN KEY (line_id) REFERENCES line(telno) ON UPDATE NO ACTION ON DELETE NO ACTION, $3 FOREIGN KEY (prod_id) REFERENCES product(id) ON UPDATE NO ACTION ON DELETE NO ACTION rms=> \d activity Table "rms.activity" Column | Type | Modifiers ------------+-----------------------------+----------- line_id | tel_num | not null start_time | timestamp without time zone | not null call_dur | integer | Indexes: activity_pkey primary key btree (line_id, start_time), activity_start_idx btree (start_time) Foreign Key constraints: $1 FOREIGN KEY (line_id) REFERENCES line(telno) ON UPDATE NO ACTION ON DELETE NO ACTION -- Richard Huxton
Richard, > A domain defined as varchar() actually - which is why it's not using > an index, > but that's neither here nor there regarding the constant issue. You might improve your performance overall if you cast the constant to tel_num before doing the comparison in the query. Right now, the parser is casting the whole column to text instead, because it can't tell that the constant you supply is a valid tel_num. -Josh
On Wednesday 05 Mar 2003 7:00 pm, Josh Berkus wrote: > Richard, > > > A domain defined as varchar() actually - which is why it's not using > > an index, > > but that's neither here nor there regarding the constant issue. > > You might improve your performance overall if you cast the constant to > tel_num before doing the comparison in the query. Right now, the > parser is casting the whole column to text instead, because it can't > tell that the constant you supply is a valid tel_num. That's what I thought, but... rms=> EXPLAIN ANALYSE SELECT * FROM line WHERE telno='0912345 0004'::tel_num; QUERY PLAN ---------------------------------------------------------------------------------------------- Seq Scan on line (cost=0.00..20.50 rows=1 width=28) (actual time=0.10..5.28 rows=1 loops=1) Filter: ((telno)::text = ('0912345 0004'::character varying)::text) Total runtime: 5.43 msec rms=> EXPLAIN ANALYSE SELECT * FROM line WHERE telno='0912345 0004'::varchar; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Index Scan using line_pkey on line (cost=0.00..5.78 rows=1 width=28) (actual time=14.03..14.03 rows=1 loops=1) Index Cond: ((telno)::character varying = '0912345 0004'::character varying) Total runtime: 14.28 msec Ignoring the times (fake data on my test box) it seems like there's an issue in comparing against DOMAIN defined types. Or maybe it's in the index definition, although I don't know how to find out the type of an index. -- Richard Huxton
On Wednesday 05 Mar 2003 7:00 pm, Josh Berkus wrote: > You might improve your performance overall if you cast the constant to > tel_num before doing the comparison in the query. Stranger and stranger... richardh=# CREATE DOMAIN intdom int4; richardh=# CREATE DOMAIN textdom text; richardh=# CREATE TABLE domtest (a intdom, b textdom); richardh=# CREATE INDEX domtest_a_idx ON domtest (a); richardh=# CREATE INDEX domtest_b_idx ON domtest (b); richardh=# INSERT INTO domtest VALUES (1,'aaa'); richardh=# INSERT INTO domtest VALUES (2,'bbb'); richardh=# INSERT INTO domtest VALUES (3,'ccc'); richardh=# EXPLAIN ANALYSE SELECT * FROM domtest WHERE a=1::intdom; ------------------------------------------------------------------------------------------------- Seq Scan on domtest (cost=0.00..22.50 rows=5 width=36) (actual time=0.08..0.11 rows=1 loops=1) Filter: ((a)::oid = 1::oid) richardh=# EXPLAIN ANALYSE SELECT * FROM domtest WHERE a=1::int4; ----------------------------------------------------------------------------------------------------------------------- Index Scan using domtest_a_idx on domtest (cost=0.00..17.07 rows=5 width=36) (actual time=0.09..0.11 rows=1 loops=1) Index Cond: ((a)::integer = 1) richardh=# EXPLAIN ANALYSE SELECT * FROM domtest WHERE b='aaa'::textdom; ----------------------------------------------------------------------------------------------------------------------- Index Scan using domtest_b_idx on domtest (cost=0.00..17.07 rows=5 width=36) (actual time=0.09..0.11 rows=1 loops=1) Index Cond: ((b)::text = 'aaa'::text) richardh=# EXPLAIN ANALYSE SELECT * FROM domtest WHERE b='aaa'::text; ----------------------------------------------------------------------------------------------------------------------- Index Scan using domtest_b_idx on domtest (cost=0.00..17.07 rows=5 width=36) (actual time=0.10..0.12 rows=1 loops=1) Index Cond: ((b)::text = 'aaa'::text) Can't think why we're getting casts to type "oid" in the first example - I'd have thought int4 would be the default. I'm guessing the text domain always works because that's the default cast. -- Richard Huxton