Thread: Optimizing Outer Joins
Hello, if i have the following (simple) table layout: create table a ( id serial primary key ); create table b ( id integer references a, test text ); create view c as select a.id,b.test from a left join b on a.id = b.id; So if i do a select * from c i get the following: test=# EXPLAIN SELECT * from g; QUERY PLAN ---------------------------------------------------------------- Hash Left Join (cost=2.45..8.91 rows=8 width=36) Hash Cond: ("outer".id = "inner".id) -> Seq Scan on a (cost=0.00..1.08 rows=8 width=4) -> Hash (cost=2.16..2.16 rows=116 width=36) -> Seq Scan on b (cost=0.00..2.16 rows=116 width=36) and a select id from c executes as test=# EXPLAIN SELECT id from g; QUERY PLAN --------------------------------------------------------------- Hash Left Join (cost=2.45..7.02 rows=8 width=4) Hash Cond: ("outer".id = "inner".id) -> Seq Scan on a (cost=0.00..1.08 rows=8 width=4) -> Hash (cost=2.16..2.16 rows=116 width=4) -> Seq Scan on b (cost=0.00..2.16 rows=116 width=4) so the only difference is the width estimation. But why is the scan on table b performed? If i understand it correctly this is unnecessary because the result contains only rows from table a. Is there a way to tell postgres not to do the extra work. My aim is to speed up lookup to complex joins. Thanks Sebastian
Sebastian Böck wrote: > Hello, > > if i have the following (simple) table layout: > > create table a ( > id serial primary key > ); > > create table b ( > id integer references a, > test text > ); > > create view c as > select a.id,b.test from a > left join b > on a.id = b.id; > test=# EXPLAIN SELECT * from g; > test=# EXPLAIN SELECT id from g; > so the only difference is the width estimation. > > But why is the scan on table b performed? > If i understand it correctly this is unnecessary because the > result contains only rows from table a. It's only unnecessary in the case where there is a 1:1 correspondence between a.id and b.id - if you had more than one matching row in "b" then there'd be repeated rows from "a" in the result. Not sure if PG can tell what the situation is regarding references and pkeys, but in your example you don't have one anyway. -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > Sebastian Böck wrote: >> But why is the scan on table b performed? >> If i understand it correctly this is unnecessary because the >> result contains only rows from table a. > > > It's only unnecessary in the case where there is a 1:1 correspondence > between a.id and b.id - if you had more than one matching row in "b" > then there'd be repeated rows from "a" in the result. Not sure if PG can > tell what the situation is regarding references and pkeys, but in your > example you don't have one anyway. Ok, is there a way to avoid the extra scan if only one row is returned (distinc on for example)? What would be great is if a subselect could work with more than one column returning. Is there a way to achieve this? Thanks Sebastian