Thread: Optimizing Outer Joins

Optimizing Outer Joins

From
Sebastian Böck
Date:
Hello,

if i have the following (simple) table layout:

create table a (
   id serial primary key
);

create table b (
   id integer references a,
   test text
);

create view c as
   select a.id,b.test from a
   left join b
   on a.id = b.id;

So if i do a select * from c i get the following:

test=# EXPLAIN SELECT * from g;
                            QUERY PLAN
----------------------------------------------------------------
  Hash Left Join  (cost=2.45..8.91 rows=8 width=36)
    Hash Cond: ("outer".id = "inner".id)
    ->  Seq Scan on a  (cost=0.00..1.08 rows=8 width=4)
    ->  Hash  (cost=2.16..2.16 rows=116 width=36)
          ->  Seq Scan on b  (cost=0.00..2.16 rows=116 width=36)

and a select id from c executes as

test=# EXPLAIN SELECT id from g;
                           QUERY PLAN
---------------------------------------------------------------
  Hash Left Join  (cost=2.45..7.02 rows=8 width=4)
    Hash Cond: ("outer".id = "inner".id)
    ->  Seq Scan on a  (cost=0.00..1.08 rows=8 width=4)
    ->  Hash  (cost=2.16..2.16 rows=116 width=4)
          ->  Seq Scan on b  (cost=0.00..2.16 rows=116 width=4)

so the only difference is the width estimation.

But why is the scan on table b performed?
If i understand it correctly this is unnecessary because the
result contains only rows from table a.

Is there a way to tell postgres not to do the extra work.
My aim is to speed up lookup to complex joins.

Thanks

Sebastian

Re: Optimizing Outer Joins

From
Richard Huxton
Date:
Sebastian Böck wrote:
> Hello,
>
> if i have the following (simple) table layout:
>
> create table a (
>   id serial primary key
> );
>
> create table b (
>   id integer references a,
>   test text
> );
>
> create view c as
>   select a.id,b.test from a
>   left join b
>   on a.id = b.id;

> test=# EXPLAIN SELECT * from g;

> test=# EXPLAIN SELECT id from g;

> so the only difference is the width estimation.
>
> But why is the scan on table b performed?
> If i understand it correctly this is unnecessary because the
> result contains only rows from table a.

It's only unnecessary in the case where there is a 1:1 correspondence
between a.id and b.id - if you had more than one matching row in "b"
then there'd be repeated rows from "a" in the result. Not sure if PG can
  tell what the situation is regarding references and pkeys, but in your
example you don't have one anyway.

--
   Richard Huxton
   Archonet Ltd

Re: Optimizing Outer Joins

From
Sebastian Böck
Date:
Richard Huxton wrote:
> Sebastian Böck wrote:
>> But why is the scan on table b performed?
>> If i understand it correctly this is unnecessary because the
>> result contains only rows from table a.
>
>
> It's only unnecessary in the case where there is a 1:1 correspondence
> between a.id and b.id - if you had more than one matching row in "b"
> then there'd be repeated rows from "a" in the result. Not sure if PG can
>  tell what the situation is regarding references and pkeys, but in your
> example you don't have one anyway.

Ok, is there a way to avoid the extra scan if only one row is
returned (distinc on for example)?

What would be great is if a subselect could work with more than
one column returning. Is there a way to achieve this?

Thanks Sebastian