Thread: Idea - optimising (left) joins?

Idea - optimising (left) joins?

From

Dawid Kuroczko

Date:

27 July 2005, 08:41:04

Hello.  I was just wondering, assume we have such tables:

CREATE TABLE data (  foo text,  somename_id integer not null references (somenames)
);

CREATE TABLE somenames (  somename_id serial PRIMARY KEY  somename text NOT NULL
);

And a view:

CREATE someview AS SELECT foo,somename FROM data NATURAL JOIN somenames;

...and a user does:
SELECT foo FROM data order by foo LIMIT 1 OFFSET 1000;

...we could assume some of the things:
1. as somename_id references somenames, and it is joined with somenames,   there will always be at least one row in
somenamesfor each row of data. 
2. as the somename_id is primary key, there will be exactly one row.
3. 1 and 2 together -- no matter if we join somenames or not, we'll get   same number of rows
4. So if columns from somenames are not used for anything, we can skip them.   No need to join at all.

Other scenario:
1. someone_id is a simple integer, but the join is left join.  The join is   performed with somename_id in somenames
(primarykey), so the   standard join would return 0 or 1 rows.  left join returns 1 row for   each row in data table. 
2. If somenames columns are not used -- we can skip them.

Why bother?  There are cases where data is normalised and there is
a view which joins main table with few tables similar to somenames
mentioned here.  If PostgreSQL could skip even looking at the tables
that will not be used, it could give advantage; the bigger the data,
the more advantage.

What do you think about it?  Is it worth the effort?
  Regards,     Dawid

Re: Idea - optimising (left) joins?

From

Tino Wildenhain

Date:

27 July 2005, 09:28:38

Am Mittwoch, den 27.07.2005, 10:40 +0200 schrieb Dawid Kuroczko:
> Hello.  I was just wondering, assume we have such tables:
> 
> CREATE TABLE data (
>    foo text,
>    somename_id integer not null references (somenames)
> );
> 
> CREATE TABLE somenames (
>    somename_id serial PRIMARY KEY
>    somename text NOT NULL
> );
> 
> And a view:
> 
> CREATE someview AS SELECT foo,somename FROM data NATURAL JOIN somenames;
> 
> ...and a user does:
> SELECT foo FROM data order by foo LIMIT 1 OFFSET 1000;
> 
> ...we could assume some of the things:
> 1. as somename_id references somenames, and it is joined with somenames,
>     there will always be at least one row in somenames for each row of data.
> 2. as the somename_id is primary key, there will be exactly one row. 
> 3. 1 and 2 together -- no matter if we join somenames or not, we'll get
>     same number of rows
> 4. So if columns from somenames are not used for anything, we can skip them.
>     No need to join at all.
> 
> Other scenario:
> 1. someone_id is a simple integer, but the join is left join.  The join is
>     performed with somename_id in somenames (primary key), so the
>     standard join would return 0 or 1 rows.  left join returns 1 row for
>     each row in data table.
> 2. If somenames columns are not used -- we can skip them.
> 
> Why bother?  There are cases where data is normalised and there is
> a view which joins main table with few tables similar to somenames
> mentioned here.  If PostgreSQL could skip even looking at the tables
> that will not be used, it could give advantage; the bigger the data,
> the more advantage.
> 
> What do you think about it?  Is it worth the effort?

Afaic its already done ;) In fact, views are implemented as
rules - so they are kind of "include" in your query and
are optimized just the same way as if you type the full query 
there.

See explain / explain analyze.

Regards
Tino

Re: Idea - optimising (left) joins?

From

Stephan Szabo

Date:

27 July 2005, 14:19:31

On Wed, 27 Jul 2005, Dawid Kuroczko wrote:

> Hello.  I was just wondering, assume we have such tables:
>
> CREATE TABLE data (
>    foo text,
>    somename_id integer not null references (somenames)
> );
>
> CREATE TABLE somenames (
>    somename_id serial PRIMARY KEY
>    somename text NOT NULL
> );
>
> And a view:
>
> CREATE someview AS SELECT foo,somename FROM data NATURAL JOIN somenames;
>
> ...and a user does:
> SELECT foo FROM data order by foo LIMIT 1 OFFSET 1000;
>
> ...we could assume some of the things:
> 1. as somename_id references somenames, and it is joined with somenames,
>     there will always be at least one row in somenames for each row of data.
> 2. as the somename_id is primary key, there will be exactly one row.
> 3. 1 and 2 together -- no matter if we join somenames or not, we'll get
>     same number of rows
> 4. So if columns from somenames are not used for anything, we can skip them.
>     No need to join at all.

Note that I believe the above only holds if also the reference is
immediate (technically, also, the key must be immediate but we don't
support non-immediate primary keys currently). I'm uncertain whether the
condition holds inside triggers eventually coming from updates to data or
somenames as well as from my last reading of SQL2003 I believe it's
possible to get into places where you're running before the constraint
check occurs.

> Why bother?  There are cases where data is normalised and there is
> a view which joins main table with few tables similar to somenames
> mentioned here.  If PostgreSQL could skip even looking at the tables
> that will not be used, it could give advantage; the bigger the data,
> the more advantage.
>
> What do you think about it?  Is it worth the effort?

Possibly, one big question is how much cost does it add to queries that
the optimization doesn't apply in order to try to see if it does.