Home > mailing lists

Re: join removal - Mailing list pgsql-hackers

From	Alex Brasetvik
Subject	Re: join removal
Date	July 24, 2009 11:59:56
Msg-id	1728754F-3747-429E-AD12-9F34CE357751@brasetvik.com Whole thread Raw
In response to	Re: join removal (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: join removal
List	pgsql-hackers

Tree view

On Jul 17, 2009, at 04:27 , Robert Haas wrote:

> - INNER joins are more complex because what happens on the inner side
> of the join can potentially wipe out rows from the result.  With a
> LEFT join, it's sufficient to prove that the inner rel is at least
> unique enough, but for an INNER join, we have to prove that it's
> exactly UNIQUE enough.  I think we can only provide this when the
> inner rel is a base relation with a unique index over EXACTLY (not a
> subset of) the relevant columns AND there is a foreign key
> relationship from the outer rel to the inner rel over the join
> columns.

Reasoning on foreign key relationships opens up for other optimization  
opportunities as well, so being able to prove that a join cannot alter  
the number of rows would be nice.

For example, Limit-operators can possibly be pushed below a join that  
does not alter the result set, to reduce the amount of work done by  
the join.

Also, we can prove that uniqueness properties are kept.

To put both examples in context, consider tables A and B defined as  
follows:
       Table "public.a" Column |  Type   | Modifiers
--------+---------+----------- id     | integer | not null
Indexes:    "a_pkey" PRIMARY KEY, btree (id)
Referenced by:    TABLE "b" CONSTRAINT "b_id_fkey" FOREIGN KEY (id) REFERENCES a(id)
       Table "public.b" Column |  Type   | Modifiers
--------+---------+----------- id     | integer | not null
Indexes:    "b_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:    "b_id_fkey" FOREIGN KEY (id) REFERENCES a(id)

The query plan for SELECT DISTINCT a.id FROM b JOIN a USING (id) ORDER  
BY a.id ASC LIMIT 10 is this:
                                     QUERY PLAN
------------------------------------------------------------------------------------- Limit  (cost=0.00..7.20 rows=10
width=4)  ->  Unique  (cost=0.00..36.72 rows=51 width=4)         ->  Merge Join  (cost=0.00..36.59 rows=51 width=4)
         Merge Cond: (b.id = a.id)               ->  Index Scan using b_pkey on b  (cost=0.00..29.02  
 
rows=51 width=4)               ->  Index Scan using a_pkey on a  (cost=0.00..13.77  
rows=101 width=4)

In this case we know that joining A does not alter the result set,  
because of the FK from B.id to A.id. Also, because B.id is also  
unique, the uniqueness of A.id is retained.

Thus, the plan can be optimized to something like
                  QUERY PLAN
---------------------------------------------
Merge Join  (...)  Merge Cond: (b.id = a.id)  ->  Limit  (...)      ->  Index Scan using a_pkey on a  (...)  ->  Index
Scanusing b_pkey on b  (...)
 

Perhaps these (and other) future opportunities make infrastructure  
changes for proper join removal support more worthwhile.

--
Alex Brasetvik

pgsql-hackers by date:

From: James Pye
Date: 24 July 2009, 11:25:40
Subject: Re: WIP: plpython3

From: Andrew Dunstan
Date: 24 July 2009, 13:06:05
Subject: Re: join regression failure on cygwin

Re: join removal - Mailing list pgsql-hackers

Previous

Next