Re: Allowing join removals for more join types - Mailing list pgsql-hackers

From David Rowley
Subject Re: Allowing join removals for more join types
Date
Msg-id CAApHDvp8ibrMjo-f6E3PycrV6-SS8f6FMWJ+vwKW1WX0O+Y8Fw@mail.gmail.com
Whole thread Raw
In response to Re: Allowing join removals for more join types  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Sun, Jun 22, 2014 at 11:51 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 17 June 2014 11:04, David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, Jun 4, 2014 at 12:50 AM, Noah Misch <noah@leadboat.com> wrote:
>>
>> As a point of procedure, I recommend separating the semijoin support into
>> its
>> own patch.  Your patch is already not small; delaying non-essential parts
>> will
>> make the essential parts more accessible to reviewers.
>>
>
> In the attached patch I've removed all the SEMI and ANTI join removal code
> and left only support for LEFT JOIN removal of sub-queries that can be
> proved to be unique on the join condition by looking at the GROUP BY and
> DISTINCT clause.

Good advice, we can come back for the others later.


> Example:
>
> SELECT t1.* FROM t1 LEFT OUTER JOIN (SELECT value,COUNT(*) FROM t2 GROUP BY
> value) t2 ON t1.id = t2.value;

Looks good on initial look.

This gets optimized...

EXPLAIN (COSTS OFF)
SELECT a.id FROM a
LEFT JOIN (SELECT b.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id
GROUP BY b.id) b ON a.id = b.id AND b.dummy = 1;

does it work with transitive closure like this..

EXPLAIN (COSTS OFF)
SELECT a.id FROM a
LEFT JOIN (SELECT b.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id
GROUP BY c.id) b ON a.id = b.id AND b.dummy = 1;

i.e. c.id is not in the join, but we know from subselect that c.id =
b.id and b.id is in the join


Well, there's no code that looks at equivalence of the columns in the query, but I'm not quite sure if there would have to be or not as I can't quite think of a way to write that query in a valid way that would cause it not to remove the join.

The example query will fail with: ERROR:  column "b.id" must appear in the GROUP BY clause or be used in an aggregate function

And if we rewrite it to use c.id in the target list

EXPLAIN (COSTS OFF)
SELECT a.id FROM a
LEFT JOIN (SELECT c.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id
GROUP BY c.id) b ON a.id = b.id AND b.dummy = 1;

With this one c.id becomes b.id, since we've given the subquery the alias 'b', so I don't think there's case here to optimise anything else.

Regards

David Rowley


pgsql-hackers by date:

Previous
From: Anastasia Lubennikova
Date:
Subject: Changes in amcanreturn() interface to support multicolumn indexes
Next
From: Jeff Davis
Date:
Subject: Re: [COMMITTERS] pgsql: Do all-visible handling in lazy_vacuum_page() outside its critic