Re: Allowing join removals for more join types - Mailing list pgsql-hackers

From David Rowley
Subject Re: Allowing join removals for more join types
Date
Msg-id CAApHDvoP=gHp8ekgAtdPeU7i0H7-XMdq7gX-Rd7mZdn-w3WvjA@mail.gmail.com
Whole thread Raw
In response to Re: Allowing join removals for more join types  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Allowing join removals for more join types  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

On Sat, May 24, 2014 at 3:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
> I've just had a bit of a look at implementing checks allowing subqueries
> with unique indexes on the join cols being removed,

I'm a bit confused by this statement of the problem.  I thought the idea
was to recognize that subqueries with DISTINCT or GROUP BY clauses produce
known-unique output column(s), which permits join removal in the same way
that unique indexes on a base table allow us to deduce that certain
columns are known-unique and hence can offer no more than one match for
a join.  That makes it primarily a syntactic check, which you can perform
despite the fact that the subquery hasn't been planned yet (since the
parser has done sufficient analysis to determine the semantics of
DISTINCT/GROUP BY).


Up thread a little Dilip was talking about in addition to checking that if the sub query could be proved to be unique on the join condition using DISTINCT/GROUP BY, we might also check unique indexes in the subquery to see if they could prove the query is unique on the join condition.

For example a query such as:

SELECT a.* FROM a LEFT JOIN (SELECT b.* FROM b LIMIT 1) b ON a.column = b.colwithuniqueidx

The presence of the LIMIT would be enough to stop the subquery being pulled up, but there'd be no reason to why the join couldn't be removed.

I think the use case for this is likely a bit more narrow than the GROUP BY/DISTINCT case, so I'm planning on using the time on looking into more common cases such as INNER JOINs where we can prove the existence of the row using a foreign key.
 
Drilling down into the subquery is a whole different matter.  For one
thing, there's no point in targeting cases in which the subquery would be
eligible to be flattened into the parent query, and your proposed list of
restrictions seems to eliminate most cases in which it couldn't be
flattened.  For another, you don't have access to any planning results for
the subquery yet, which is the immediate problem you're complaining of.
Duplicating the work of looking up a relation's indexes seems like a
pretty high price to pay for whatever improvement you might get here.


I agree that there are not many cases left to remove the join that remain after is_simple_subquery() has decided not to pullup the subquery. Some of the perhaps more common cases would be having windowing functions in the subquery as this is what you need to do if you want to include the results of a windowing function from within the where clause. Another case, though I can't imagine it would be common, is ORDER BY in the subquery... But for that one I can't quite understand why is_simple_subquery() stops that being flattened in the first place.

Regards

David Rowley

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: SKIP LOCKED DATA (work in progress)
Next
From: Vik Fearing
Date:
Subject: SQL access to database attributes