Re: Implied BETWEEN from join quals - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Implied BETWEEN from join quals
Date
Msg-id CA+TgmoZEpuP9TvJv+9MnJQTPEm95jrGOq1mYhV_A=kt6kujJ_g@mail.gmail.com
Whole thread Raw
In response to Implied BETWEEN from join quals  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Implied BETWEEN from join quals
List pgsql-hackers
On Tue, Apr 22, 2014 at 11:44 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I was recently nudged to describe an optimisation of merge
> joins/sorts. Rather than decribe that, I've looked at the general
> case:
>
> There are some additional implications we may make when joining
> tables... a particularly interesting one is that
>
> SELECT *
> FROM Fact JOIN Dim on Fact.col = Dim.col
>
> can be rewritten as
>
> SELECT *
> FROM Fact JOIN Dim on Fact.col = Dim.col
> WHERE
> (
> Fact.col BETWEEN
>  (SELECT min(col) FROM Dim)
> AND
>  (SELECT max(col) FROM Dim)
> )
> AND
> (
> Dim.col BETWEEN
>  (SELECT min(col) FROM Fact)
> AND
>  (SELECT max(col) FROM Fact)
> )
>
> which also works similarly for anti/semi-joins.
>
> If we have indexes on A.col and B.col then these additional quals can
> be derived cheaply at run-time and could have an important effect on
> optimisation.

Interesting.  This can pretty obviously produce a big win if things
break our way.  But it'll take some effort do determine whether the
range of possible values for the join column is narrow enough to make
the inferred BETWEEN clause selective enough to be useful, and that
effort will be wasted if it turns out that the answer is no.  I can
certainly think of times when this would have helped me, so I kind of
like the idea in theory ... but I won't be surprised if it turns out
to be possible to construct realistic test cases where trying to
figure this out causes too much increase in planning time.

Actually, it seems possible to me that this could end up winning even
if it doesn't enable the use of an index scan rather than a sequential
scan, because eliminating rows during the scan is typically much
cheaper than eliminating them during the join step.  But it also seems
possible that it could lose heavily in that case, if the extra steps
during the scan phase don't eliminate enough rows to reduce the join
cost to a sufficient degree.  I'm not really sure how it will work out
on balance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: allowing VACUUM to be cancelled for conflicting locks
Next
From: Tom Lane
Date:
Subject: Re: Decrease MAX_BACKENDS to 2^16