Re: [HACKERS] Range Merge Join v1 - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: [HACKERS] Range Merge Join v1
Date
Msg-id CAMp0ubfKjj6H39iAX=zoPZ0ke_MAyTatQPwMrfw5dkSSztuV4Q@mail.gmail.com
Whole thread Raw
Responses Re: [HACKERS] Range Merge Join v1  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
Version 2 attached. Fixed a few issues, expanded tests, added docs.

A simple performance test (script attached) shows about a 5X
improvement when comparing against a nested loop with an inner
index-only scan over a gist index.

Even better, this doesn't require an index, so it will work even if
the input relations are subqueries.

Regards,
    Jeff Davis

On Thu, Apr 6, 2017 at 1:43 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> ========
> Example:
> ========
>
> Find different people using the same website at the same time:
>
> create table session(sessionid text, username text, during tstzrange);
> SELECT s1.username, s2.username, s1.during * s2.during
>   FROM session s1, session s2
>   WHERE s1.during && s2.during AND s1.username < s2.username
>
>
> =====================================
> Brief summary of previous discussion:
> =====================================
>
> http://www.postgresql.org/message-id/1334554850.10878.52.camel@jdavis
>
> - can indexes solve it already (parameterized paths, etc.)?
> - planner complexity (track path keys, mergejoinable equality, etc.)
> - spatial join algorithm choice
> - compare range merge join against existing indexes to see if any
>   indexing approach is viable
>
> ==========
> Externals:
> ==========
>
> No new syntax or other externals. Just improved execution strategy for
> joining ranges using the overlaps operator (&&).
>
> New syntax is possible, but I don't see a strong reason to support
> special range join syntax at this time.
>
> =============
> Optimizer Design
> =============
>
> I took the path of least resistance, so if I am doing something wrong
> please let me know. I hardwired it to look for the range overlaps
> operator when checking if it could do a merge join, and then add
> range_ops to restrictinfo->mergeopfamilies.
>
> Then, I have to suppress adding it as an equivalence class, because
> overlaps is not equality. It still adds single-member ECs to
> restrictinfo->right_ec and left_ec, but (I think) that's OK.
>
> Also, I have to prevent generating paths for right/full join, because
> the range join algorithm can't (currently) handle those.
>
> =============
> Costing
> =============
>
> Needs more consideration. Seems to do reasonable things in the few
> examples I tried.
>
> =============
> Executor Design
> =============
>
> See detailed comments in nodeMergejoin.c
>
> =============
> Performance
> =============
>
> Seems much better than index nest loop join when the join is not
> selective. I will post detailed numbers soon.
>
> If no index is available, range merge join is the only reasonable way
> to execute a range join. For instance, if the inner is not a leaf in
> the plan.
>
> =============
> Alternatives:
> =============
>
> It was suggested that I approach it as a general spatial-join
> problem. That has merit, but I rejected it for now because the
> algorithm that I think has the most promise is range-oriented
> anyway. By that I mean we would need to extract all of the dimensions
> into ranges first, and then perform the join. With that in mind, it
> makes sense to implement range joins first; and then later provide the
> APIs to get at the spatial dimensions so that we can implement other
> spatial joins as range joins.
>
> Another suggestion was to base it on hash join, but I never understood
> that proposal and it didn't seem to map very well to a spatial join.
>
> Yet another suggestion was to use some kind of temporary index. Some
> brief experiments I did indicated that it would be fairly slow (though
> more investigation might be useful here). Also, it doesn't provide any
> alternative to the nestloop-with-inner-index we already offer at the
> leaf level today.
>
> Regards,
>      Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Kuntal Ghosh
Date:
Subject: Re: [HACKERS] strange parallel query behavior after OOM crashes
Next
From: Pavel Stehule
Date:
Subject: Re: [HACKERS] Variable substitution in psql backtick expansion