Home > mailing lists

Re: 9.3 Pre-proposal: Range Merge Join - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: 9.3 Pre-proposal: Range Merge Join
Date	April 18, 2012 05:22:02
Msg-id	21274.1334726500@sss.pgh.pa.us Whole thread Raw
In response to	Re: 9.3 Pre-proposal: Range Merge Join (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: 9.3 Pre-proposal: Range Merge Join
List	pgsql-hackers

Tree view

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Apr 16, 2012 at 1:43 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> ... Only one side really needs the mark and restore logic, but it was easier
>> to write the pseudocode in a symmetrical way (except step 7).

> I'm actually not sure these are equivalent formulations.  Suppose one
> side has [i,i] where i ranges from 1 to 10000000 and the other side
> the exact same thing plus [1,10000000].  That one really big range
> will come up second on the right side, and you will not be able to
> discard it until you reach the end of the join.  If you just keep
> rewinding the right side, you're going to end up with O(n^2) behavior,
> whereas if you can discard tuples "from the middle" on the right side,
> then you will get O(n) behavior, which is a big difference.  In other
> words, in your original algorithm the tuples that you discard in step
> 4 or 5 need not be the first remaining tuple on whichever side of the
> join we're talking about.

It would be a pretty weird implementation of mergejoin that could
"discard tuples from the middle" of an input stream.  Or to be more
specific, it wouldn't be the mergejoin itself that could do that at all
--- you'd need the input plan node to be some sort of tweaked version of
tuplestore or tuplesort that could respond to a request like that.

I can't escape the feeling that Jeff has chosen the wrong basis for his
thought experiment, and that what he really ought to be thinking about
is hashjoin, which keeps an in-memory table that it could easily modify
on the fly if it chose to.  The multi-batch aspect of hybrid hashjoin
could be useful too (IOW, when you're out of better ideas, throw the
tuple back in the water to process later).

This is just handwaving of course.  I think some digging in the
spatial-join literature would likely find ideas better than any of
these.
        regards, tom lane

pgsql-hackers by date:

From: Tom Lane
Date: 18 April 2012, 05:07:06
Subject: Re: Improving our clauseless-join heuristics

From: Magnus Hagander
Date: 18 April 2012, 05:30:43
Subject: Re: Bug tracker tool we need

Re: 9.3 Pre-proposal: Range Merge Join - Mailing list pgsql-hackers

Previous

Next