Re: 9.3 Pre-proposal: Range Merge Join - Mailing list pgsql-hackers

From Tom Lane
Subject Re: 9.3 Pre-proposal: Range Merge Join
Date
Msg-id 21274.1334726500@sss.pgh.pa.us
Whole thread Raw
In response to Re: 9.3 Pre-proposal: Range Merge Join  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: 9.3 Pre-proposal: Range Merge Join
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Apr 16, 2012 at 1:43 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> ... Only one side really needs the mark and restore logic, but it was easier
>> to write the pseudocode in a symmetrical way (except step 7).

> I'm actually not sure these are equivalent formulations.  Suppose one
> side has [i,i] where i ranges from 1 to 10000000 and the other side
> the exact same thing plus [1,10000000].  That one really big range
> will come up second on the right side, and you will not be able to
> discard it until you reach the end of the join.  If you just keep
> rewinding the right side, you're going to end up with O(n^2) behavior,
> whereas if you can discard tuples "from the middle" on the right side,
> then you will get O(n) behavior, which is a big difference.  In other
> words, in your original algorithm the tuples that you discard in step
> 4 or 5 need not be the first remaining tuple on whichever side of the
> join we're talking about.

It would be a pretty weird implementation of mergejoin that could
"discard tuples from the middle" of an input stream.  Or to be more
specific, it wouldn't be the mergejoin itself that could do that at all
--- you'd need the input plan node to be some sort of tweaked version of
tuplestore or tuplesort that could respond to a request like that.

I can't escape the feeling that Jeff has chosen the wrong basis for his
thought experiment, and that what he really ought to be thinking about
is hashjoin, which keeps an in-memory table that it could easily modify
on the fly if it chose to.  The multi-batch aspect of hybrid hashjoin
could be useful too (IOW, when you're out of better ideas, throw the
tuple back in the water to process later).

This is just handwaving of course.  I think some digging in the
spatial-join literature would likely find ideas better than any of
these.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Improving our clauseless-join heuristics
Next
From: Magnus Hagander
Date:
Subject: Re: Bug tracker tool we need