Home > mailing lists

Re: 9.3 Pre-proposal: Range Merge Join - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: 9.3 Pre-proposal: Range Merge Join
Date	April 19, 2012 05:11:12
Msg-id	1334823058.5487.84.camel@jdavis Whole thread Raw
In response to	Re: 9.3 Pre-proposal: Range Merge Join (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: 9.3 Pre-proposal: Range Merge Join
List	pgsql-hackers

Tree view

On Wed, 2012-04-18 at 01:21 -0400, Tom Lane wrote:
> It would be a pretty weird implementation of mergejoin that could
> "discard tuples from the middle" of an input stream.  Or to be more
> specific, it wouldn't be the mergejoin itself that could do that at all
> --- you'd need the input plan node to be some sort of tweaked version of
> tuplestore or tuplesort that could respond to a request like that.

As I said in my reply to Robert, I think there are some ways we can make
this idea work.

> I can't escape the feeling that Jeff has chosen the wrong basis for his
> thought experiment, and that what he really ought to be thinking about
> is hashjoin, which keeps an in-memory table that it could easily modify
> on the fly if it chose to.  The multi-batch aspect of hybrid hashjoin
> could be useful too (IOW, when you're out of better ideas, throw the
> tuple back in the water to process later).

Obviously hashing is not going to be much use for anything but equality.
So I believe this approach is very similar to the temporary-index
method, except with batching, and always keeping the index in memory.

I don't think we would get the partitioning benefit of hashjoin, because
we'd have to put the same tuple in multiple partitions, so it's probably
better to just leave the outer side intact.

But in-memory indexes and multiple passes of the outer seems like a
reasonable alternative, particularly because an in-memory index might be
very fast (to build as well as query).

> This is just handwaving of course.  I think some digging in the
> spatial-join literature would likely find ideas better than any of
> these.

I will look in some more detail. The merge-like approach did seem to be
represented in the paper referenced by Alexander (the external plane
sweep), but it also refers to several methods based on partitioning.

I'm beginning to think that more than one of these ideas has merit.

Regards,Jeff Davis

pgsql-hackers by date:

From: Sandro Santilli
Date: 19 April 2012, 04:56:53
Subject: Re: Gsoc2012 idea, tablesample

From: Magnus Hagander
Date: 19 April 2012, 06:05:25
Subject: New sync commit mode remote_write

Re: 9.3 Pre-proposal: Range Merge Join - Mailing list pgsql-hackers

Previous

Next