Home > mailing lists

Re: 9.3 Pre-proposal: Range Merge Join - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: 9.3 Pre-proposal: Range Merge Join
Date	April 16, 2012 17:12:34
Msg-id	1334607138.19915.89.camel@jdavis Whole thread Raw
In response to	Re: 9.3 Pre-proposal: Range Merge Join (Alexander Korotkov <aekorotkov@gmail.com>)
Responses	Re: 9.3 Pre-proposal: Range Merge Join Re: 9.3 Pre-proposal: Range Merge Join
List	pgsql-hackers

Tree view

On Mon, 2012-04-16 at 16:22 +0400, Alexander Korotkov wrote:

> There is a good overview article about spatial joins.
> http://www.cs.umd.edu/users/hjs/pubs/jacoxtods07.pdf 

Thank you, that's exactly the kind of overview I was looking for.

> It shows that there is a lot of methods based on building temporaty
> indexes. It might mean that building of temporary GiST index is not a
> bad idea.

That had occurred to me, but I was hesitant to only use temp indexes. It
still doesn't really offer a good solution when both sides of the join
are relatively large (because of random I/O). Also the build speed of
the index would be more important than it is now.

On the other hand, if I tackle the problem using temp indexes, it would
be a more general solution useful for many problems (for instance, we
really need a better solution when the result of a subquery doesn't fit
in a work_mem-sized hashtable).

We may end up with some kind of combination (as the sweep seems to be;
see below).

> Also there are methods which looks like extension of Jeff Davis
> proposal to the multidimensional case. It is Plane Sweep and
> External Plane Sweep methods. However, it might use sophisticated data
> structures for the "sweep". And I believe it should use it for good
> performance.

That method looks closer to what I had in mind.

My Range Join proposal is essentially the same as the sweep method
except that it only joins on one dimension, and the rest of the
dimensions have to be checked with RECHECK. So, this still works for an
N-dimensional join, as long as a single dimension is fairly selective.

The sweep method seems to do the first dimension with the sweep method,
and maintains a changing index that it uses to do the lookup. In other
words, it's essentially just a way to do the RECHECK on the other
dimensions in less than O(n) time. So, if the sweep dimension is not
selective at all, this degenerates to the temporary index method (or
something close, anyway).

The fact that, in the sweep method, there is still one "special"
dimension, makes me think that it will be easier to make the API work
based on ranges (which is a big win because postgres already knows about
ranges). If so, that makes it easier to divide the project into stages,
because we could get it working for ranges and then extend it to support
other dimensions more efficiently later.

Regards,Jeff Davis

pgsql-hackers by date:

From: Robert Haas
Date: 16 April 2012, 17:02:21
Subject: Re: index-only scans vs. Hot Standby, round two

From: Robert Haas
Date: 16 April 2012, 17:14:03
Subject: Re: index-only scans vs. Hot Standby, round two

Re: 9.3 Pre-proposal: Range Merge Join - Mailing list pgsql-hackers

Previous

Next