[HACKERS] Merge join for GiST - Mailing list pgsql-hackers

From Andrew Borodin
Subject [HACKERS] Merge join for GiST
Date
Msg-id CAJEAwVE5z5_mE-0SUBjNjNY=CPYf8vq0N9pGHiJckWqE5-8H3A@mail.gmail.com
Whole thread Raw
Responses Re: [HACKERS] Merge join for GiST  (Robert Haas <robertmhaas@gmail.com>)
Re: [HACKERS] Merge join for GiST  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
List pgsql-hackers
Hello, hackers!

==Spatial joins==
Scientific papers from the dawn of R-trees and multidimensional
indexes feature a lot of algorithms for spatial joins.
I.e. you have two sets of geometries s1 and s2, you need to produce
all colliding pairs (p1,p2) where p1 in s1 and p2 in s2. For 2 R-trees
of equal heights with roots R and S most straightforward[1] algorithm
look like:

SpatialJoin (R,S: Node)
{ FOR (all E in  S)   FOR (all ER in R with ER.rect intersecting  E.rect)      IF (R is a leaf page)      {
OutputER,ES      }      ELSE      {        SpatialJoin (ER.ref, ES.ref)      } 
}


==Motivation==
I’m mostly interested in OLAP-style aggregates like:

SELECT rows.Id, sum(data.attribute1), sum(data.attribute2), sum(data.attribute3)
FROM rows LEFT JOIN data ON rows.IndexedAttribute && data.IndexedAttribute
GROUP BY 1

When we have two GiST for rows.IndexedAttribute and data.IndexedAttribute.
Currently, this query produces plans with Nested Loop Join, so for
every row from “rows” there is one GiST scan into “data”.


==Proposal==
Obviously, for this scenario, we can use GiST-based join algorithm
identical to the SpatialJoin algorithm above.

This algorithm will work iif (key1 && key2)  ==> (union(key1,key3) &&
key2 ) and (union(key2,key3) && key1 ) for any key3. This is quite
straightforward for && operator, while I’m not sure this will work for
<@ and @> and others.

I can try to implement this kind of join as an extension or try to
embed this into the planner.

I think this idea is somewhat related to this patch [2], but as for
now cannot describe how exactly GiST merge and Range Merge features
relate.


How do you think:
1. Has this idea been considered before? I had not found anything on
actual spatial join of GiSTS.
2. What is the easiest way to implement this feature?
3. What kind of operators may be spatial-joined without intervention
to existing GiST scan strategies API?

Many thanks for reading.

Best regards, Andrey Borodin.

[1] Brinkhoff, Thomas, Hans-Peter Kriegel, and Bernhard Seeger.
Efficient processing of spatial joins using R-trees. Vol. 22. No. 2.
ACM, 1993.
[2]
https://www.postgresql.org/message-id/flat/CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg%40mail.gmail.com#CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg@mail.gmail.com



pgsql-hackers by date:

Previous
From: Kuntal Ghosh
Date:
Subject: Re: [HACKERS] strange parallel query behavior after OOM crashes
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] Compiler warning in costsize.c