Thread: GSoC 2015: SP-GIST for geometrical objects

GSoC 2015: SP-GIST for geometrical objects

From
Dima Ivanovskiy
Date:
Hello, I am Dmitrii, student of Moscow Institute of Physics and Technology

Abstract:
I chose project "Indexing prolonged geometrical objects (i.e. boxes, circles, polygons, not points) with SP-GiST by mapping to 4d-space". 
According to the presentation
https://www.pgcon.org/2011/schedule/attachments/197_pgcon-2011.pdf
SP-GIST 3 times faster than GiST in some cases. But GIST supports geometrical data types:
box, circle, polygon with operators: && &> &< &<| >> << <<| <@ @> @ |&> |>> ~ ~=
Popular spatial extension PostGIS doesn't include SP-GIST, but has a lot of geometrical features.

Project details:
After meeting with Alexander Korotkov, I wrote some plan. 
Using of K-D-tree and Quadtree in building index for geometrical data types can increase speed of search in some cases.
The main idea is representing 2-D geometrical objects in their bounding box. Set of 2-D boxes is 4-D space.
New _ops will work with points from 4-D space, for example kd_box_ops, quad_circle_ops and will support all geometrical operators.
After conversion object to their bounding box algo has set of tuples (x1, y1, x2, y2).
Our goal is separate this space the most equally. If we talk about K-D-tree, on first step K-D-tree algorithm will split space in 2 parts by the first coordinate, in next step by the second coordinate etc., after 4-th coordinate we repeat this procedure.
At the end we have index at geometrical objects and use traversal tree for every search operator.

Postgresql has already has realization ideas of MBR in gist/gistproc.c. So I will transfer this realization to other type of tree.

Of cource, I assume that SP-GIST can be not the best decision of this problem. So after testing this clear methods, I will try to find more effective way. Maybe with using combination of different spatial tree structures.

Project Schedule:

until May 25

Read documentation and source code, clarify details of implementation.

1st month

Implement new '_ops' with all geometrical operators for box, circle, polygon

2nd month

Research new methods for increase speed of geometrical query

3rd month

Final refactoring, testing and submitting a patch.

Links:
http://www.sai.msu.su/~megera/postgres/talks/gist_tutorial.html - about GIST
https://toster.ru/q/27135#answer_110197 - people need SP-GIST for cubes
http://www.slideshare.net/profyclub_ru/o-lt - presentation about indexes
http://pgconf.ru/static/presentations/2015/korotkov_spatial.pdf - working with geo objects

Re: GSoC 2015: SP-GIST for geometrical objects

From
Arthur Silva
Date:
<p dir="ltr"><br /> On Mar 27, 2015 11:08 AM, "Dima Ivanovskiy" <<a
href="mailto:dima-iv@mail.ru">dima-iv@mail.ru</a>>wrote:<br /> ><br /> > Hello, I am Dmitrii, student of
MoscowInstitute of Physics and Technology<br /> ><br /> > Abstract:<br /> ><br /> > I chose project
"Indexingprolonged geometrical objects (i.e. boxes, circles, polygons, not points) with SP-GiST by mapping to
4d-space".<br /> > According to the presentation<br /> > <a
href="https://www.pgcon.org/2011/schedule/attachments/197_pgcon-2011.pdf">https://www.pgcon.org/2011/schedule/attachments/197_pgcon-2011.pdf</a><br
/>> SP-GIST 3 times faster than GiST in some cases. But GIST supports geometrical data types: <br /> > box,
circle,polygon with operators: && &> &< &<| >> << <<| <@ @> @
|&>|>> ~ ~=<br /> > Popular spatial extension PostGIS doesn't include SP-GIST, but has a lot of
geometricalfeatures.<br /> ><br /> > Project details:<br /> ><br /> > After meeting with Alexander
Korotkov,I wrote some plan. <br /> > Using of K-D-tree and Quadtree in building index for geometrical data types can
increasespeed of search in some cases.<br /> > The main idea is representing 2-D geometrical objects in their
boundingbox. Set of 2-D boxes is 4-D space. <br /> > New _ops will work with points from 4-D space, for example
kd_box_ops,quad_circle_ops and will support all geometrical operators. <br /> > After conversion object to their
boundingbox algo has set of tuples (x1, y1, x2, y2). <br /> > Our goal is separate this space the most equally. If
wetalk about K-D-tree, on first step K-D-tree algorithm will split space in 2 parts by the first coordinate, in next
stepby the second coordinate etc., after 4-th coordinate we repeat this procedure. <br /> > At the end we have index
atgeometrical objects and use traversal tree for every search operator. <br /> ><br /> > Postgresql has already
hasrealization ideas of MBR in gist/gistproc.c. So I will transfer this realization to other type of tree.<br />
><br/> > Of cource, I assume that SP-GIST can be not the best decision of this problem. So after testing this
clearmethods, I will try to find more effective way. Maybe with using combination of different spatial tree
structures.<br/> ><br /> > Project Schedule:<br /> ><br /> > until May 25<br /> ><br /> > Read
documentationand source code, clarify details of implementation.<br /> ><br /> > 1st month<br /> ><br /> >
Implementnew '_ops' with all geometrical operators for box, circle, polygon<br /> ><br /> > 2nd month<br />
><br/> > Research new methods for increase speed of geometrical query<br /> ><br /> > 3rd month<br />
><br/> > Final refactoring, testing and submitting a patch.<br /> ><br /> ><br /> > Links:<br /> ><br
/>> <a
href="http://www.sai.msu.su/~megera/postgres/talks/gist_tutorial.html">http://www.sai.msu.su/~megera/postgres/talks/gist_tutorial.html</a>
-about GIST<br /> > <a href="https://toster.ru/q/27135#answer_110197">https://toster.ru/q/27135#answer_110197</a> -
peopleneed SP-GIST for cubes<br /> > <a
href="http://www.slideshare.net/profyclub_ru/o-lt">http://www.slideshare.net/profyclub_ru/o-lt</a>- presentation about
indexes<br/> > <a
href="http://pgconf.ru/static/presentations/2015/korotkov_spatial.pdf">http://pgconf.ru/static/presentations/2015/korotkov_spatial.pdf</a>
-working with geo objects<br /> ><br /> Nice proposal. <p dir="ltr">Dynamic Kdtrees can perform badly as the
splittingmedian can get way off as updates are coming. What are your thoughts about that? <p dir="ltr">Also what's up
withthe 4d space? I don't quite get it. These types are 2 or 3 dimensions. <br /> 

Re[2]: [HACKERS] GSoC 2015: SP-GIST for geometrical objects

From
Dima Ivanovskiy
Date:

On Mar 27, 2015 11:08 AM, "Dima Ivanovskiy" <dima-iv@mail.ru> wrote:
>
> Hello, I am Dmitrii, student of Moscow Institute of Physics and Technology
>
> Abstract:
>
> I chose project "Indexing prolonged geometrical objects (i.e. boxes, circles, polygons, not points) with SP-GiST by mapping to 4d-space".
> According to the presentation
> https://www.pgcon.org/2011/schedule/attachments/197_pgcon-2011.pdf
> SP-GIST 3 times faster than GiST in some cases. But GIST supports geometrical data types:
> box, circle, polygon with operators: && &> &< &<| >> << <<| <@ @> @ |&> |>> ~ ~=
> Popular spatial extension PostGIS doesn't include SP-GIST, but has a lot of geometrical features.
>
> Project details:
>
> After meeting with Alexander Korotkov, I wrote some plan.
> Using of K-D-tree and Quadtree in building index for geometrical data types can increase speed of search in some cases.
> The main idea is representing 2-D geometrical objects in their bounding box. Set of 2-D boxes is 4-D space.
> New _ops will work with points from 4-D space, for example kd_box_ops, quad_circle_ops and will support all geometrical operators.
> After conversion object to their bounding box algo has set of tuples (x1, y1, x2, y2).
> Our goal is separate this space the most equally. If we talk about K-D-tree, on first step K-D-tree algorithm will split space in 2 parts by the first coordinate, in next step by the second coordinate etc., after 4-th coordinate we repeat this procedure.
> At the end we have index at geometrical objects and use traversal tree for every search operator.
>
> Postgresql has already has realization ideas of MBR in gist/gistproc.c. So I will transfer this realization to other type of tree.
>
> Of cource, I assume that SP-GIST can be not the best decision of this problem. So after testing this clear methods, I will try to find more effective way. Maybe with using combination of different spatial tree structures.
>
> Project Schedule:
>
> until May 25
>
> Read documentation and source code, clarify details of implementation.
>
> 1st month
>
> Implement new '_ops' with all geometrical operators for box, circle, polygon
>
> 2nd month
>
> Research new methods for increase speed of geometrical query
>
> 3rd month
>
> Final refactoring, testing and submitting a patch.
>
>
> Links:
>
> http://www.sai.msu.su/~megera/postgres/talks/gist_tutorial.html - about GIST
> https://toster.ru/q/27135#answer_110197 - people need SP-GIST for cubes
> http://www.slideshare.net/profyclub_ru/o-lt - presentation about indexes
> http://pgconf.ru/static/presentations/2015/korotkov_spatial.pdf - working with geo objects
>
Nice proposal.

Dynamic Kdtrees can perform badly as the splitting median can get way off as updates are coming. What are your thoughts about that?

Also what's up with the 4d space? I don't quite get it. These types are 2 or 3 dimensions.


I read spgist README one more time. I didn't find the mechanism for maintaining good balance after updates.
I think we can use Bkd-Tree, https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf. But It can be not the best solving.
I include Research time in 2nd month of timeline.

About 4d space. All these types are 2 dimensional.
Just as in R-tree object is approximated by MBR. MBR for 2d-objects can be mapped to 4d-point. More general, nd-object MBR can be mapped into 2nd-point.


Re: GSoC 2015: SP-GIST for geometrical objects

From
Arthur Silva
Date:
<p dir="ltr"><br /> On Mar 27, 2015 6:41 PM, "Dima Ivanovskiy" <<a
href="mailto:dima-iv@mail.ru">dima-iv@mail.ru</a>>wrote:<br /> ><br /> ><br /> >> On Mar 27, 2015 11:08
AM,"Dima Ivanovskiy" <<a href="mailto:dima-iv@mail.ru">dima-iv@mail.ru</a>> wrote:<br /> >> ><br />
>>> Hello, I am Dmitrii, student of Moscow Institute of Physics and Technology<br /> >> ><br />
>>> Abstract:<br /> >> ><br /> >> > I chose project "Indexing prolonged geometrical objects
(i.e.boxes, circles, polygons, not points) with SP-GiST by mapping to 4d-space". <br /> >> > According to the
presentation<br/> >> > <a
href="https://www.pgcon.org/2011/schedule/attachments/197_pgcon-2011.pdf">https://www.pgcon.org/2011/schedule/attachments/197_pgcon-2011.pdf</a><br
/>>> > SP-GIST 3 times faster than GiST in some cases. But GIST supports geometrical data types: <br />
>>> box, circle, polygon with operators: && &> &< &<| >> << <<|
<@@> @ |&> |>> ~ ~=<br /> >> > Popular spatial extension PostGIS doesn't include SP-GIST,
buthas a lot of geometrical features.<br /> >> ><br /> >> > Project details:<br /> >> ><br
/>>> > After meeting with Alexander Korotkov, I wrote some plan. <br /> >> > Using of K-D-tree and
Quadtreein building index for geometrical data types can increase speed of search in some cases.<br /> >> >
Themain idea is representing 2-D geometrical objects in their bounding box. Set of 2-D boxes is 4-D space. <br />
>>> New _ops will work with points from 4-D space, for example kd_box_ops, quad_circle_ops and will support
allgeometrical operators. <br /> >> > After conversion object to their bounding box algo has set of tuples
(x1,y1, x2, y2). <br /> >> > Our goal is separate this space the most equally. If we talk about K-D-tree, on
firststep K-D-tree algorithm will split space in 2 parts by the first coordinate, in next step by the second coordinate
etc.,after 4-th coordinate we repeat this procedure. <br /> >> > At the end we have index at geometrical
objectsand use traversal tree for every search operator. <br /> >> ><br /> >> > Postgresql has
alreadyhas realization ideas of MBR in gist/gistproc.c. So I will transfer this realization to other type of tree.<br
/>>> ><br /> >> > Of cource, I assume that SP-GIST can be not the best decision of this problem. So
aftertesting this clear methods, I will try to find more effective way. Maybe with using combination of different
spatialtree structures.<br /> >> ><br /> >> > Project Schedule:<br /> >> ><br /> >>
>until May 25<br /> >> ><br /> >> > Read documentation and source code, clarify details of
implementation.<br/> >> ><br /> >> > 1st month<br /> >> ><br /> >> > Implement new
'_ops'with all geometrical operators for box, circle, polygon<br /> >> ><br /> >> > 2nd month<br />
>>><br /> >> > Research new methods for increase speed of geometrical query<br /> >> ><br />
>>> 3rd month<br /> >> ><br /> >> > Final refactoring, testing and submitting a patch.<br />
>>><br /> >> ><br /> >> > Links:<br /> >> ><br /> >> > <a
href="http://www.sai.msu.su/~megera/postgres/talks/gist_tutorial.html">http://www.sai.msu.su/~megera/postgres/talks/gist_tutorial.html</a>
-about GIST<br /> >> > <a
href="https://toster.ru/q/27135#answer_110197">https://toster.ru/q/27135#answer_110197</a>- people need SP-GIST for
cubes<br/> >> > <a
href="http://www.slideshare.net/profyclub_ru/o-lt">http://www.slideshare.net/profyclub_ru/o-lt</a>- presentation about
indexes<br/> >> > <a
href="http://pgconf.ru/static/presentations/2015/korotkov_spatial.pdf">http://pgconf.ru/static/presentations/2015/korotkov_spatial.pdf</a>
-working with geo objects<br /> >> ><br /> >> Nice proposal.<br /> >><br /> >> Dynamic
Kdtreescan perform badly as the splitting median can get way off as updates are coming. What are your thoughts about
that?<br/> >><br /> >> Also what's up with the 4d space? I don't quite get it. These types are 2 or 3
dimensions.<br/> ><br /> ><br /> > I read spgist README one more time. I didn't find the mechanism for
maintaininggood balance after updates.<br /> > I think we can use Bkd-Tree, <a
href="https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf">https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf</a>.
ButIt can be not the best solving.<br /> > I include Research time in 2nd month of timeline.<br /> ><br /> >
About4d space. All these types are 2 dimensional.<br /> > Just as in R-tree object is approximated by MBR. MBR for
2d-objectscan be mapped to 4d-point. More general, nd-object MBR can be mapped into 2nd-point. <br /> ><br /> ><p
dir="ltr">Thereason I said that is because you pointed performance as one motivation  factor (and the lack of balancing
propertiescan degrade performance really fast on larger indexes).<p dir="ltr">The Bkd variant seems interesting but I
don'tthink spgist provides enough abstraction to implement it.<p dir="ltr">A bounding box can still be inserted/queried
witha 2d kdtree so I don't know why you call it 4d. I assume it's a matter of naming.<p dir="ltr">Overall the proposal
seemstotally doable, although I have doubts about its usefulness. But I'm no one... You should have some feedback from
thementors soon.<br />