Re: GSOC Student Project Idea - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: GSOC Student Project Idea
Date
Msg-id CAPpHfdskjgBMsXd-dLMnhFG-8c2KRyZxseKkByV_6gev6ea6GQ@mail.gmail.com
Whole thread Raw
In response to GSOC Student Project Idea  (Michael Schuh <schuh.mike@gmail.com>)
Responses Re: GSOC Student Project Idea
List pgsql-hackers
Michael,

On Mon, Apr 22, 2013 at 4:28 PM, Michael Schuh <schuh.mike@gmail.com> wrote:
Although I do not have a lot of experience with PostgreSQL development, I am eager to learn and commit my summer to enabling another fantastic feature for the community. Since iDistance is a non-recursive, data-driven, space-based partitioning strategy which builds directly onto a B+-tree, I believe the implementation should be possible using only GiST support. Please let me know if this is of any interest, or if you have any additional questions. Unfortunately, I will be unavailable most of the day, but I plan to fill out the GSOC application later this evening.

I've taken a brief look on the paper and implementation. As I can see iDistance implements some global building strategy. I mean, for example, it selects some point, calculates distances from selected point to all points in dataset etc. So, it uses the whole dataset at the same time. 

GiST and SP-GiST now behaves differently. They have interface functions which operates at maximum very small portion of data which fits to disk page. You can try to fit iDistance into GiST or SP-GiST interface. But will it be that efficient in this case? I've experimented with fitting VP-tree (which looks similar to iDistance at first glance) into SP-GiST. Then my results was depressing: it was useless with levenshtein distance, and it was worse than R-tree with multidimensional points. So I think we really need global index building algorithm in order to reveal power of such data structures. But GiST and SP-GiST currently doesn't support it.

However you can try to implement global index building in GiST or SP-GiST. In this case I think you should carefully estimate your capabilities during single GSoC project. You would need to extend GiST or SP-GiST interface and write completely new implementation of tree building algorithm. Question of how to exactly extend GiST or SP-GiST interface for this case could appear to be very hard even theoretically.

In short my opinion is so: don't expect miracle by just implementing GiST or SP-GiST interface functions.

------
With best regards,
Alexander Korotkov.

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: 9.3 Beta1 status report
Next
From: Kevin Grittner
Date:
Subject: Re: REFRESH MATERIALIZED VIEW command in PL block hitting Assert