GSOC Student Project Idea - Mailing list pgsql-hackers

From Michael Schuh
Subject GSOC Student Project Idea
Date
Msg-id CAA43Kd3_CA08=QkO6LJ_gftKEU+AmYKvmW6WsdnP_k0mORsOAQ@mail.gmail.com
Responses Re: GSOC Student Project Idea
Re: GSOC Student Project Idea
Re: GSOC Student Project Idea
List pgsql-hackers
Greetings,

My name is Michael Schuh, and I am a PhD student in Computer Science at Montana State University. I have never participated in GSOC before, but I am very excited to propose a project to PostgreSQL that I feel would be a great follow-up to last year's project by Alexander Korotkov (http://www.google-melange.com/gsoc/project/google/gsoc2012/akorotkov/53002). I contacted Mr. Korotkov's mentor from last year, Mr. Heikki Linnakangas, and he suggested that I email this mailing list with my idea.

In brief, I would like to implement a state-of-the-art indexing algorithm named iDistance directly in PostgreSQL, using GiST or SP-GiST trees or whatever other means prove necessary. It is an ideal follow-up to last year's project by Mr. Korotkov, which implemented classical indexing structures for range queries. I strongly believe the community would benefit greatly from the inclusion of iDistance, which has been shown to be dramatically more effective than R-trees and KD-trees, especially for kNN queries and for data above 10-20 dimensions.

A major focus of my current PhD thesis is high-dimensional data indexing and retrieval, with an emphasis on applied use in CBIR systems. I recently published work introducing a new open source implementation of iDistance in C++ (and some Python), which I believe makes me highly qualified and motivated for this opportunity. I have been strongly considering a PostgreSQL implementation for easy plug-and-play use in existing applications, but under academic grant funding it has been a low priority. Below are links to my Google Code repository and recent publication. I am happy to discuss any of this in further detail if you'd like.


Although I do not have a lot of experience with PostgreSQL development, I am eager to learn and to commit my summer to enabling another fantastic feature for the community. Since iDistance is a non-recursive, data-driven, space-based partitioning strategy that builds directly on a B+-tree, I believe the implementation should be possible using only GiST support. Please let me know if this is of any interest, or if you have any additional questions. Unfortunately, I will be unavailable most of the day, but I plan to fill out the GSOC application later this evening.
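
For readers unfamiliar with iDistance, here is a rough sketch of the core idea in Python. Each point is assigned to its nearest reference point and mapped to a one-dimensional key (partition number times a constant, plus the distance to that reference point), so the keys can live in an ordinary ordered index. This is only an illustration under stated assumptions: a sorted Python list stands in for the B+-tree, the names `build_index`, `range_query`, and the constant `c` are hypothetical, `c` is assumed to be larger than any point-to-reference distance, and none of this is PostgreSQL or GiST code or my published implementation.

```python
import bisect
import math


def build_index(points, refs, c):
    """Map every point to the key i * c + dist(point, refs[i]),
    where refs[i] is the point's nearest reference point.
    Assumes c is strictly larger than any such distance."""
    keys = []
    for pid, p in enumerate(points):
        dists = [math.dist(p, r) for r in refs]
        i = min(range(len(refs)), key=dists.__getitem__)
        keys.append((i * c + dists[i], pid))
    keys.sort()  # the sorted list stands in for a B+-tree
    return keys


def range_query(keys, points, refs, c, q, radius):
    """Return the ids of all points within `radius` of q."""
    hits = []
    for i, r in enumerate(refs):
        dq = math.dist(q, r)
        # Triangle inequality: a match p in partition i must satisfy
        # dq - radius <= dist(p, r) <= dq + radius.
        lo = i * c + max(0.0, dq - radius)
        hi = i * c + dq + radius
        end = (i + 1) * c  # never scan into the next partition
        pos = bisect.bisect_left(keys, (lo, -1))
        while pos < len(keys) and keys[pos][0] <= hi and keys[pos][0] < end:
            pid = keys[pos][1]
            # Candidates are filtered by their true distance to q.
            if math.dist(points[pid], q) <= radius:
                hits.append(pid)
            pos += 1
    return sorted(hits)


points = [(0, 0), (1, 1), (5, 5), (6, 5), (9, 9)]
refs = [(0, 0), (6, 6)]
index = build_index(points, refs, c=100.0)
result = range_query(index, points, refs, 100.0, q=(5, 5), radius=1.5)
```

A kNN query can be built on the same mapping by repeating the range query with a growing radius until k results are found. The reference points here are chosen by hand; in practice they would be derived from the data, for example from cluster centers.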

Thank you for your time,
Mike Schuh




