GSoC project: K-medoids clustering in Madlib - Mailing list pgsql-students

From Maxence AHLOUCHE
Subject GSoC project: K-medoids clustering in Madlib
Date
Msg-id CAJeaomXH1x3SaenmRPdWho9POMBZZmTsbM-iGJm03sH35BKYnQ@mail.gmail.com
List pgsql-students
Hi again!

I am "viod.len@gmail.com", but now writing with my "true" email address.

Note that I've sent this mail to both the MADlib and PostgreSQL mailing lists in order to synchronize efforts. I've also sent it to everyone who was CC'ed on previous mails.
Where should I send mails regarding this project from now on? Posting to both mailing lists every time seems like a bad idea.

As I had lost hope of getting an answer from MADlib, I contacted Atri again, as he was willing to mentor the MADlib projects.

Here is his answer:

On Apr 15, 2013 17:46, "Atri Sharma" <atri.jiit@gmail.com> wrote:
On Mon, Apr 15, 2013 at 9:12 PM, viod <viod.len@gmail.com> wrote:
> Hello all!
>
>
>>
>> Do you have any interest in data analytics? I proposed a couple of
>> ideas in that field. If you are interested, we could talk over them.
>>
>> Regards,
>>
>> Atri
>
>
> I am pretty much interested in the ideas you posted on the mailing list
> (particularly in implementing the K-medoids algorithm). I've asked MADlib if
> they could mentor me, but unfortunately their org has not been accepted in
> GSoC, and they haven't answered since then.


No issues, I can try to help you out.

> Do you think I could still do it with PostgreSQL? I would really like to do
> this project, as an initiation to classification algorithms.
>
> Still, I don't really understand how this would be used by PostgreSQL?

MADlib is the de facto library for in-database analytics in PostgreSQL.

Download and install MADlib, run a few programs, and think more about the
implementation and discuss here.

Regards,

Atri

And here is Rahul's message:

Hi Maxence, 

Welcome aboard on MADlib development. We would be happy to help you add K-medoids to the MADlib suite.

> Do you have any idea of stuff I could do to get familiar with the code?
> I've already been through the doc to search for functions I already knew,
> and found a little bit, I'll go and read the code by the end of the week.

You could start off by looking at the Linear Regression code to understand the workflow. Another document that would be useful to review is the design doc (found here). Chapter 1 gives an overview of the Abstraction layer that is used by all modules.
(There are some BibTeX errors that I am debugging, but it's still readable.)

Linear regression does not use the iterative constructs and is easier to understand. You could then look at how k-means is implemented, since k-medoids would interplay with it in the final product.
I believe that once we go through the k-medoids implementation, extending the backprop code would be easy. So we could definitely look at that when we get to it.

Sometimes we get too busy to give a quick response on the devel@madlib list, but keep posting questions there (or ping when you don't hear back) and we will be able to support you in this endeavor.

Best, 
Rahul

By the way, my "ping" message wasn't meant to sound aggressive, just in case. My phone simply ate my question mark.

I'm also including the introduction I had sent to MADlib's mailing list, so that the PostgreSQL folks can also get a better idea of who I am:

I'm Maxence Ahlouche, and I have now been studying IT for almost three years. I spent the first two years of my studies in the French equivalent of an HND, a very technical training. After obtaining my diploma, I joined an engineering school, as I wanted to learn more theoretical material and better understand the tools I use every day.
My current program is actually called IT and Applied Mathematics (and I currently have some difficulties in mathematics, as all the other students, except for one, did a very maths-intensive "preparatory course" before coming here). Still, I'm really interested in what I learn, and am very curious about many things.

At first, I wanted to apply for a PostgreSQL project, and, while lurking on their mailing list, I found a reference to the aforementioned project about the K-medoids algorithm. I found this project to be a perfect fit for my interests: a teacher made me love databases and want to learn more about their internals, and machine learning is a domain that has been attracting me for a while now.

As to my skills, I've learnt lots of programming languages (non-exhaustive list: C, C++, Java, a bit of Matlab and Fortran, Bash, PHP, C#, VBA, Python, Caml...). I know how to learn by myself and quickly. During my courses, I've done a (very little) bit of classification: we had to determine the zone to which a pixel belongs using maximum likelihood. This made me want to learn more about this domain.

That being said, I thank you all for your investment :)

--
Maxence Ahlouche
06 06 66 97 00
93 avenue Paul DOUMER
24100 Bergerac
