[GSoC] Clustering in MADlib - status update - Mailing list pgsql-hackers

From Maxence Ahlouche
Subject [GSoC] Clustering in MADlib - status update
Msg-id CAJeaomUZfGXKyvUB4-6yxK5m+dVMLd+w+5DEm5MbYf2kErB0XA@mail.gmail.com
Responses Re: [GSoC] Clustering in MADlib - status update  (Maxence Ahlouche <maxence.ahlouche@gmail.com>)
List pgsql-hackers
Hi,

Here is my first report. You can also find it on my GitLab [0].

Week 1 - 2014/05/25

This first week, I wrote a test script that generates some simple datasets and produces an image showing the output of the MADlib clustering algorithms on them.

This script can be called like this:

./clustering_test.py new ds0 -n 8 # generates a dataset called "ds0" with 8 clusters
./clustering_test.py query ds0 -o output.png # outputs the result of the clustering algorithms applied to ds0 in output.png

See ./clustering_test.py -h for all the available options.

An example of output can be found here [1].

Of course, I will keep improving this test script, as it is still far from perfect; but for now, it does approximately what I want.
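
For readers who want a picture of what "a simple dataset with N clusters" can mean, here is a minimal sketch of one way to generate it: 2D points scattered around randomly placed centers. This is not the code from clustering_test.py; the function name, its parameters, and the choice of Gaussian blobs are all assumptions made for illustration.

```python
import random


def generate_clusters(n_clusters, points_per_cluster=50, spread=0.5,
                      extent=10.0, seed=0):
    """Return (points, labels) for a synthetic 2D dataset.

    Hypothetical helper: each cluster is a Gaussian blob of standard
    deviation `spread` around a center drawn uniformly in
    [0, extent] x [0, extent].
    """
    rng = random.Random(seed)  # fixed seed so a dataset can be regenerated
    points, labels = [], []
    for label in range(n_clusters):
        cx, cy = rng.uniform(0, extent), rng.uniform(0, extent)
        for _ in range(points_per_cluster):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    return points, labels
```

Keeping the true labels next to the points makes it possible to compare them later with what a clustering algorithm returns.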

Next week, I'll start working on the implementation of k-medoids in MADlib. As a reminder, according to the timeline I suggested for the project, this step is due by May 30. Depending on the problems I run into (mostly my lack of knowledge of the codebase, I guess), it might not be finished on time, but it should be done a few days later (hopefully by the end of next week).
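
As background on k-medoids, the sketch below shows the simple alternating variant (assign each point to its nearest medoid, then pick as new medoid the member with the smallest total distance to the rest of its cluster). It is plain Python for illustration only, not the planned MADlib implementation and not the PAM swap-based algorithm. The function name, the `init` parameter, and the use of Euclidean distance are assumptions.

```python
import math
import random


def k_medoids(points, k, init=None, max_iter=100, seed=0):
    """Alternating k-medoids on a list of coordinate tuples.

    Hypothetical sketch: returns (medoids, labels), where `medoids` is a
    list of indices into `points` and labels[i] is the position in
    `medoids` of the medoid nearest to points[i]. Assumes the points are
    distinct. Requires Python 3.8+ for math.dist.
    """
    rng = random.Random(seed)
    medoids = list(init) if init is not None else rng.sample(range(len(points)), k)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
            clusters[nearest].append(i)
        # Update step: in each cluster, keep the member that minimizes
        # the summed distance to all other members.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # only possible with duplicate points
                new_medoids.append(m)
                continue
            best = min(members, key=lambda c: sum(
                math.dist(points[c], points[j]) for j in members))
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):
            break  # medoids are stable: converged
        medoids = new_medoids
    labels = [min(range(len(medoids)),
                  key=lambda c: math.dist(p, points[medoids[c]]))
              for p in points]
    return medoids, labels
```

Unlike k-means, the cluster representatives are always actual data points, so only pairwise distances are needed. Like k-means, the result depends on the initial medoids and can be a local optimum.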

Attached is the patch containing everything I have done this week, though the git log might be more convenient to read.

Regards,

Maxence A.

[0] http://git.viod.eu/viod/gsoc_2014/blob/master/reports.rst