Here is my first report. You can also find it on my GitLab [0].
Week 1 - 2014/05/25
For this first week, I wrote a test script that generates some simple datasets and produces an image showing the output of the MADlib clustering algorithms.
This script can be called like this:
./clustering_test.py new ds0 -n 8 # generates a dataset called "ds0" with 8 clusters
./clustering_test.py query ds0 -o output.png # outputs the result of the clustering algorithms applied to ds0 in output.png
See ./clustering_test.py -h for all the available options.
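To give an idea of what "generating a simple dataset with n clusters" can mean, here is a minimal, hypothetical Python sketch. It is not the code of clustering_test.py. The function name `generate_dataset` and its parameters (`points_per_cluster`, `spread`, `seed`) are my own assumptions for illustration. It draws 2D points from n isotropic Gaussian blobs centred at random positions, which is one common way to build such test data.

```python
import random


def generate_dataset(n_clusters, points_per_cluster=100, spread=1.0, seed=None):
    """Hypothetical generator: return (x, y, cluster_id) tuples drawn from
    n_clusters isotropic Gaussian blobs whose centres are uniform in [-10, 10]^2."""
    rng = random.Random(seed)  # private RNG, so a fixed seed gives reproducible data
    points = []
    for cluster_id in range(n_clusters):
        cx, cy = rng.uniform(-10, 10), rng.uniform(-10, 10)  # random cluster centre
        for _ in range(points_per_cluster):
            # each point is the centre plus Gaussian noise with std dev `spread`
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread), cluster_id))
    return points


if __name__ == "__main__":
    # Loosely mirrors "new ds0 -n 8": eight clusters, here 50 points each.
    data = generate_dataset(8, points_per_cluster=50, seed=0)
    print(len(data))  # 400
```

Keeping the generating `cluster_id` next to each point makes it easy to compare the clusters an algorithm finds against the ones that were actually generated.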
An example output can be found here [1].
Of course, I will keep improving this test script, as it is still far from perfect, but for now it does roughly what I want.
Next week, I'll start working on the implementation of k-medoids in MADlib. As a reminder, according to the timeline I proposed for the project, this step should be completed by May 30. Depending on the problems I run into (mostly, I expect, my unfamiliarity with the codebase), it might not be finished on time, but it should be done a few days later (hopefully by the end of next week).
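For reference, here is a minimal in-memory Python sketch of the simple alternating k-medoids variant (often called "Voronoi iteration", as opposed to the original PAM swap search). This is only an illustration under my own assumptions: the function name `k_medoids`, the Euclidean distance and the `init`/`seed` parameters are hypothetical, and it says nothing about how the in-database MADlib version will be structured.

```python
import math
import random


def k_medoids(points, k, max_iter=100, init=None, seed=None):
    """Alternating ("Voronoi iteration") k-medoids on an in-memory list of points.

    points: list of coordinate tuples; distances are Euclidean (an assumption).
    init: optional list of k point indices used as starting medoids;
          otherwise k distinct points are sampled at random.
    Returns (medoids, labels): medoid indices into `points`, and for each
    point the position (0..k-1) of the medoid it is assigned to.
    """
    def assign(medoids):
        # Assignment step: every point joins its nearest medoid.
        return [min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
                for p in points]

    if init is not None:
        medoids = list(init)
    else:
        medoids = random.Random(seed).sample(range(len(points)), k)
    for _ in range(max_iter):
        labels = assign(medoids)
        # Update step: in each cluster, the new medoid is the member with the
        # smallest total distance to all other members of that cluster.
        new_medoids = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # empty cluster (e.g. duplicate points): keep old medoid
                new_medoids.append(medoids[j])
                continue
            new_medoids.append(min(
                members,
                key=lambda c: sum(math.dist(points[c], points[m]) for m in members)))
        if new_medoids == medoids:  # converged: no medoid changed
            break
        medoids = new_medoids
    # Final assignment so labels always match the returned medoids.
    return medoids, assign(medoids)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    medoids, labels = k_medoids(pts, 2, init=[0, 1])
    print(medoids, labels)  # [0, 3] [0, 0, 0, 1, 1, 1]
```

Unlike k-means, the cluster centres are always actual data points, so only pairwise distances are needed; the price is that the update step is quadratic in the size of each cluster.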
Attached is the patch containing everything I have done this week, though the git log might be more convenient to read.
Regards,
Maxence A.