Could you draw up a summary of your findings on the performance of the different algorithms, and say which one should be implemented, or whether both should be (k-means++ vs. k-medoids)?
Regards,
Atri
From the few articles I've read so far, I've found that k-medoids clustering is usually faster on standard datasets (such as the ones I generate). But I'll look for more detailed information during the week and report my findings here!
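One factor in any such comparison is the cost of the centre-update step in each iteration. The sketch below contrasts the two; it is a hypothetical illustration (the helper names `kmeans_update` and `kmedoids_update` are made up, and this is not MADlib's implementation). It also leaves out k-means++ seeding and iteration counts, which affect overall running time as well.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans_update(cluster: List[Point]) -> Point:
    # k-means: the new centre is the coordinate-wise mean,
    # computed in a single pass over the cluster's points (O(n)).
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)


def kmedoids_update(cluster: List[Point]) -> Point:
    # k-medoids: the new centre must be an actual member of the cluster,
    # namely the one with the smallest total distance to all other members.
    # Every candidate is compared with every point, so this step is O(n^2).
    return min(cluster, key=lambda c: sum(math.dist(c, p) for p in cluster))


cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (20.0, 0.0)]
print(kmeans_update(cluster))    # mean, pulled toward the point at (20, 0)
print(kmedoids_update(cluster))  # medoid, always one of the input points
```

On this small cluster the mean lands at (5.2, 0.0), which is not one of the points, while the medoid is (2.0, 0.0).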
By the way, do you have any ideas for other kinds of datasets that would be useful to test?
May I suggest generating a visualization with a web toolkit? Perhaps the new Vega library would be the simplest (http://trifacta.github.io/vega/), or the more popular but lower-level D3.js?
More generally, a project to connect MADlib outputs to Vega visualization specifications seems like it would be enormously useful!
Joe
I'll take a look during my holidays, in a week! It would indeed be nice if one just had to open a web page to test my work!
As for your other idea, aren't MADlib outputs just PostgreSQL/Greenplum outputs? If so, only a database connector is required, and one probably already exists (I may be wrong: I had never heard of D3.js or Vega before, and I don't know the MADlib project well yet).