Our new product, IntuitionHQ, shows clusters of clicks on an image. To generate these clusters we used a gem called Hierclust. The great thing about this gem is its simplicity: just input the points and a minimum cluster separation, and out come the clusters.
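To make the idea of "points plus a minimum separation in, clusters out" concrete, here is a minimal sketch of naive agglomerative clustering in plain Ruby. It is not Hierclust's API or its actual algorithm; the names `Point`, `centroid`, `distance` and `cluster_points` are made up for illustration, and merging by centroid distance is an assumption.

```ruby
# Illustrative sketch only: NOT Hierclust's API, just the general idea.
# All names below are hypothetical.
Point = Struct.new(:x, :y)

# Average position of a group of points.
def centroid(points)
  n = points.size.to_f
  Point.new(points.sum(&:x) / n, points.sum(&:y) / n)
end

# Straight-line distance between two points.
def distance(a, b)
  Math.hypot(a.x - b.x, a.y - b.y)
end

# Repeatedly merge the two closest clusters until every pair of
# cluster centroids is at least min_separation apart.
def cluster_points(points, min_separation)
  clusters = points.map { |p| [p] }
  loop do
    best = nil
    clusters.each_index do |i|
      ((i + 1)...clusters.size).each do |j|
        d = distance(centroid(clusters[i]), centroid(clusters[j]))
        best = [d, i, j] if best.nil? || d < best[0]
      end
    end
    break if best.nil? || best[0] >= min_separation

    _, i, j = best
    merged = clusters[i] + clusters[j]
    clusters.delete_at(j) # j > i, so remove j first to keep i valid
    clusters.delete_at(i)
    clusters << merged
  end
  clusters
end

clicks = [Point.new(10, 10), Point.new(12, 11),
          Point.new(200, 200), Point.new(203, 198)]
clusters = cluster_points(clicks, 50)
puts clusters.size
```

Note that a naive version like this rescans every pair of clusters after each merge, so the work grows very quickly as the number of points goes up.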
The problem with Hierclust was performance. Even with fewer than 100 points to cluster, Hierclust ran too slowly to do the clustering dynamically. That was easy enough to work around: we moved the clustering program into a cron job and stored the data in a marshalled file.
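As a rough sketch of that workaround, the cron job can write the clusters with Ruby's built-in `Marshal` and the web request can read them back. The helper names, the file path and the shape of the data here are assumptions for illustration, not our actual code.

```ruby
# Hypothetical helpers; the path and the data layout are assumptions.

# Called from the cron job: write to a temp file, then rename, so a
# reader never sees a half-written cache file.
def save_clusters(path, clusters)
  tmp = "#{path}.tmp"
  File.binwrite(tmp, Marshal.dump(clusters))
  File.rename(tmp, path)
end

# Called when rendering the page: returns nil if the cron job
# has not produced a file yet.
def load_clusters(path)
  return nil unless File.exist?(path)
  Marshal.load(File.binread(path))
end
```

Only use `Marshal.load` on files your own code wrote, since loading untrusted marshalled data is unsafe.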
However, in testing we found that Hierclust was still too slow. Once we had over 200 points being clustered, it started taking minutes to process, an unsustainable amount of time for the data we expected. The graph below shows the timings, which I believe are O(n³). We had to disable cluster processing while we looked into the problem because of the issues it was causing on the server.

[Figure: graph of number of points vs. time taken]
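One quick way to sanity-check a guess like cubic growth is to compare two timings: if time grows like c·nᵏ, then k is the log of the time ratio divided by the log of the size ratio. The sketch below uses a hypothetical helper name and made-up numbers, not the measurements from the graph.

```ruby
# Estimate k in "time ~ c * n**k" from two (points, seconds) measurements.
# growth_exponent is a hypothetical helper name.
def growth_exponent(n1, t1, n2, t2)
  Math.log(t2.to_f / t1) / Math.log(n2.to_f / n1)
end

# Made-up numbers for illustration only: doubling the points
# while the time goes up eightfold suggests k is about 3.
k = growth_exponent(100, 1.0, 200, 8.0)
puts k.round(2)
```

In other words, with cubic growth every doubling of the number of points costs roughly eight times as long.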


We looked at several commercial packages, but these turned out to be either too expensive or not flexible enough. There is a free component available at 