===Clustering Images with Algorithms and Humans===
Speaker: [http://www.umiacs.umd.edu/~arijit/ Arijit Biswas] -- Date: December 13, 2012
First, we propose a method of clustering images that combines algorithmic and human input. An algorithm provides us with pairwise image similarities. We then actively obtain selected, more accurate pairwise similarities from humans. We develop a novel method to choose the most useful pairs to show a person, obtaining constraints that improve clustering. In a clustering assignment, the elements of each data pair are either in the same cluster or in different clusters. We simulate inverting these pairwise relations and see how that affects the overall clustering. We then choose the pair that maximizes the expected change in the clustering. The proposed algorithm has high time complexity, so we also propose a much faster version that exactly replicates the results of our original algorithm. We further improve run-time by adding heuristics, and show that these do not significantly impact the effectiveness of our method. We have run experiments in two different domains, namely leaf images and face images, and show that clustering performance can be improved significantly.
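The pair-selection idea above can be illustrated with a small brute-force sketch. It is not the speaker's implementation: the clusterer (connected components over a similarity threshold), the reading of a similarity in [0, 1] as the probability that two images share a class, and all function names (`cluster`, `relation_change`, `with_answer`, `choose_pair`) are assumptions made for illustration. Re-clustering once per candidate pair also shows why a naive version is expensive.

```python
from itertools import combinations


def cluster(sim, threshold=0.5):
    """Stand-in clusterer (assumption): connected components of the
    graph that links images whose similarity exceeds `threshold`."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and sim[u][v] > threshold:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


def relation_change(a, b):
    """Count pairs whose same/different relation differs between labelings."""
    return sum((a[i] == a[j]) != (b[i] == b[j])
               for i, j in combinations(range(len(a)), 2))


def with_answer(sim, i, j, same):
    """Copy `sim` with a simulated human answer written in for pair (i, j)."""
    new = [row[:] for row in sim]
    new[i][j] = new[j][i] = 1.0 if same else 0.0
    return new


def choose_pair(sim, asked=frozenset(), threshold=0.5):
    """Return the unasked pair with the largest expected clustering change."""
    base = cluster(sim, threshold)
    best_pair, best_score = None, -1.0
    for i, j in combinations(range(len(sim)), 2):
        if (i, j) in asked:
            continue
        same_now = base[i] == base[j]
        # Assumed probability that a human contradicts the current relation.
        p_invert = 1.0 - sim[i][j] if same_now else sim[i][j]
        flipped = cluster(with_answer(sim, i, j, not same_now), threshold)
        score = p_invert * relation_change(base, flipped)
        if score > best_score:
            best_pair, best_score = (i, j), score
    return best_pair, best_score


# Toy data: two tight groups {0, 1, 2} and {3, 4} joined by a weak link 2-3.
sim = [
    [1.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.8, 0.9, 1.0, 0.55, 0.1],
    [0.1, 0.1, 0.55, 1.0, 0.9],
    [0.1, 0.1, 0.1, 0.9, 1.0],
]
pair, score = choose_pair(sim)
print(pair, round(score, 2))
```

In this toy example the weak link between images 2 and 3 is the pair selected, because a "different" answer there would split the single cluster in two. In this stand-in, a confirming answer never changes the clustering, so only the inverted relation contributes to the expectation.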
Second, we define a new clustering problem called subclustering and propose passive and active subclustering algorithms. Although there are many excellent clustering algorithms, effective clustering remains very challenging for large datasets that contain many classes. Image clustering presents further problems because automatically computed image distances are often noisy. We address these challenges in two ways. First, we propose a new algorithm that clusters only a subset of the images (we call this subclustering), producing a few examples from each class. Subclustering yields smaller but purer clusters. Second, we make use of human input in an active subclustering algorithm to further improve results. We run experiments on a face image dataset (51,418 images from 200 classes) and a leaf image dataset and show that our proposed algorithms perform better than baseline methods.
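To make the notion of clustering only a subset concrete, here is a minimal sketch under stated assumptions. It is a simple stand-in, not the passive or active algorithm from the talk: images are grouped by connected components under a strict distance `radius`, and only groups with at least `min_size` members receive a label, while every other image is left unassigned (`None`). The function name `subcluster`, both parameters, and the 1-D toy "images" are hypothetical.

```python
def subcluster(dist, radius, min_size):
    """Label only images that fall in tight groups; leave the rest as None.

    Stand-in technique (assumption): connected components of the graph
    linking images at distance <= radius, keeping components that have
    at least min_size members.
    """
    n = len(dist)
    labels = [None] * n
    visited = [False] * n
    next_label = 0
    for start in range(n):
        if visited[start]:
            continue
        visited[start] = True
        component, stack = [start], [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if not visited[v] and dist[u][v] <= radius:
                    visited[v] = True
                    component.append(v)
                    stack.append(v)
        # Small or isolated groups are treated as unreliable and skipped.
        if len(component) >= min_size:
            for u in component:
                labels[u] = next_label
            next_label += 1
    return labels


# Toy 1-D "images": two tight groups plus two ambiguous outliers.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 2.6, 9.0]
dist = [[abs(a - b) for b in points] for a in points]
labels = subcluster(dist, radius=0.15, min_size=3)
print(labels)
```

The trade-off this illustrates is coverage for purity: fewer images are assigned, but the assigned ones sit in tight groups that are less likely to mix classes when distances are noisy.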