Machine Learning Reading Group, Spring 2013
University of Maryland, College Park
Time: Thursdays, 3:00pm - 4:15pm
Location: AVW 4103 (except Mar 28, in 3258)
Topic: Spectral Methods
Mailing List: firstname.lastname@example.org (listinfo)
- Hal Daumé III, Assistant Professor (CS/Ling/UMIACS)
- Michael Bloodgood, Assistant Research Scientist (CASL)
- Greg Sanders, 3rd year PhD Student (CS/UMIACS/CLIP)
- Jiarong Jiang, PhD student (CS/UMIACS)
- Taesun Moon, Postdoc, (UMIACS)
Clustering Large Graphs via the Singular Value Decomposition
P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay. Machine Learning, 56, 9-33, 2004.
Big picture summary of paper:
Discrete Clustering Problem (DCP): Consider partitioning a set of m points in n-dimensional space into k clusters (usually m and n vary, while k is fixed) so as to minimize the sum of squared distances between each point and its cluster center.
They prove that the DCP is NP-hard even for k=2.
Consider a continuous relaxation of the DCP, which they call Continuous Clustering Problem (CCP): find the k-dimensional subspace V that minimizes the sum of squared distances to V of the m points.
The CCP can be solved by computing the SVD of the m x n matrix that represents the m points.
They prove the solution to the CCP can be used to get a 2-approximation algorithm for the DCP.
Also, they discuss why the solution to the CCP is useful in its own right.
They state a problem still remains: for very large matrices, computing the SVD is too expensive to be used in practice.
Main contribution of paper: They present a solution: the SVD of a submatrix chosen according to a suitable probability distribution provides an approximation to the SVD of the whole matrix, yielding a linear time randomized algorithm. The authors expect this algorithm to be the main contribution of this paper, since the algorithm can be applied to very large matrices that arise in modern applications.
McSherry - Spectral partitioning of random graphs
Von Luxburg - A tutorial on spectral clustering
Chaudhuri-Chung-Tsiatas - Spectral clustering of graphs with general degrees in the extended planted partition model
Bshouty-Long - Finding planted partitions in nearly linear time using arrested spectral clustering