# Spring13

**Machine Learning Reading Group, Spring 2013**

University of Maryland, College Park

## Contents

# Details

**Time:** Thursdays, 3:00pm - 4:15pm

**Location:** AVW 4103 (except Mar 28, in 3258)

**Topic:** Spectral Methods

**Mailing List:** mlrg@cs.umd.edu (listinfo)

# Participants

- Hal Daumé III, Assistant Professor (CS/Ling/UMIACS)

- Michael Bloodgood, Assistant Research Scientist (CASL)

- Greg Sanders, 3rd year PhD Student (CS/UMIACS/CLIP)

- Jiarong Jiang, PhD student (CS/UMIACS)

- Taesun Moon, Postdoc, (UMIACS)

# Schedule

# Paper Summaries

## Clustering Large Graphs via the Singular Value Decomposition

P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay. Machine Learning, 56, 9-33, 2004.

Big picture summary of paper:

Discrete Clustering Problem (DCP): Consider partitioning a set of m points in n-dimensional space into k clusters (usually m and n vary, while k is fixed) so as to minimize the sum of squared distances between each point and its cluster center.

They prove that the DCP is NP-hard even for k=2.

Consider a continuous relaxation of the DCP, which they call Continuous Clustering Problem (CCP): find the k-dimensional subspace V that minimizes the sum of squared distances to V of the m points.

The CCP can be solved by computing the SVD of the m x n matrix that represents the m points.

They prove the solution to the CCP can be used to get a 2-approximation algorithm for the DCP.

Also, they discuss why the solution to the CCP is useful in its own right.

They state a problem still remains: for very large matrices, computing the SVD is too expensive to be used in practice.

Main contribution of paper: They present a solution: the SVD of a submatrix chosen according to a suitable probability distribution provides an approximation to the SVD of the whole matrix, yielding a linear time randomized algorithm. The authors expect this algorithm to be the main contribution of this paper, since the algorithm can be applied to very large matrices that arise in modern applications.

McSherry - Spectral partitioning of random graphs

Von Luxburg - A tutorial on spectral clustering

Chaudhuri-Chung-Tsiatas - Spectral clustering of graphs with general degrees in the extended planted partition model

Bshouty-Long - Finding planted partitions in nearly linear time using arrested spectral clustering