CATS-Sep-20-2013
Title[edit]
The Power of Localization for Active and Passive Learning of Linear Separators
Speaker[edit]
Maria Florina Balcan is an assistant professor in the School of Computer Science at Georgia Institute of Technology. Her main research interests are machine learning, computational aspects in economics and game theory, and algorithms. She is a recipient of the Carnegie Mellon University SCS Distinguished Dissertation Award , an NSF CAREER Award, a Microsoft Faculty Research Fellowship, and several paper awards at COLT. She is currently a board member of the International Machine Learning Society and a Program Committee co-chair for COLT 2014.
Abstract[edit]
We analyze active learning algorithms, which only receive the classifications of examples when they ask for them, and traditional passive (PAC) learning algorithms, which receive classifications for all training examples, under log-concave and nearly log-concave distributions. By using an aggressive localization argument, we prove that active learning provides an exponential improvement over passive learning when learning homogeneous linear separators in these settings. Building on this, we then provide a computationally efficient algorithm with optimal sample complexity for passive learning in such settings. This provides the first bound for a polynomial-time algorithm that is tight for an interesting infinite class of hypothesis functions under a general class of data-distributions, and also characterizes the distribution-specific sample complexity for each distribution in the class. We also illustrate the power of localization for efficiently learning linear separators in two challenging noise models (malicious noise and agnostic setting) where we provide efficient algorithms with significantly better noise tolerance than previously known.