Language Edit Distance, (min,+)-Matrix Multiplication & Beyond
Barna Saha received her Ph.D. from the University of Maryland, College Park, and then spent a couple of years at AT&T Shannon Labs as a senior researcher before joining UMass Amherst in 2014. Her research interests are in theoretical computer science, specifically in algorithm design and analysis, and in large-scale data analytics. She is the recipient of a Google Faculty Award (2016), a Yahoo ACE Award (2015), a Simons-Berkeley Research Fellowship (2015), an NSF CRII Award (2015), and a Dean's Fellowship from UMD (2011), as well as the best paper award at VLDB 2009 and a best-paper finalist selection at ICDE 2012.
The language edit distance is a significant generalization of two basic problems in computer science: parsing and string edit distance computation. Given any context-free grammar, it asks for the minimum number of insertions, deletions, and substitutions required to convert a given input string into a valid member of the language. In 1972, Aho and Peterson gave a dynamic programming algorithm that solves this problem in time cubic in the string length. Despite the problem's many applications, in over forty years there had been no improvement over this running time.
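For intuition, the string edit distance that language edit distance generalizes (the special case where the "language" contains a single string) is computed by the classic quadratic dynamic program; the sketch below is illustrative and not part of the talk's results:

```python
def edit_distance(s, t):
    """Minimum number of insertions, deletions, and substitutions
    turning string s into string t (classic O(|s|*|t|) DP)."""
    m, n = len(s), len(t)
    # dp[i][j] = edit distance between s[:i] and t[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                      # delete all of s[:i]
    for j in range(n + 1):
        dp[0][j] = j                      # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # match/substitution
    return dp[m][n]
```

Language edit distance replaces the single target string with all strings generated by a grammar, which is why the Aho-Peterson dynamic program pays a cubic rather than quadratic cost.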
Computing the (min,+)-product of two n-by-n matrices in truly subcubic time is an outstanding open question, as it is equivalent to the famous All-Pairs-Shortest-Paths problem. Even when the matrices have entries bounded in [1,n], obtaining a truly subcubic (min,+)-product algorithm would be a major breakthrough in computer science.
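The (min,+)-product is defined entrywise by C[i][j] = min over k of A[i][k] + B[k][j]; the straightforward triple loop below runs in cubic time, and it is precisely this bound that the open question asks to beat (a minimal sketch for illustration only):

```python
def min_plus_product(A, B):
    """Naive O(n^3) (min,+)-product of two n-by-n matrices.
    Repeatedly squaring a graph's weight matrix under this product
    yields all-pairs shortest paths, which is why a truly subcubic
    algorithm here would also speed up APSP."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)]
```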
In this presentation, I will explore the connection between these two problems, which led us to develop the first truly subcubic algorithms for the following problems: (1) language edit distance, (2) RNA folding, a basic computational biology problem and a special case of language edit distance computation, (3) stochastic grammar parsing, which is fundamental to natural language processing, and (4) the (min,+)-product of integer matrices with entries bounded by n^(3-ω-c), where c > 0 is any constant and ω is the exponent of fast matrix multiplication, widely believed to be 2.
Time permitting, we will also discuss the development of highly efficient linear-time approximation algorithms for language edit distance for important subclasses of context-free grammars.