# MatrixOps

# Pointers and Readings

## "Required" readings and software

- Gradient descent and Stochastic gradient descent algorithms
- Singular value decomposition
- Generalized Hebbian algorithm for SVD
- Code for the above
- Randomized algorithms for large-scale matrix stuff, tutorial (or there's a video of the talk)
- Code related to the above (I hope you can read Fortran!!! Hah)
- Python code for randomized matrix stuff (similar to above)
- Approximate matrix multiplication

## "Might-be-interesting" readings

- Randomized Online PCA and a kernelized version
- Latent semantic analysis for text documents
- Part of speech tagging with SVD
- Learning bilingual lexicons from monolingual data
- Spectral clustering

# Goals

1. Given K (n*n), compute svd(center(K)) using the Hebbian algorithm
2. Given X (n*d), compute svd(center(X)) using the Hebbian algorithm
3. Given A (n*d), compute center(A)*Omega for a random d*p matrix Omega, using exact matrix multiplication
4. Repeat (1) using the randomized algorithm
5. Repeat (2) using the randomized algorithm
6. Given A,B (n*d), compute svd(inv(CC') * DD'), where C=center(A) and D=center(B), if this can be done more efficiently than the naive way
7. Given X (n*d) and l, compute svd(A'A + lI), where A=center(X)
8. Given X (n*d), compute svd(H DD' H'), where D=center(X) and H=inv(I - 11'/n), if this can be done more efficiently than the naive way

In all the above, "center(X)" means X shifted so that it has mean zero: if X_{i,:} is the ith row of X, then center(X)_{i,:} = X_{i,:} - mean(X), where mean(X) is the average row. Similarly, by svd(A) we mean the top k singular values and left singular vectors of A, for some user-specified k.
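As a concrete (non-streaming) reference point for these two definitions, here is a minimal numpy sketch; `center` and `top_k_svd` are illustrative names, and the real implementation obviously would not materialize a full SVD like this:

```python
import numpy as np

def center(X):
    """center(X)[i, :] = X[i, :] - mean(X), where mean(X) is the average row."""
    return X - X.mean(axis=0)

def top_k_svd(A, k):
    """Top-k singular values and left singular vectors of A (dense, in-memory)."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k]

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
U, s = top_k_svd(center(X), k=1)
```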

In all cases, we should have the option of either handing over data in a dense format or a sparse format, and the implementation should use appropriately modified routines depending on the data type.

From a timing perspective, I would love to get #1 implemented by mid-February. I would be thrilled to get #1-#6 done by the end of the semester, but would still be very happy to get #1-#5 done in that time frame.
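For the Hebbian goals (#1 and #2), one candidate one-row-at-a-time update is Sanger's generalized Hebbian algorithm, sketched below. The learning rate, epoch count, and initialization here are placeholder choices, not part of the spec, and this version learns the top-k principal directions of the streamed (already-centered) rows; the left singular vectors would still need to be recovered from those.

```python
import numpy as np

def gha(rows, d, k, lr=0.01, epochs=200, seed=0):
    """Sanger's generalized Hebbian algorithm: stream rows x one at a time and
    learn W (k x d), whose rows converge to the top-k principal directions of
    the streamed data (assumed already centered)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(k, d))
    for _ in range(epochs):
        for x in rows:
            y = W @ x                                       # k projections of this row
            W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W

# Toy data whose dominant direction is the first coordinate axis.
rows = np.array([[3.0, 0, 0], [-3.0, 0, 0], [0, 1.0, 0], [0, -1.0, 0]])
W = gha(rows, d=3, k=1)
```

The `np.tril` term is what makes this Sanger's rule rather than plain Oja's rule: each of the k learned directions is decorrelated from the ones above it.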

# Desired Interface

The goal would be to have a program, say "svd", that would allow me to select from the following options, depending on my application:

- Do I want to run the Hebbian version or the randomized version (and, if the latter, what oversampling value "p" should I use)?
- Have I pre-centered the data, or should the algorithm do that for me?
- Do I want to use a sparse matrix representation or a dense matrix representation?
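To make the role of the oversampling parameter "p" concrete, here is a rough in-memory sketch of the randomized SVD (in the style of Halko et al.); an out-of-core version would compute the `A @ Omega` and `Q.T @ A` products as passes over the data file:

```python
import numpy as np

def randomized_svd(A, k, p=5, seed=0):
    """Randomized top-k SVD with oversampling p (after Halko et al.)."""
    rng = np.random.default_rng(seed)
    d = A.shape[1]
    Omega = rng.normal(size=(d, k + p))   # random test matrix, d x (k+p)
    Y = A @ Omega                         # sketch of the range of A (one data pass)
    Q, _ = np.linalg.qr(Y)                # orthonormal basis for the sketch
    B = Q.T @ A                           # small (k+p) x d matrix (second pass)
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]
```

Larger p makes the basis Q more likely to capture the true top-k subspace, at the cost of a wider sketch.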

The input for a sparse representation should be one line per row of the input matrix, which looks like "3:9.1 10:0.03 62:-5.2", meaning that all columns are zero except the 3rd, 10th and 62nd, which have the given values. (This is the same file format as used by SVM-light, libSVM and Vowpal Wabbit.)

The input for dense representation should just be the obvious one line per row, space separated columns.
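A minimal sketch of row readers for both formats (function names are placeholders; column indices in the sparse format are taken as 1-based, matching the example above):

```python
def parse_sparse_row(line, ncols):
    """Parse one SVM-light-style row like "3:9.1 10:0.03 62:-5.2" into a
    dense list of length ncols; unmentioned columns are zero."""
    row = [0.0] * ncols
    for token in line.split():
        idx, val = token.split(":")
        row[int(idx) - 1] = float(val)   # 1-based index -> 0-based position
    return row

def parse_dense_row(line):
    """Parse one space-separated dense row."""
    return [float(v) for v in line.split()]
```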

# Implementation Notes

It seems worthwhile to try to implement things on top of the basic infrastructure of Vowpal Wabbit. The two cool things that this will do are:

- Spawn two threads, one for doing file IO and one for doing computation. (Note that the VW file format is basically the same as our file format, though a bit more flexible.)
  *Note: we would need to remove feature hashing, or at least turn it into a user-level option.*

- All these algorithms will require multiple passes over the data (since they're not allowed to store it in memory). On the first pass, VW will create a "cache" file, which is essentially a very quick-to-read compressed version of the input. Subsequent passes will read this file to reduce the IO bottleneck.

# Progress Reports

None.

# Data

None.

# Questions for the Crew

None.