Cvss fall2011

Schedule Fall 2011

Date | Speaker | Title
September 8 | Vishal Patel | Wavelets with Composite Dilations
September 15 | Radu Dondera | Kernel PLS Regression for Robust Monocular Pose Estimation
September 22 | Dave Shaw | Regularization and Localization for Prediction on Manifolds
September 29 | Douglas Summers-Stay (room 3165) | Scene Classification with Visual Filters
October 6 | Arpit Jain | Learning What and How of Contextual Models for Scene Labeling
October 13 | Yi-Chen Chen | Rotation Invariant Simultaneous Clustering and Dictionary Learning
October 20 | Anne Jorstad | Deformation and Lighting Insensitive Face Recognition: From Optical Flow to Geodesics
October 27 | Garrett Warnell | Compressive Sensing in Visual Tracking
November 3 | Abhishek Sharma | Cross-Modal Classification and Retrieval: Techniques and Challenges
November 10 | Cheuk Yiu Ip | Saliency-Assisted Navigation of Very Large Landscape Images
November 17 | (no meeting, CVPR deadline 11/21)
November 24 | (no meeting, Thanksgiving)
December 1 | Ming-Yu Liu | Entropy Rate Clustering: Cluster Analysis via Maximizing a Submodular Function Subject to a Matroid Constraint
December 8 | Nitesh Shroff | Manifold Precis: An Annealing Technique for Diverse Sampling of Manifolds
December 15 | (no meeting, final exams)


Talk Abstracts Fall 2011

Wavelets with Composite Dilations

Speaker: Vishal Patel -- Date: September 8, 2011

Sparse representation of visual information lies at the foundation of many image processing applications, such as image restoration and compression. It is well known that wavelets provide a very sparse representation for a large class of signals and images. For instance, from a continuous perspective, wavelets can be shown to sparsely represent one-dimensional signals that are smooth away from point discontinuities. Unfortunately, separable wavelet transforms have some limitations in higher dimensions. For this reason, in recent years there has been considerable interest in obtaining directionally-oriented image decompositions. Wavelets with composite dilations offer a general and especially effective framework for the construction of such representations. In this talk, I will discuss the theory and implementation of several recently introduced multiscale directional transforms. Then, I will present a new general scheme for creating an M-channel directional filter bank. An advantage of an M-channel directional filter bank is that it can project the image directly onto the desired basis. Applications in image denoising, deconvolution and image enhancement will be presented.
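
As a small illustration of the sparsity claim above (not the speaker's transforms), the following sketch applies a plain one-dimensional Haar transform to a signal that is constant away from one jump. The function name `haar_transform` and the unnormalized average/difference convention are assumptions made for this example; composite-dilation wavelets and directional filter banks are not shown.

```python
def haar_transform(signal):
    """Multilevel Haar transform using pairwise averages and differences.

    Assumes len(signal) is a power of two. Returns
    [overall average, coarsest detail, ..., finest details].
    """
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Insert at the front so coarser levels end up before finer ones.
        details.insert(0, [(a - b) / 2 for a, b in pairs])
        approx = [(a + b) / 2 for a, b in pairs]
    return approx + [d for level in details for d in level]


# A signal that is constant away from a single jump.
signal = [1, 1, 1, 1, 5, 5, 5, 5]
coeffs = haar_transform(signal)
nonzero = sum(1 for c in coeffs if abs(c) > 1e-12)
print(coeffs, nonzero)
```

Only the overall average and one coarse detail coefficient are nonzero here; the remaining six coefficients vanish because the signal is locally constant.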

Kernel PLS Regression for Robust Monocular Pose Estimation

Speaker: Radu Dondera -- Date: September 15, 2011

We evaluate the robustness of five regression techniques for monocular 3D pose estimation. While most of the discriminative pose estimation methods focus on overcoming the fundamental problem of insufficient training data, we are interested in characterizing performance improvement for increasingly large training sets. Commercially available rendering software allows us to efficiently generate large numbers of realistic images of poses from diverse actions. Inspired by recent work in human detection, we apply PLS and kPLS regression to pose estimation. We observe that kPLS regression incrementally approximates GP regression using the strongest nonlinear correlations between image features and pose. This provides robustness, and our experiments show kPLS regression is more robust than two GP-based state-of-the-art methods for pose estimation. We address the ambiguity problem of pose estimation by random partitioning of the pose space and report results on the HumanEva dataset.

Regularization and Localization for Prediction on Manifolds

Speaker: David Shaw -- Date: September 22, 2011

In data analysis, one is interested in using the information about the response variable contained in the predictors in the best way possible. This can lead to problems when the predictors are highly collinear, as it implies an inherent lower-dimensional structure in the data. One method of analyzing data of this form is to make the assumption that these structured dependencies arise due to the predictors lying on some implicit lower-dimensional manifold. This assumption helps solve the problem of reducing the dimension of the predictors in the interest of removing some redundant information, but it introduces the problem of analyzing the transformed data. In particular, making accurate predictions with the lower-dimensional data that can be interpreted in the higher-dimensional space can be difficult. The technique of weighted regression with regularization on the model parameters can help to overcome these issues.
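
To make "weighted regression with regularization" concrete, here is a minimal one-dimensional sketch, assuming Gaussian kernel weights for the localization and a ridge penalty for the regularization. The function `local_ridge_slope`, its parameters, and the toy data are hypothetical and are not taken from the talk, which concerns predictors on a manifold.

```python
import math


def local_ridge_slope(xs, ys, query, bandwidth=1.0, lam=0.1):
    """Slope of a no-intercept linear fit, localized around `query`.

    Minimizes sum_i w_i * (y_i - beta * x_i)**2 + lam * beta**2, where
    w_i is a Gaussian kernel weight centered on `query`.
    """
    weights = [math.exp(-((x - query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    num = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    den = sum(w * x * x for w, x in zip(weights, xs)) + lam
    return num / den


xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = [2 * x for x in xs]  # exactly linear toy data with slope 2
unregularized = local_ridge_slope(xs, ys, query=1.5, lam=0.0)
regularized = local_ridge_slope(xs, ys, query=1.5, lam=1.0)
```

With `lam=0.0` the fit recovers the true slope of the toy data; a positive `lam` shrinks the estimate toward zero, which is the usual trade-off when the penalty is used to stabilize collinear predictors.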

Scene Classification with Visual Filters

Speaker: Douglas Summers-Stay -- Date: September 29, 2011

"Scene Classification" is the computer vision problem of labeling all the pixels in an image according to the class they fall into, such as "street," "tree," or "person." A tool we have developed here at the computer vision lab called "visual filters" uses a series of nonlinear filters to attempt to create such classification maps. I will discuss what we are doing now and how we can incorporate ideas from "deep learning" to improve this in the future. An introduction for beginners with some examples is here.

Learning What and How of Contextual Models for Scene Labeling

Speaker: Arpit Jain -- Date: October 6, 2011

In this talk I will discuss a data-driven approach to predicting the importance of edges and constructing a Markov network for image analysis based on statistical models of global and local image features. Most previous approaches used either a fixed, fully connected Markov network (MN) or an ad hoc neighborhood-connected MN. But not all edges in an MN are useful, and this is what I will show during my talk. I will also address the coupled problem of predicting the feature weights associated with each edge of a Markov network for evaluation of context. Experimental results indicate that this scene-dependent structure construction model eliminates spurious edges and improves performance over fully connected and neighborhood-connected Markov networks.

Rotation Invariant Simultaneous Clustering and Dictionary Learning

Speaker: Yi-Chen Chen -- Date: October 13, 2011

We present an approach that simultaneously clusters database members and learns dictionaries from the clusters. The method learns dictionaries in the Radon transform domain, while clustering in the image domain. The main feature of the proposed approach is that it provides rotation invariant clustering, which is useful in Content Based Image Retrieval (CBIR). We demonstrate through experimental results that the proposed rotation invariant clustering provides better retrieval performance than the standard Gabor-based method that has similar objectives.

Deformation and Lighting Insensitive Face Recognition: From Optical Flow to Geodesics

Speaker: Anne Jorstad -- Date: October 20, 2011

We seek to solve the face identification problem across variations in expression and lighting together in a single framework. In order to understand variations in expression, a dense correspondence between images must be found, leading to algorithms similar to Optical Flow. We present a new lighting-insensitive metric to drive this Optical Flow-like framework. An extension of this work to the manifold of face images is then proposed, where a curve on the manifold represents the way a face might morph through time, allowing pixels to vary slowly as properties of the face change. The length of the geodesic connecting a pair of faces defines their similarity for nearest neighbor matching.

Compressive Sensing in Visual Tracking

Speaker: Garrett Warnell -- Date: October 27, 2011

Visual tracking is a classical computer vision task. However, the ubiquity of modern sensors makes it more difficult due to the large amount of data available for processing. The emerging theory of compressive sensing has the potential to address this problem in that it promises the ability to reduce the amount of data collected without sacrificing the amount of information within. In this talk, I will review recent research toward the adaptation of some computer vision algorithms commonly used in visual tracking such that they can operate in the lower-dimensional compressive domain. Specifically, background subtraction and particle filtering will be discussed.
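
One reason background subtraction can be moved into the compressive domain is that the measurement step is linear. The sketch below illustrates only that property, assuming a random Gaussian measurement matrix and toy one-dimensional "frames"; the names `measure`, `phi`, `frame`, and `background` are hypothetical, and recovering the foreground from the measurements is not shown.

```python
import random


def measure(phi, x):
    """Compressive measurement y = Phi x for a matrix given as a list of rows."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


random.seed(0)
n, m = 64, 16  # signal length and number of measurements (m < n)
phi = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]

background = [10.0] * n
frame = list(background)
frame[20:23] = [50.0, 60.0, 55.0]  # a small foreground object

# Subtracting in the compressive domain ...
diff_compressed = [
    yf - yb for yf, yb in zip(measure(phi, frame), measure(phi, background))
]
# ... matches measuring the (sparse) foreground difference directly.
foreground = [f - b for f, b in zip(frame, background)]
direct = measure(phi, foreground)
```

Because the foreground difference is sparse (three nonzero entries out of 64 here), it is the kind of signal compressive sensing is designed to handle from few measurements.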

Cross-Modal Classification and Retrieval: Techniques and Challenges

Speaker: Abhishek Sharma -- Date: November 3, 2011

Data often arrives in multiple representations and distributions (modalities) that share common underlying content. Classification or retrieval must then be based solely on that content, irrespective of the modality. For example: given a text description of a topic (such as history), find appropriate images in a database; given a person's face image in a pose and lighting different from those of the gallery, find the matching face; or, given user-supplied tags, find matching images in the database. These problems are finding applications everywhere because of the widespread Internet and extremely cheap sensors (cameras and keyboards).

In this talk, we will go over some popular techniques from the literature for tackling the problem of cross-modal classification and retrieval. Specifically, I will discuss Canonical Correlation Analysis (and variants), Partial Least Squares, the Bilinear Model (Freeman and Tenenbaum), Tied Factor Analysis, Probabilistic LDA, Multi-view LDA and SVM-2K, along with detailed pros and cons of each. Then I will present a comparative application of all these approaches, along with recent methods for pose and lighting invariant face recognition, as a case study.

Saliency-Assisted Navigation of Very Large Landscape Images

Speaker: Cheuk Yiu Ip -- Date: November 10, 2011

This work presents the first steps towards navigation of very large images, particularly landscape images, from an interactive visualization perspective. The grand challenge in navigation of very large images is identifying regions of potential interest. We outline a three-step approach. We show that our approach of progressive elicitation is fast and allows rapid identification of regions of interest. Our approach is scalable and computationally reasonable on very large images. We validate the results of our approach by comparing them to user-tagged regions of interest on several very large landscape images from the Internet.

Entropy Rate Clustering: Cluster Analysis via Maximizing a Submodular Function Subject to a Matroid Constraint

Speaker: Ming-Yu Liu -- Date: December 1, 2011

We propose a new objective function for clustering. This objective function consists of two components: the entropy rate of a random walk on a graph and a balancing term. The entropy rate favors the formation of compact and homogeneous clusters, while the balancing function encourages clusters with similar sizes and penalizes larger clusters that aggressively group samples. We present a novel graph construction for the graph associated with the data and show that this construction induces a matroid, a combinatorial structure that generalizes the concept of linear independence in vector spaces. The clustering result is given by the graph topology that maximizes the objective function under the matroid constraint. By exploiting the submodular and monotonic properties of the objective function, we develop an efficient greedy algorithm. Furthermore, we prove an approximation bound of 1/2 for the optimality of the greedy solution. We validate the proposed algorithm on various benchmarks and show its competitive performance with respect to popular clustering algorithms. We further apply it to the task of superpixel segmentation. Experiments on the Berkeley segmentation dataset reveal its superior performance over state-of-the-art superpixel segmentation algorithms in all the standard evaluation metrics.
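
For readers unfamiliar with the first component of the objective, the sketch below computes the textbook entropy rate of a random walk on a weighted undirected graph, using the stationary distribution proportional to node degree. The function `entropy_rate` and the example graphs are assumptions for illustration; the talk's specific graph construction, balancing term, and greedy algorithm are not reproduced here.

```python
import math


def entropy_rate(weights):
    """Entropy rate (in nats) of a random walk on a weighted undirected graph.

    `weights` is a symmetric matrix (list of lists) of non-negative edge
    weights; every node is assumed to have at least one incident edge.
    """
    degrees = [sum(row) for row in weights]
    total = sum(degrees)
    rate = 0.0
    for i, row in enumerate(weights):
        mu_i = degrees[i] / total  # stationary probability of node i
        for w in row:
            if w > 0:
                p = w / degrees[i]  # transition probability out of node i
                rate -= mu_i * p * math.log(p)
    return rate


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
single_edge = [[0, 1], [1, 0]]
print(entropy_rate(triangle), entropy_rate(single_edge))
```

On the unit-weight triangle every step is a fair choice between two neighbors, so the rate equals log 2; on a single edge the walk is deterministic and the rate is zero.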

Manifold Precis: An Annealing Technique for Diverse Sampling of Manifolds

Speaker: Nitesh Shroff -- Date: December 8, 2011

In this talk, we will consider the Precis problem of sampling K representative yet diverse data points from a large dataset. This problem arises frequently in applications such as video and document summarization, exploratory data analysis, and pre-filtering. We will formulate a general theory which encompasses not just traditional techniques devised for vector spaces, but also non-Euclidean manifolds, thereby enabling these techniques to be applied to shapes, human activities, textures and many other image and video based datasets. We will propose intrinsic manifold measures for measuring the quality of a selection of points with respect to their representative power and their diversity. We will then propose efficient algorithms to optimize the cost function using a novel annealing-based iterative alternation algorithm. The proposed formulation is applicable to manifolds of known geometry as well as to manifolds whose geometry needs to be estimated from samples.