Cvss spring2013

Schedule Spring 2013

All talks take place Thursdays at 4:30pm in AVW 3450.

Date Speaker Title
January 31 Mohammad Rastegari Scalable object-class retrieval with approximate and top-k ranking
February 7 Angjoo Kanazawa Dog Breed Classification Using Part Localization
February 14 Stephen Xi Chen Piecing Together the Segmentation Jigsaw using Context
February 21 Guangxiao Zhang Discriminative Dictionary Learning for Sparse Coding : A Batch Version and a Semi-Supervised Online Version
February 28 Kota Hara Boosted Regression Tree and its Application to Computer Vision
March 7 Prof. Mohand Said Allili (University of Quebec/Canada) Statistical Multi-Scale Decomposition Modeling of Texture and Applications
March 14 Arijit Biswas Attributes for Classifier Feedback
March 21 (Spring Break, no meeting)
March 26 Amir R. Zamir Large Scale Image and Video Geo-localization Using Street View Imagery
April 4 (Midterms, no meeting)
April 11 (ICCV deadline, no meeting)
April 18 Sumit Shekhar Domain-Adaptive Dictionaries
April 25 Mohsen Hejrati Analyzing 3D objects in cluttered images
May 2 Xavier Gibert Serra Prediction of tissue properties from PET/SPECT cardiac images
May 9 Raviteja Vemulapalli Kernel Learning for Extrinsic Classification of Manifold Features
May 9 Jonghyun Choi Adding Unlabeled Samples to Categories by Learned Attributes

Talk Abstracts Spring 2013

Scalable object-class retrieval with approximate and top-k ranking

Speaker: Mohammad Rastegari -- Date: January 31, 2013

In this paper we address the problem of object-class retrieval in large image data sets: given a small set of training examples defining a visual category, the objective is to efficiently retrieve images of the same class from a large database. We propose two contrasting retrieval schemes achieving good accuracy and high efficiency. The first exploits sparse classification models expressed as linear combinations of a small number of features. These sparse models can be efficiently evaluated using inverted file indexing. Furthermore, we introduce a novel ranking procedure that provides a significant speedup over inverted file indexing when the goal is restricted to finding the top-k (i.e., the k highest ranked) images in the data set. We contrast these sparse retrieval models with a second scheme based on approximate ranking using vector quantization. Experimental results show that our algorithms for object-class retrieval can search a database of 10 million images in just a couple of seconds and produce categorization accuracy comparable to the best known class-recognition systems.
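As a rough illustration of how a sparse linear model can be evaluated through an inverted file, the sketch below scores only the images that contain a feature with a nonzero model weight. It is a minimal example under stated assumptions: the names `build_inverted_index` and `top_k`, the toy feature vectors, and the weights are all hypothetical, and this is plain inverted-file scoring, not the faster top-k ranking procedure introduced in the talk.

```python
import heapq
from collections import defaultdict


def build_inverted_index(features):
    """Map each feature id to the (image_id, value) pairs where it is nonzero."""
    index = defaultdict(list)
    for image_id, vec in enumerate(features):
        for feat_id, value in enumerate(vec):
            if value != 0:
                index[feat_id].append((image_id, value))
    return index


def top_k(index, weights, k):
    """Score images with a sparse linear model and return the k best (score, image_id)."""
    scores = defaultdict(float)
    # Only features with a nonzero model weight are visited.
    for feat_id, w in weights.items():
        for image_id, value in index.get(feat_id, []):
            scores[image_id] += w * value
    return heapq.nlargest(k, ((s, i) for i, s in scores.items()))


# Toy database: 4 images described by 4 binary features (hypothetical data).
features = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
index = build_inverted_index(features)
weights = {0: 2.0, 3: 1.0}  # sparse model: only two nonzero weights
result = top_k(index, weights, k=2)
print(result)
```

Images that share no feature with the model are never touched, which is where the savings come from. Note that such images implicitly score zero and are left out of the ranking, which matters if the model has negative weights.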

Dog Breed Classification Using Part Localization

Speaker: Angjoo Kanazawa -- Date: February 7, 2013

We propose a novel approach to fine-grained image classification in which instances from different classes share common parts but have wide variation in shape and appearance. We use dog breed identification as a test case to show that extracting corresponding parts improves classification performance. This domain is especially challenging since the appearance of corresponding parts can vary dramatically, e.g., the faces of bulldogs and beagles are very different. To find accurate correspondences, we build exemplar-based geometric and appearance models of dog breeds and their face parts. Part correspondence allows us to extract and compare descriptors in like image locations. Our approach also features a hierarchy of parts (e.g., face and eyes) and breed-specific part localization. We achieve 67% recognition rate on a large real-world dataset including 133 dog breeds and 8,351 images, and experimental results show that accurate part localization significantly increases classification performance compared to state-of-the-art approaches.

Piecing Together the Segmentation Jigsaw using Context

Speaker: Stephen Xi Chen -- Date: February 14, 2013

We present an approach to jointly solve the segmentation and recognition problem using a multiple segmentation framework. We formulate the problem as segment selection from a pool of segments, assigning each selected segment a class label. Previous multiple segmentation approaches used local appearance matching to select segments in a greedy manner. In contrast, our approach formulates a cost function based on contextual information in conjunction with appearance matching. This relaxed cost function formulation is minimized using an efficient quadratic programming solver, and an approximate solution is obtained by discretizing the relaxed solution. Our approach improves labeling performance compared to other segmentation-based recognition approaches.

Discriminative Dictionary Learning for Sparse Coding: A Batch Version and a Semi-Supervised Online Version

Speaker: Guangxiao Zhang -- Date: February 21, 2013

Dictionary learning has been a hot topic in recent years in computer vision. Many dictionary learning strategies have been proposed and have led to excellent results in image denoising, inpainting, and recognition. However, most of them optimize for reconstruction and therefore may not be the best for classification. Moreover, such dictionaries are often over-complete (more dictionary items than the dimension), which is computationally costly when the number of categories is large.

In the first part of the talk, we present a greedy learning algorithm to obtain a compact (small-sized) and discriminative dictionary. Starting with an over-complete dictionary, we map the labeled dictionary items into an undirected k-nearest neighbor graph, and model the discriminative dictionary learning as a graph topology selection problem. By optimizing a monotonic, submodular objective function, our algorithm is shown to be highly efficient and effective in face recognition, object recognition, and action/gesture classification tasks.

In the second part of the talk, we present an online, semi-supervised dictionary learning algorithm that is suitable when the size of the dataset is large and the labels are expensive to obtain. Similar to the previous work, our goal is to obtain a dictionary which is both representative and discriminative. Besides learning from labeled data, we also exploit the large amount of cheap, unlabeled training data to reinforce the representation power. An online framework makes the algorithm applicable to large-scale datasets.

Boosted Regression Tree and its Application to Computer Vision

Speaker: Kota Hara -- Date: February 28, 2013

Boosting techniques have been applied to various computer vision tasks; however, most of them are used for classification purposes. In this talk, I will first present the Boosted Regression Tree (BRT), a regression version of boosting used with regression trees, and its simple extension to multidimensional output regression. Then I will show applications of the BRT to three computer vision tasks: head pose estimation, human pose estimation, and class-specific object shape estimation. The BRT is applied to the head pose estimation task straightforwardly. In the human pose estimation task, a body pose estimation task is divided into a set of local pose estimation tasks that maintain a dependency structure, and the BRT is used to solve those local pose estimation tasks in a successive manner. Lastly, the class-specific object shape estimation task is addressed by introducing a low-dimensional space representing the object shape manifold and using it to bridge the image feature space and the original object shape space.
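To make the idea of boosting for regression concrete, here is a minimal sketch that fits depth-one regression trees (stumps) to the residuals of a squared-error loss. It is an illustration under assumptions, not the speaker's implementation: the function names, the scalar input and output, the shrinkage factor, and the sine-wave toy data are all hypothetical, and the multidimensional-output extension mentioned in the abstract is not covered.

```python
import numpy as np


def fit_stump(x, residual):
    """Pick the threshold on 1-D input x whose two leaf means best fit the residual."""
    best = None
    for t in np.unique(x)[:-1]:  # dropping the largest value keeps both leaves non-empty
        left, right = residual[x <= t], residual[x > t]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    return best[1:]


def predict_stump(stump, x):
    t, left_value, right_value = stump
    return np.where(x <= t, left_value, right_value)


def fit_boosted(x, y, n_rounds=50, learning_rate=0.5):
    """Stage-wise additive model: each stump is fit to the current residual."""
    base = float(y.mean())
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    pred = np.full(len(x), base)
    for stump in stumps:
        pred = pred + learning_rate * predict_stump(stump, x)
    return pred


# Toy 1-D regression problem (hypothetical data).
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x)
model = fit_boosted(x, y)
mse = float(np.mean((predict_boosted(model, x) - y) ** 2))
print(mse)
```

With squared loss, the residual is the negative gradient, so each round is a small step that cannot increase the training error as long as the shrinkage factor lies between 0 and 2.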

Statistical Multi-Scale Decomposition Modeling of Texture and Applications

Speaker: Prof. Mohand Said Allili (University of Quebec/Canada) -- Date: March 7, 2013

Texture modeling and representation has been the subject of several research works in recent decades. This presentation will focus on texture representation using multi-scale decompositions. More specifically, a new statistical framework, based on finite mixtures of Generalized Gaussian (MoGG) distributions, will be presented to model the distribution of multi-scale wavelet/contourlet decomposition coefficients of texture images. After a brief review of the state of the art in wavelet/contourlet statistical modeling, details about parameter estimation of the MoGG model will be presented. Then, two applications of the proposed approach will be shown, namely wavelet/contourlet-based texture classification and retrieval, and fabric texture defect detection. Experimental results with comparison to recent state-of-the-art methods will be presented as well.

Attributes for Classifier Feedback

Speaker: Arijit Biswas -- Date: March 14, 2013

Active learning provides useful tools to reduce annotation costs without compromising classifier performance. However, it traditionally views the supervisor simply as a labeling machine. Recently a new interactive learning paradigm was introduced that allows the supervisor to additionally convey useful domain knowledge using attributes. The learner first conveys its belief about an actively chosen image, e.g. "I think this is a forest, what do you think?" If the learner is wrong, the supervisor provides an explanation, e.g. "No, this is too open to be a forest." With access to a pre-trained set of relative attribute predictors, the learner fetches all unlabeled images more open than the query image, and uses them as negative examples of forests to update its classifier. This rich human-machine communication leads to better classification performance. In this talk, we discuss three improvements over this set-up. First, we incorporate a weighting scheme that, instead of making a hard decision, reasons about the likelihood of an image being a negative example. Second, we do away with pre-trained attributes and instead learn the attribute models on the fly, alleviating the overhead and restrictions of a pre-determined attribute vocabulary. Finally, we propose an active learning framework that accounts for not just the label-based but also the attribute-based feedback while selecting the next query image. We demonstrate significant improvement in classification accuracy on faces and shoes. We also collect and make available the largest relative attributes dataset, containing 29 attributes of faces from 60 categories.
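The basic feedback step described above (before the three improvements) can be sketched in a few lines. This is a minimal illustration under assumptions: the function name, the dictionary of predicted "openness" scores, and the image ids are hypothetical stand-ins for the output of a relative attribute predictor.

```python
def negatives_from_feedback(attribute_score, query_id, unlabeled_ids):
    """Return unlabeled images whose predicted attribute strength exceeds the query's."""
    threshold = attribute_score[query_id]
    return [i for i in unlabeled_ids if attribute_score[i] > threshold]


# Hypothetical predicted "openness" for a query image "q" and four unlabeled images.
openness = {"q": 0.6, "a": 0.2, "b": 0.9, "c": 0.7, "d": 0.6}

# Supervisor feedback: "q is too open to be a forest", so anything more open
# than q is also treated as a negative example of "forest".
negatives = negatives_from_feedback(openness, "q", ["a", "b", "c", "d"])
print(negatives)
```

This is the hard decision that the first improvement in the talk replaces with a likelihood-based weighting.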

Large Scale Image and Video Geo-localization Using Street View Imagery

Speaker: Amir R. Zamir -- Date: March 26, 2013

There has been a growing interest in large scale visual geo-localization methods which utilize structured reference datasets such as Street View. In this talk, we present two frameworks for geo-locating images and videos at city scale with an accuracy comparable to hand-held GPS devices.

We extract SIFT features from the reference dataset and index them in a tree data structure to make a timely search feasible. To geo-locate a query image, we match its SIFT features against the reference tree and remove the unreliable correspondences using our geo-spatial pruning, which incorporates GPS information in the pruning process. We smooth the distribution function formed by the locations of the SIFT matches with a two-dimensional Gaussian to utilize the correspondences from nearby places and find the accurate location of the query image.
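The smoothing step can be illustrated with a small sketch: matched reference locations cast votes on a grid, the vote map is blurred with a two-dimensional Gaussian, and the peak is taken as the estimate. This is an assumption-laden toy, not the speakers' pipeline: the function name `locate`, the integer grid, and the sample matches are hypothetical, and feature matching and geo-spatial pruning are omitted.

```python
import numpy as np


def locate(match_xy, grid_size, sigma):
    """Vote on a grid, smooth with a separable 2-D Gaussian, return the peak (x, y)."""
    votes = np.zeros((grid_size, grid_size))
    for x, y in match_xy:
        votes[y, x] += 1

    # 1-D Gaussian kernel; applying it along rows and then columns gives a 2-D Gaussian.
    # Assumes the kernel (2 * radius + 1 taps) is shorter than the grid side.
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-(offsets ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()

    smoothed = np.apply_along_axis(np.convolve, 1, votes, kernel, mode="same")
    smoothed = np.apply_along_axis(np.convolve, 0, smoothed, kernel, mode="same")

    row, col = np.unravel_index(np.argmax(smoothed), smoothed.shape)
    return int(col), int(row)


# Five matches clustered around (5, 5) and two matches stacked on a distant cell.
matches = [(5, 5), (4, 5), (6, 5), (5, 4), (5, 6), (15, 15), (15, 15)]
estimate = locate(matches, grid_size=20, sigma=1.5)
print(estimate)
```

Without smoothing, the single busiest cell is (15, 15); after smoothing, the nearby matches reinforce each other and the peak moves to the center of the cluster, which is the effect the abstract describes as using correspondences from nearby places.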

Additionally, we present a method for extracting the geo-spatial trajectory of a moving camera in a city from videos in the wild, such as typical YouTube clips. First, we divide the video into smaller segments and localize each one individually. Then, we fuse the information from different segments using a Bayesian formulation to obtain a temporally consistent trajectory. Lastly, we perform post-processing with a novel non-model-based trajectory reconstruction method based on Minimum Spanning Trees; we argue that such post-processing is essential for addressing the problems that the basic Bayesian formulation faces due to having a predefined underlying motion model, while the motion of the camera in wild videos does not necessarily follow any pattern.

Domain-Adaptive Dictionaries

Speaker: Sumit Shekhar -- Date: April 18, 2013

Data-driven dictionaries have produced state-of-the-art results in various classification tasks. However, when the target data has a different distribution than the source data, the learned sparse representation may not be optimal. In this talk, I will discuss a technique to learn a joint dictionary which can work well for the target data as well, and present some results on face and object recognition.

Analyzing 3D objects in cluttered images

Speaker: Mohsen Hejrati -- Date: April 25, 2013

We present an approach to detecting and analyzing the 3D configuration of objects in real-world images with heavy occlusion and clutter. We focus on the application of finding and analyzing cars. We do so with a two-stage framework; the first stage reasons about 2D shape and appearance variation due to within-class variation (station wagons look different than sedans) and changes in viewpoint. Rather than using a view-based model, we describe a compositional representation that models a large number of effective views and shapes using a small number of local view-based templates. We use this model to propose candidate detections and 2D estimates of shape. These estimates are then refined by our second stage, using an explicit 3D model of shape and viewpoint. We use a morphable model to capture 3D within-class variation, and use a weak-perspective camera model to capture viewpoint. We learn all model parameters from 2D annotations. We demonstrate state-of-the-art accuracy for detection, viewpoint estimation, and 3D shape reconstruction on challenging images from the PASCAL VOC 2011 dataset.

Prediction of tissue properties from PET/SPECT cardiac images

Speaker: Xavier Gibert Serra -- Date: May 2, 2013

Implantable cardioverter defibrillators (ICDs) deliver shocks in response to left ventricular tachycardia (VT), usually caused by anomalous electrical conduction pathways within scar tissue. About 17,000 patients per year in the U.S. with hemodynamically significant and recurrent VT require radiofrequency ablation. In patients with structural heart disease, electrophysiological (EP) voltage mapping of the endocardial/epicardial surface with a catheter-based system identifies scar areas immediately prior to ablation, which isolates slow conducting channel regions. About half of patients during 6-month follow-up have recurrent incessant or intermittent VT, thereby indicating a need to improve ablation effectiveness.

Nuclear medicine imaging techniques have shown some success in predicting electrophysiology (EP)-derived tissue properties (scar, border zone, normal) from PET/SPECT cardiac images to aid EP ablation procedures for left ventricular tachycardia (VT). However, current procedures are based on subjective evaluations by human analysts and are prone to error. Existing medical image processing techniques are insufficient to provide reliable predictions due to limitations in image resolution, presence of outliers, and lack of a closed-form relation between PET and EP. In collaboration with the University of Maryland Medical Center in Baltimore, we are addressing the following problems:

1) Integrated visualization of 3-D PET/SPECT cardiac images and discretely sampled EP voltage measurements.

2) Automated registration of cardiac PET/SPECT images with EP voltage data.

3) Robust sparse regression methods with outlier rejection.

Kernel Learning for Extrinsic Classification of Manifold Features

Speaker: Raviteja Vemulapalli -- Date: May 9, 2013

For features that lie in Euclidean spaces, classifiers based on discriminative approaches such as linear discriminant analysis (LDA), partial least squares (PLS) and support vector machines (SVM) have been successfully used in various applications. However, these techniques are not directly applicable to features that lie on Riemannian manifolds. One possible solution to this problem is to define kernels on the manifolds. In this talk I will discuss kernels and multiple kernel learning, focusing on the Grassmann manifold and the manifold of symmetric positive definite matrices.
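The abstract does not say which kernels are used, so the sketch below only illustrates what "a kernel on a manifold" can look like, using two commonly cited choices as assumptions: a Gaussian kernel on the log-Euclidean distance between symmetric positive definite (SPD) matrices, and the projection kernel between subspaces on a Grassmann manifold. The function names and the `gamma` parameter are hypothetical.

```python
import numpy as np


def spd_log(m):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(m)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def log_euclidean_kernel(a, b, gamma=1.0):
    """Gaussian kernel on the Frobenius distance between matrix logarithms."""
    diff = spd_log(a) - spd_log(b)
    return float(np.exp(-gamma * np.sum(diff * diff)))


def projection_kernel(y1, y2):
    """Grassmann projection kernel; y1 and y2 are n x p with orthonormal columns."""
    return float(np.linalg.norm(y1.T @ y2, "fro") ** 2)


# SPD example: identity versus diag(e, 1), whose logarithms differ by diag(1, 0).
a = np.eye(2)
b = np.diag([np.e, 1.0])
k_spd = log_euclidean_kernel(a, b)

# Grassmann example: two 2-D subspaces of R^4, identical and then orthogonal.
basis = np.eye(4)
k_same = projection_kernel(basis[:, :2], basis[:, :2])
k_orth = projection_kernel(basis[:, :2], basis[:, 2:])
print(k_spd, k_same, k_orth)
```

Once such a kernel is available, kernelized versions of Euclidean classifiers such as the SVM can operate on manifold-valued features through the kernel matrix alone.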

Adding Unlabeled Samples to Categories by Learned Attributes

Speaker: Jonghyun Choi -- Date: May 9, 2013

We propose a method to expand the visual coverage of training sets that consist of a small number of labeled examples using learned attributes. Our optimization formulation discovers category-specific attributes as well as the images that have high confidence in terms of the attributes. In addition, we propose a method to stably capture example-specific attributes for a small-sized training set. Our method adds images to a category from a large unlabeled image pool, and leads to significant improvement in category recognition accuracy evaluated on a subset of a large-scale dataset, ImageNet.