Computer Vision Student Seminars
The Computer Vision Student Seminars at the University of Maryland College Park are a student-run series of talks given by current graduate students for current graduate students.
To receive regular information about the Computer Vision Student Seminars, subscribe to our mailing list or our talks list.
Description
The purpose of these talks is to:
- Encourage interaction between computer vision students;
- Provide an opportunity for computer vision students to be aware of and possibly get involved in the research their peers are conducting;
- Provide an opportunity for computer vision students to receive feedback on their current research;
- Provide speaking opportunities for computer vision students.
The guidelines for the format are:
- An hour-long weekly meeting, consisting of one 20-40 minute talk followed by discussion and food.
- The talks are meant to be casual and discussion is encouraged.
- Topics may include current research, past research, general topic presentations, paper summaries and critiques, or anything else beneficial to the computer vision graduate student community.
Schedule Fall 2012
All talks take place Thursdays at 4:30pm in AVW 3450.
Date | Speaker | Title |
---|---|---|
September 6 | Angjoo Kanazawa | Face Alignment by Explicit Shape Regression |
September 13 | Sameh Khamis | Combining Per-Frame and Per-Track Cues for Multi-Person Action Recognition |
September 20 | Douglas Summers-Stay | Artificial Intelligence and Artificial Creativity Before 1900 |
September 27 | Mohammad Rastegari | Attribute Discovery via Predictable and Discriminative Binary Codes |
October 4 | Xavier Gibert Serra | Anomaly Detection on Railway Components using Sparse Representations |
October 11 | Kotaro Hara | Using Google Street View to Identify Street-level Accessibility Problems |
October 18 | Ashish Shrivastava | Dictionary learning methods for computer vision |
October 25 | Yi-Chen Chen | Dictionary-based Face Recognition from Video |
November 1 | Mohammad Rastegari | Instance Level Multiple Instance Learning Using Similarity Preserving Quasi Cliques |
November 8 | (Midterms, no meeting) | |
November 15 | (CVPR deadline, no meeting) | |
November 22 | (Thanksgiving, no meeting) | |
November 29 | Fatemeh Mirrashed | Knowledge Adaptation in Visual Domains |
December 6 | Ejaz Ahmed | TBA |
December 13 | Arijit Biswas | Clustering Images with Algorithms and Humans |
Talk Abstracts Fall 2012
Face Alignment by Explicit Shape Regression
Speaker: Angjoo Kanazawa -- Date: September 6, 2012
In this talk, we will go over the CVPR 2012 paper "Face Alignment by Explicit Shape Regression". I will review the paper and discuss its key concepts: cascaded regression, random ferns, shape-indexed image features, and correlation-based feature selection. Then I will discuss our hypothesis on why this seemingly simple method works so well, how we can apply it to similar problem domains such as dog and bird part localization, and the challenges involved.
Abstract from the paper: We present a very efficient, highly accurate, “Explicit Shape Regression” approach for face alignment. Unlike previous regression-based approaches, we directly learn a vectorial regression function to infer the whole facial shape (a set of facial landmarks) from the image and explicitly minimize the alignment errors over the training data. The inherent shape constraint is naturally encoded into the regressor in a cascaded learning framework and applied from coarse to fine during the test, without using a fixed parametric shape model as in most previous methods. To make the regression more effective and efficient, we design a two-level boosted regression, shape-indexed features and a correlation-based feature selection method. This combination enables us to learn accurate models from large training data in a short time (20 minutes for 2,000 training images), and run regression extremely fast in test (15 ms for a 87 landmarks shape). Experiments on challenging data show that our approach significantly outperforms the state-of-the-art in terms of both accuracy and efficiency.
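The cascaded regression idea mentioned above can be written compactly. The following is a sketch in our own notation (the symbols S, R, I, T, and N are labels chosen here, not taken from this page): the shape estimate is refined additively over T stages, and each stage regressor is trained to reduce the alignment error remaining after the previous stages.

```latex
% Stage-wise additive update (t = 1, ..., T), starting from an initial shape S^0:
S^{t} = S^{t-1} + R^{t}\left(I, S^{t-1}\right)

% Each stage regressor is fit to the error left by the previous stages,
% over N training images I_i with ground-truth shapes \hat{S}_i:
R^{t} = \arg\min_{R} \sum_{i=1}^{N}
        \left\lVert \hat{S}_i - \left(S_i^{t-1} + R\left(I_i, S_i^{t-1}\right)\right) \right\rVert
```

Because every stage sees the current estimate S^{t-1}, features can be indexed relative to that shape, which is where the shape-indexed features come in.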
Combining Per-Frame and Per-Track Cues for Multi-Person Action Recognition
Speaker: Sameh Khamis -- Date: September 13, 2012
We propose a model to combine per-frame and per-track cues for action recognition. With multiple targets in a scene, our model simultaneously captures the natural harmony of an individual's action in a scene and the flow of actions of an individual in a video sequence, inferring valid tracks in the process. Our motivation is based on the unlikely discordance of an action in a structured scene, both at the track level (e.g., a person jogging then dancing) and the frame level (e.g., a person jogging in a dance studio). While we can utilize sampling approaches for inference in our model, we instead devise a global inference algorithm by decomposing the problem and solving the subproblems exactly and efficiently, recovering a globally optimal joint solution in several cases. Finally, we improve on the state-of-the-art action recognition results for two publicly available datasets.
Artificial Intelligence and Artificial Creativity Before 1900
Speaker: Doug Summers-Stay -- Date: September 20, 2012
I will talk about various inventions such as the Eureka, which generated Latin poetry in hexameter while playing "God Save the Queen"; the Homeoscope, a mechanical search engine invented by a Russian police clerk in 1832; the Componium, an orchestra-in-a-box which composed random variations on a melody; and others along the same lines. I'll also talk about how we could go beyond these techniques to build something really creative. This is a presentation of material I found when I was doing research for the book I published in January, Machinamenta.
Attribute Discovery via Predictable and Discriminative Binary Codes
Speaker: Mohammad Rastegari -- Date: September 27, 2012
We represent images with binary codes in a way that balances discrimination and learnability of the codes. In our method, each image claims its own code in a way that maintains discrimination while being predictable from visual data. Category memberships are usually good proxies for visual similarity but should not be enforced as a hard constraint. Our method learns codes that maximize separability of categories unless there is strong visual evidence against it. Simple linear SVMs can achieve state-of-the-art results with our short codes. In fact, our method produces state-of-the-art results on Caltech256 with only 128-dimensional bit vectors and outperforms the state of the art by using longer codes. We also evaluate our method on ImageNet and show that our method outperforms state-of-the-art binary code methods on this large-scale dataset. Lastly, our codes can discover a discriminative set of attributes.
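To make the notion of a binary code that is "predictable from visual data" concrete, here is a minimal sketch. It assumes, purely for illustration, that each bit is the sign of a linear function of the image features; the hyperplanes and feature vectors below are hand-picked and hypothetical, not learned and not taken from the talk. It also shows why short codes are convenient: comparing two codes is a single XOR and a bit count.

```python
def encode(features, hyperplanes):
    """Return an integer code with one bit per hyperplane (1 if w.x + b > 0)."""
    code = 0
    for weights, bias in hyperplanes:
        activation = sum(w * x for w, x in zip(weights, features)) + bias
        code = (code << 1) | (1 if activation > 0 else 0)
    return code


def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


# Hypothetical (weights, bias) pairs; a real system would learn these.
hyperplanes = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], -1.0)]

# Hypothetical 2-D feature vectors for three images.
img_a = encode([0.9, 0.8], hyperplanes)
img_b = encode([0.7, 0.6], hyperplanes)
img_c = encode([-0.5, 0.4], hyperplanes)

print(hamming(img_a, img_b), hamming(img_a, img_c))
```

Images with similar features land on the same side of most hyperplanes and therefore get codes that are close in Hamming distance.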
Anomaly Detection on Railway Components using Sparse Representations
Speaker: Xavier Gibert-Serra -- Date: October 4, 2012
High-speed rail (HSR) requires high levels of reliability of the track infrastructure. Automated visual inspection is useful for finding many anomalies such as cracks or chips on joint bars and concrete ties, but existing vision-based inspection systems often produce a high number of false detections and are very sensitive to external factors such as changes in environmental conditions. For example, state-of-the-art algorithms used by the railroad industry nominally perform at a detection rate of 85% with a false alarm rate of 3%, and performance drops very quickly as image quality degrades. On the tie inspection problem, this false alarm rate would correspond to 2.6 false detections per second at 125 MPH, which cannot be handled by an operator. These false detections have many causes, including variations in anomaly appearance, texture, partial occlusion, and noise, which existing algorithms cannot handle very well. To overcome these limitations, it is necessary to reformulate this joint detection and segmentation problem as a Blind Source Separation problem, and use a generative model that is robust to noise and is capable of handling missing data.
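The false alarm figure can be sanity-checked with a short calculation. The sketch below assumes (this is not stated in the abstract) that every tie counts as one independent inspection, so that false alarms per second equal ties inspected per second times the false alarm rate. Under that assumption, the quoted numbers imply how many ties pass the camera each second and how far apart they are.

```python
# Assumption: false alarms per second = ties per second * false alarm rate.
MPH_TO_MPS = 1609.344 / 3600.0  # metres per mile divided by seconds per hour


def implied_ties_per_second(false_alarms_per_s, false_alarm_rate):
    """Inspection rate needed to produce the given false alarm throughput."""
    return false_alarms_per_s / false_alarm_rate


def implied_tie_spacing_m(speed_mph, ties_per_s):
    """Distance travelled between consecutive ties at the given speed."""
    return speed_mph * MPH_TO_MPS / ties_per_s


ties_per_s = implied_ties_per_second(2.6, 0.03)
spacing_m = implied_tie_spacing_m(125.0, ties_per_s)
print(f"{ties_per_s:.1f} ties/s, {spacing_m:.2f} m between ties")
# prints: 86.7 ties/s, 0.64 m between ties
```

In other words, the quoted 2.6 false detections per second is what a 3% false alarm rate yields when roughly 87 ties are inspected every second.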
In signal and image processing, Sparse Representations (SR) is an efficient way of describing a signal as a linear combination of a small number of atoms (elementary signals) from a dictionary. In natural images, sparsity arises from the statistical dependencies of pixel values across the image. Therefore, statistical methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Independent Component Analysis (ICA) have been used for dimensionality reduction in several computer vision problems. Recent advances in SR theory have enabled methods that learn optimal dictionaries directly from training data. For example, K-SVD is a very well known algorithm for automatically designing over-complete dictionaries for sparse representation.
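As an illustration of describing a signal with a small number of atoms, the following sketch runs plain matching pursuit over a tiny over-complete dictionary. Matching pursuit is used here only because it is short; it is not necessarily the sparse coding method used in this work, and the dictionary is hand-built rather than learned (for example by K-SVD). All names and values are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def matching_pursuit(signal, atoms, n_nonzero):
    """Greedy sparse coding over unit-norm atoms.

    At each step, pick the atom most correlated with the residual,
    add that correlation to its coefficient, and subtract its contribution.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        correlations = [dot(residual, atom) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(correlations[i]))
        coeffs[k] += correlations[k]
        residual = [r - correlations[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


# Over-complete dictionary: 4 unit-norm atoms in a 3-dimensional space.
atoms = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    normalize([1.0, 1.0, 0.0]),
]

# Signal built from only two atoms: 2 * atoms[2] + 3 * atoms[3].
signal = [2.0 * a + 3.0 * b for a, b in zip(atoms[2], atoms[3])]

coeffs, residual = matching_pursuit(signal, atoms, n_nonzero=2)
print([round(c, 6) for c in coeffs])
```

With the standard basis alone the signal would need three nonzero coefficients; the extra diagonal atom lets two suffice, which is the sense in which an over-complete dictionary can give sparser descriptions.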
In this detection problem, the anomalies have a very well-defined structure and can therefore be represented sparsely in some subspace. In addition, the image background has a very structured texture, so it is sparse with respect to a different frame. Theoretical results in mathematical geometric separation show that it is possible to separate these two image components (regular texture from contours) by minimizing the L1 norm of the coefficients in geometrically complementary frames. More recently, it has been shown that this problem can be solved efficiently using thresholding and total variation regularization. Our experiments show that the sparse coefficients extracted from the contour component can be converted into feature vectors that can be used to cluster and detect these anomalies.
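The link between L1 minimization and thresholding can be seen in the simplest case: minimizing 0.5 * (x - y)^2 + lam * |x| over x has the closed-form solution known as soft thresholding. The sketch below shows only that elementwise operation, with made-up coefficient values; a full separation algorithm would apply it alternately in the two frames and add the total variation term, none of which is shown here.

```python
def soft_threshold(values, lam):
    """Shrink every coefficient toward zero by lam and zero out the small ones.

    Elementwise minimizer of 0.5 * (x - y)**2 + lam * abs(x).
    """
    out = []
    for v in values:
        if v > lam:
            out.append(v - lam)
        elif v < -lam:
            out.append(v + lam)
        else:
            out.append(0.0)
    return out


coefficients = [3.0, -0.5, 0.2, -2.0]  # illustrative frame coefficients
shrunk = soft_threshold(coefficients, 1.0)
print(shrunk)  # prints: [2.0, 0.0, 0.0, -1.0]
```

Small coefficients are set exactly to zero, which is how the L1 penalty produces sparse representations.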
Using Google Street View to Identify Street-level Accessibility Problems
Speaker: Kotaro Hara -- Date: October 11, 2012
Poorly maintained sidewalks, missing curb ramps, and other obstacles pose considerable accessibility challenges; however, there are currently few, if any, mechanisms to determine accessible areas of a city a priori. In the first half of the presentation, I will talk about our investigation of the feasibility of using untrained crowd workers from Amazon Mechanical Turk (turkers) to find, label, and assess sidewalk accessibility problems in Google Street View imagery. Our work demonstrates a promising new, highly scalable method for acquiring knowledge about sidewalk accessibility. In the latter half, I will discuss future work as well as open research questions related to the field of computer vision.
Dictionary learning methods for computer vision
Speaker: Ashish Shrivastava -- Date: October 18, 2012
Sparse and redundant signal representations have recently gained much interest in image understanding. This is partly due to the fact that signals or images of interest are often sparse in some dictionary. These dictionaries can be either analytic or they can be learned directly from the data. In fact, it has been observed that learning a dictionary directly from data often leads to improved results in many practical applications such as classification and restoration. In this talk I will give a general overview of dictionary learning methods and talk in detail about my recent work on semi-supervised dictionary learning and non-linear supervised dictionary learning methods.
Dictionary-based Face Recognition from Video
Speaker: Yi-Chen Chen -- Date: October 25, 2012
The main challenge in recognizing faces in video is effectively exploiting the multiple frames of a face and the accompanying dynamic signature. One prominent method is based on extracting joint appearance and behavioral features. A second method models a person by temporal correlations of features in a video. Our approach introduces the concept of video-dictionaries for face recognition, which generalizes the work in sparse representation and dictionaries for faces in still images. Video-dictionaries are designed to implicitly encode temporal, pose, and illumination information. We demonstrate our method on the Face and Ocular Challenge Series (FOCS), which consists of unconstrained video sequences. We show that our method is efficient and performs significantly better than many competitive video-based face recognition algorithms.
Instance Level Multiple Instance Learning Using Similarity Preserving Quasi Cliques
Speaker: Mohammad Rastegari -- Date: November 1, 2012
In this work we introduce an instance-level approach to multiple instance learning. Our bottom-up approach learns a discriminative notion of similarity between instances in positive bags and uses it to form a discriminative similarity graph. We then introduce the notion of similarity preserving quasi-cliques, which aims at discovering large quasi-cliques with high within-clique similarity scores. We argue that such large cliques provide clues for inferring the underlying structure between positive instances. We use a ranking function that takes into account pairwise similarities coupled with prospectiveness of edges to score all positive instances. We show that these scores lead to positive instance discovery. Our experimental evaluations show that our method outperforms state-of-the-art MIL methods at both bag-level and instance-level prediction on standard benchmarks and on image and text datasets.
Knowledge Adaptation in Visual Domains
Speaker: Fatemeh Mirrashed -- Date: November 29, 2012
The new machine learning techniques of transfer learning and domain adaptation have recently captured special attention in the computer vision community. In this talk we will take a look at some of the methods that have recently been adopted or developed for adaptation of learning in visual domains. We will also try to have an open discussion about some of the more ideological questions, such as better generalization versus adaptation. With the abundance of massive volumes of visual training data, should we keep designing algorithms that could model all the possible variations in the visual world, or should we regard adaptation as an integral part of learning in visual domains?
TBA
Speaker: Ejaz Ahmed -- Date: December 6, 2012
Clustering Images with Algorithms and Humans
Speaker: Arijit Biswas -- Date: December 13, 2012
First, we propose a method of clustering images that combines algorithmic and human input. An algorithm provides us with pairwise image similarities. We then actively obtain selected, more accurate pairwise similarities from humans. A novel method is developed to choose the most useful pairs to show a person, obtaining constraints that improve clustering. In a clustering assignment, the elements in each data pair are either in the same cluster or in different clusters. We simulate inverting these pairwise relations and see how that affects the overall clustering. We choose a pair that maximizes the expected change in the clustering. The proposed algorithm has high time complexity, so we also propose a much faster version that exactly replicates the results of the original algorithm. We further improve run-time by adding heuristics, and show that these do not significantly impact the effectiveness of our method. We have run experiments in two different domains, namely leaf images and face images, and show that clustering performance can be improved significantly.
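The pair selection idea (invert one pairwise relation, recluster, and measure the change) can be sketched on a toy problem. Everything below is an illustrative assumption rather than the method from the talk: clustering is a stand-in (connected components over pairs believed to match), the similarity values are treated as probabilities that two images belong together, and the expected change is approximated as the probability that the current relation is wrong times the number of pairwise relations that flip.

```python
from itertools import combinations


def cluster(n, same):
    """Stand-in clusterer: connected components over pairs marked as 'same'."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), is_same in same.items():
        if is_same:
            parent[find(i)] = find(j)
    return [find(x) for x in range(n)]


def pair_relations(labels):
    """Map every pair (i, j) to True if both items share a cluster."""
    return {(i, j): labels[i] == labels[j]
            for i, j in combinations(range(len(labels)), 2)}


def choose_pair(n, p_same):
    """Pick the pair whose inversion changes the clustering most, in expectation."""
    same = {pair: p > 0.5 for pair, p in p_same.items()}
    base = pair_relations(cluster(n, same))
    best_pair, best_score = None, -1.0
    for pair, p in p_same.items():
        flipped = dict(same)
        flipped[pair] = not same[pair]          # simulate inverting this relation
        changed = pair_relations(cluster(n, flipped))
        n_changed = sum(base[q] != changed[q] for q in base)
        p_wrong = min(p, 1.0 - p)               # chance the current relation is wrong
        score = p_wrong * n_changed
        if score > best_score:
            best_pair, best_score = pair, score
    return best_pair, best_score


# Hypothetical similarities for 4 images: two tight pairs and one uncertain link.
p_same = {(0, 1): 0.9, (2, 3): 0.9, (1, 2): 0.45,
          (0, 2): 0.1, (0, 3): 0.1, (1, 3): 0.1}

best_pair, best_score = choose_pair(4, p_same)
print(best_pair, round(best_score, 2))
```

In this toy setup the uncertain link between images 1 and 2 is selected: it is both likely to be wrong and, if inverted, merges the two clusters, so asking a person about it is the most informative query.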
Second, we define a new clustering problem called subclustering and propose passive and active subclustering algorithms. Although there are many excellent clustering algorithms, effective clustering remains very challenging for large datasets that contain many classes. Image clustering presents further problems because automatically computed image distances are often noisy. We address these challenges in two ways. First, we propose a new algorithm to cluster a subset of the images only (we call this subclustering), which will produce a few examples from each class. Subclustering will produce smaller but purer clusters. Then we make use of human input in an active subclustering algorithm to further improve results. We run experiments on a face image dataset (having 51,418 images from 200 classes) and a leaf image dataset and show that our proposed algorithms perform better than baseline methods.
Past Semesters
Current Seminar Series Coordinators
Emails are at umiacs.umd.edu.
Angjoo Kanazawa, kanazawa@ | (student of Professor David Jacobs) |
Sameh Khamis, sameh@ | (student of Professor Larry Davis) |
Jie Ni, jni@ | (student of Professor Rama Chellappa) |
Ching Lik Teo, cteo@ | (student of Professor Yiannis Aloimonos) |
Gone but not forgotten.
Anne Jorstad, jorstad@ | (student of Professor David Jacobs) |
Sima Taheri, taheri@ | (student of Professor Rama Chellappa) |