Difference between revisions of "Main Page"

From cvss
 
(93 intermediate revisions by 4 users not shown)
Line 20: Line 20:
 
* Topics may include current research, past research, general topic presentations, paper summaries and critiques, or anything else beneficial to the computer vision graduate student community.
 
* Topics may include current research, past research, general topic presentations, paper summaries and critiques, or anything else beneficial to the computer vision graduate student community.
  
==Schedule Spring 2014==
+
==Schedule Fall 2015==
  
 
All talks take place on Thursdays at 3:30pm in AVW 3450.
 
All talks take place on Thursdays at 3:30pm in AVW 3450.
Line 30: Line 30:
 
! Title
 
! Title
 
|-
 
|-
| January 30
+
| December 3
| Arpit Jain
+
| Angjoo Kanazawa
| Scene and Video Understanding
+
| Learning 3D Deformation of Animals from 2D Images
 
|-
 
|-
| February 6
+
| December 10
| Raviteja Vemulapalli
+
| Xintong Han
| Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group
+
| Automated Event Retrieval using Web Trained Detectors
|-
 
| February 13
 
| ''Google DC PhD Summit, no meeting''
 
|
 
|-
 
| February 20
 
| Varun Nagaraja
 
| Feedback Loop between High Level Semantics and Low Level Vision
 
|-
 
| February 27
 
| Mohammad Rastegari
 
| Predictable Dual View Hashing and Domain Adaptive Classification
 
|-
 
| March 6
 
| ''ECCV deadline, no meeting''
 
|
 
|-
 
| March 13
 
| Xavier Gibert Serra
 
| Anomaly Detection on Outdoor Images Using Sparse Representations
 
|-
 
| March 20
 
| ''Spring break, no meeting''
 
|
 
|-
 
| March 27
 
| Swaminathan Sankaranarayanan
 
| Estimating 3D Face Models
 
|-
 
| April 3
 
| Austin Myers
 
| Affordance of Object Parts from Geometric Features
 
|-
 
| April 10
 
| Jingjing Zheng
 
| Tag Taxonomy Aware Dictionary Learning for Region Tagging
 
|-
 
| April 17
 
| ''MACV at Virginia Tech, no meeting''
 
|
 
|-
 
| April 24
 
| Ejaz Ahmed
 
| Semantic Object Selection
 
|-
 
| May 1
 
| Sameh Khamis
 
| TBA
 
|-
 
| May 8
 
| Garrett Warnell
 
| TBA
 
|-
 
| May 15
 
| Sumit Sekhar
 
| TBA
 
 
|}
 
|}
  
==Talk Abstracts Spring 2014==
+
==Talk Abstracts Spring 2015==
  
===Scene and Video Understanding===
 
Speaker: [http://www.umiacs.umd.edu/~ajain/ Arpit Jain] -- Date: January 30, 2014
 
  
There have been significant improvements in the accuracy of scene understanding due to a shift from recognizing objects "in isolation" to context-based recognition systems. Such systems improve recognition rates by augmenting appearance-based models of individual objects with contextual information based on pairwise relationships between objects. These pairwise relations incorporate common world knowledge such as co-occurrences and spatial arrangements of objects, scene layout, etc. However, these relations, even though consistent in the 3D world, change with the viewpoint of the scene. In this thesis, we will look into the problem of incorporating contextual information for scene understanding from two different perspectives: (a) "what" contextual relations are useful and "how" they should be incorporated into the Markov network during inference, and (b) jointly solving the segmentation and recognition problem using a multiple segmentation framework based on contextual information in conjunction with appearance matching. In the later part of the thesis, we will investigate different representations for video understanding and propose a discriminative patch-based representation for videos.
+
===Learning 3D Deformation of Animals from 2D Images===
 +
Speaker: [http://www.umiacs.umd.edu/~kanazawa/ Angjoo Kanazawa] -- Date: December 3, 2015
  
Our work departs from the traditional view of incorporating context into the scene understanding problem, in which a fixed model for context is learned. We argue that context is scene dependent and propose a data-driven approach to predict the importance of edges and construct a Markov network for image analysis based on statistical models of global and local image features. Since not all contextual information is equally important, we also address the coupled problem of predicting the feature weights associated with each edge of a Markov network for evaluation of context. We then address the problem of fixed segmentation while modelling context by using a multiple segmentation framework and formulating the problem as "a jigsaw puzzle". We formulate the problem as segment selection from a pool of segments (jigsaws), assigning each selected segment a class label. Previous multiple segmentation approaches used local appearance matching to select segments in a greedy manner. In contrast, our approach formulates a cost function based on contextual information in conjunction with appearance matching. This relaxed cost function formulation is minimized using an efficient quadratic programming solver and an approximate solution is obtained by discretizing the relaxed solution.
+
Abstract: Understanding how an animal can deform and articulate is essential for a realistic modification of its 3D model. In this paper, we show that such information can be learned from user-clicked 2D images and a template 3D model of the target animal. We present a volumetric deformation framework that produces a set of new 3D models by deforming a template 3D model according to a set of user-clicked images. Our framework is based on a novel locally-bounded deformation energy, where every local region has its own stiffness value that bounds how much distortion is allowed at that location. We jointly learn the local stiffness bounds as we deform the template 3D mesh to match each user-clicked image. We show that this seemingly complex task can be solved as a sequence of convex optimization problems. We demonstrate the effectiveness of our approach on cats and horses, which are highly deformable and articulated animals. Our framework produces new 3D models of animals that are significantly more plausible than methods without learned stiffness.
  
Lastly, we propose a new representation for videos based on mid-level discriminative spatio-temporal patches. These spatio-temporal patches might correspond to a primitive human action, a semantic object, or perhaps a random but informative spatiotemporal patch in the video. What defines these spatiotemporal patches is their discriminative and representative properties. We automatically mine these patches from hundreds of training videos and experimentally demonstrate that these patches establish correspondence across videos and align the videos for label transfer techniques. Furthermore, these patches can be used as a discriminative vocabulary for action classification.
+
Link: [http://arxiv.org/pdf/1507.07646v1.pdf paper]
  
===Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group===
+
===Automated Event Retrieval using Web Trained Detectors===
Speaker: [http://ravitejav.weebly.com/ Raviteja Vemulapalli] -- Date: February 6, 2014
 
  
Recently introduced cost-effective depth sensors coupled with the real-time skeleton estimation algorithm of Shotton et al. [16] have resulted in a renewed interest in skeleton-based human action recognition. Most of the earlier skeleton-based approaches used either the joint locations or the joint angles to represent a human skeleton. In this paper, we propose a new skeletal representation that explicitly models the 3D geometric relationships between various body parts using rotations and translations in 3D space. Since 3D rigid body motions are members of the special Euclidean group SE(3), the proposed skeletal representation lies in the Lie group SE(3) × ... × SE(3), which is a curved manifold. With the proposed representation human actions can be modeled as curves in this Lie group. Since classification of curves in this Lie group is not an easy task, we map the action curves from the Lie group to its Lie algebra, which is a vector space. We then perform classification using a combination of dynamic time warping, Fourier temporal pyramid representation and linear SVM. Experimental results on three action datasets show that the proposed representation performs better than various other commonly-used skeletal representations. The proposed approach also outperforms various state-of-the-art skeleton-based human action recognition approaches.
+
Speaker: [http://www.umiacs.umd.edu/~xintong/ Xintong Han] -- Date: December 10, 2015
  
===Feedback Loop between High Level Semantics and Low Level Vision===
+
Abstract: Complex event retrieval is a challenging research problem, especially when no training videos are available. An alternative to collecting training videos is to train a large semantic concept bank a priori. Given a text description of an event, event retrieval is performed by selecting concepts linguistically related to the event description and fusing the concept responses on unseen videos. However, defining an exhaustive concept lexicon and pre-training it requires vast computational resources. Therefore, recent approaches automate concept discovery and training by leveraging large amounts of weakly annotated web data. Compact visually salient concepts are automatically obtained by the use of concept pairs or, more generally, n-grams. However, not all visually salient n-grams are necessarily useful for an event query: some combinations of concepts may be visually compact but irrelevant, and this drastically affects performance. We propose an event retrieval algorithm that constructs pairs of automatically discovered concepts and then prunes those concepts that are unlikely to be helpful for retrieval. Pruning depends both on the query and on the specific video instance being evaluated. Our approach also addresses calibration and domain adaptation issues that arise when applying concept detectors to unseen videos. We demonstrate large improvements over other vision-based systems on the TRECVID MED 13 dataset.
Speaker: Varun Nagaraja -- Date: February 20, 2014
 
 
 
High level semantical analysis typically involves constructing a Markov network over detections from low level detectors to encode context and model relationships between them. In complex higher order networks (e.g. Markov Logic Networks), each detection can be part of many factors and the network size grows rapidly as a function of the number of detections. Hence to keep the network size small, a threshold is applied on the confidence measures of the detections to discard the less likely detections. A practical challenge is to decide what thresholds to use to discard noisy detections. A high threshold will lead to a high false dismissal rate. A low threshold can result in many detections including mostly noisy ones which leads to a large network size and increased computational requirements.
 
 
 
We propose a feedback-based incremental technique to tackle this problem, where we initialize the network with high confidence detections and then, based on the high level semantics in the initial network, incrementally select the relevant missing low level detections. We show three different ways of selecting detections which are based on three scoring functions that bound the increase in the optimal value of the objective function of the network, with varying degrees of accuracy and computational cost. We perform experiments with an event recognition task in one-on-one basketball videos that uses Markov Logic Networks.
 
 
 
===Predictable Dual View Hashing and Domain Adaptive Classification===
 
Speaker: [http://www.umiacs.umd.edu/~mrastega/ Mohammad Rastegari] -- Date: February 27, 2014
 
 
 
We propose a Predictable Dual-View Hashing (PDH) algorithm which embeds proximity of data samples in the original spaces. We create a cross-view hamming space with the ability to compare information from previously incomparable domains with a notion of 'predictability'. By performing comparative experimental analysis on two large datasets, PASCAL-Sentence and SUN-Attribute, we demonstrate the superiority of our method to the state-of-the-art dual-view binary code learning algorithms. We also propose an unsupervised domain adaptation method that exploits intrinsic compact structures of categories across different domains using binary attributes. Our method directly optimizes for classification in the target domain. The key insight is finding attributes that are discriminative across categories and predictable across domains.
 
 
 
===Anomaly Detection on Outdoor Images Using Sparse Representations===
 
Speaker: [http://www.umiacs.umd.edu/~gibert/ Xavier Gibert Serra] -- Date: March 13, 2014
 
 
 
The integrity of safety-critical infrastructure, such as railway tracks, roads, or bridges, needs to be monitored regularly to prevent catastrophic failures. For example, federal regulations require visual inspection of all high speed tracks twice each week. Traditional manual inspection methods are time-consuming and prone to human error. With the availability of high-speed cameras, it is possible to survey large areas in less time. However, detecting cracks and other anomalies on these images is a particularly challenging problem because of the uncontrolled environment arising from differences in material composition, and superficial degradation caused by outdoor elements. Due to speed requirements, images acquired from a moving vehicle have limited resolution, causing the smallest of these cracks to be under-sampled in the transversal dimension. Therefore, these cracks get mixed with background texture, resulting in negative signal-to-noise ratio. State-of-the-art methods are based on linear filters, which are only optimal under additive Gaussian noise assumptions. This problem of simultaneous detection and clustering of anomalies in textured images can be posed as a blind source separation problem, and by exploiting the mutual incoherence of the dictionaries of shearlets and isotropic wavelets, which sparsely represent cracks and texture, we can separate each component using an iterative shrinkage algorithm. In this talk, I will present an integrated framework for image separation, feature extraction, clustering and classification that takes advantage of this decomposition.
 
 
 
===Estimating 3D Face Models===
 
Speaker: Swaminathan Sankaranarayanan -- Date: March 27, 2014
 
 
 
In this talk, I will focus on the topic of 3D Face Model Estimation from Single Grayscale Images. This problem is usually formulated as a Shape from Shading problem involving assumptions about the Image Formation and the Illumination framework. I will review some of the state-of-the-art methods that attempt to solve this problem by using knowledge from existing 3D shape models of face images. I will then introduce the idea of using Sparse Depth Representations and motivate my method of formulating the Model Estimation problem as a Bilevel Sparse Coding Optimization. I will conclude my talk by explaining the algorithm that is used to solve the objective function and the issues that I am facing with it.
 
 
 
===Affordance of Object Parts from Geometric Features===
 
Speaker: [https://sites.google.com/site/austinomyers/ Austin Myers] -- Date: April 3, 2014
 
 
 
Understanding affordance is a first step to a deeper understanding of the world, one in which a robot knows how an object and its parts can be used. To assist in everyday activities, robots must not only be able to recognize a tool, but also localize its parts and identify how each part is used. We propose a preliminary approach to jointly localize and identify the function, or affordances, of a tool’s parts for objects from known or completely novel categories. We combine superpixel segmentation, feature learning, and conditional random fields to provide precise 3D predictions of functional parts that can be used directly by a robot to interact with the world. To investigate this problem, we introduce a new RGB-D Part Affordance Dataset consisting of 105 kitchen, workshop, and garden tools with pixel-level affordance labels for over 10,000 RGB-D images. We analyze the effectiveness of different feature types, and show that geometric features are most important for successful affordance identification. We demonstrate that by identifying the affordances of tools at the level of parts, we can generalize to novel object categories and identify the useful parts of never-before-seen tools.
 
 
 
===Tag Taxonomy Aware Dictionary Learning for Region Tagging===
 
Speaker: [https://sites.google.com/site/jingjingzhengumd/ Jingjing Zheng] -- Date: April 10, 2014
 
 
 
Tags of image regions are often arranged in a hierarchical taxonomy based on their semantic meanings. Using the given tag taxonomy, we propose to jointly learn multi-layer hierarchical dictionaries and corresponding linear classifiers for region tagging. Specifically, we generate a node-specific dictionary for each tag node in the taxonomy, and then concatenate the node-specific dictionaries from each level to construct a level-specific dictionary. The hierarchical semantic structure among tags is preserved in the relationship among node-dictionaries. Simultaneously, the sparse codes obtained using the level-specific dictionaries are summed up as the final feature representation to design a linear classifier. Our approach not only makes use of sparse codes obtained from higher levels to help learn the classifiers for lower levels, but also encourages the tag nodes from lower levels that have the same parent tag node to implicitly share sparse codes obtained from higher levels. Experimental results using three benchmark datasets show that the proposed approach yields the best performance over recently proposed methods.
 
 
 
===Semantic Object Selection===
 
Speaker: [http://www.umiacs.umd.edu/~ejaz/ Ejaz Ahmed] -- Date: April 24, 2014
 
 
 
Interactive object segmentation has great practical importance in computer vision. Many interactive methods have been proposed utilizing user input in the form of mouse clicks and mouse strokes, and often requiring a lot of user intervention. In this paper, we present a system with a far simpler input method: the user needs only give the name of the desired object.  With the tag provided by the user we do a text query of an image database to gather exemplars of the object. Using object proposals and borrowing ideas from image retrieval and object detection, the object is localized in the target image.  An appearance model generated from the exemplars and the location prior are used in an energy minimization framework to select the object. Our method outperforms the state-of-the-art on existing datasets and on a more challenging dataset we collected.
 
  
 +
Link: [http://arxiv.org/pdf/1509.07845v1.pdf paper]
  
 
==Past Semesters==
 
==Past Semesters==
 +
* [[Cvss:Spring2015| Spring 2015]]
 +
* [[cvss fall2014|Fall 2014]]
 +
* [[cvss_spring2014|Spring 2014]]
 
* [[cvss_fall2013|Fall 2013]]
 
* [[cvss_fall2013|Fall 2013]]
 
* [[cvss_summer2013|Summer 2013]]
 
* [[cvss_summer2013|Summer 2013]]
Line 160: Line 71:
 
==Funded By==
 
==Funded By==
 
* Computer Vision Faculty
 
* Computer Vision Faculty
* '''[http://www.northropgrumman.com/ Northrop Grumman]'''
+
<!-- * '''[http://www.northropgrumman.com/ Northrop Grumman]''' -->
  
 
==Current Seminar Series Coordinators==
 
==Current Seminar Series Coordinators==
Line 167: Line 78:
  
 
{| cellpadding="1"
 
{| cellpadding="1"
 +
|-
 +
| [http://sites.google.com/site/austinomyers/ Austin Myers], amyers@
 +
| (student of [http://www.cfar.umd.edu/~yiannis/ Professor Yiannis Aloimonos])
 
|-
 
|-
 
| [http://www.umiacs.umd.edu/~kanazawa/ Angjoo Kanazawa], kanazawa@
 
| [http://www.umiacs.umd.edu/~kanazawa/ Angjoo Kanazawa], kanazawa@
| (student of [http://www.cs.umd.edu/~djacobs/ Professor David Jacobs])
+
| (student of [http://cs.umd.edu/~djacobs/ Professor David Jacobs])
 +
|-
 +
| [http://sites.google.com/site/yechengxi/ Chenxi Ye] cxy@
 +
| (student of [http://www.cfar.umd.edu/~yiannis/ Professor Yiannis Aloimonos])
 
|-
 
|-
| [http://www.umiacs.umd.edu/~sameh/ Sameh Khamis], sameh@
+
| [http://www.umiacs.umd.edu/~xintong/ Xintong Han], xintong@
 
| (student of [http://www.umiacs.umd.edu/~lsd/ Professor Larry Davis])
 
| (student of [http://www.umiacs.umd.edu/~lsd/ Professor Larry Davis])
 
|-
 
|-
| [https://sites.google.com/site/austinomyers/ Austin Myers], amyers@
+
| [http://www.cs.umd.edu/~bharat/ Bharat Singh], bharat@
| (student of [http://www.cfar.umd.edu/~yiannis/ Professor Yiannis Aloimonos])
+
| (student of [http://www.umiacs.umd.edu/~lsd/ Professor Larry Davis])
 
|-
 
|-
| [http://ravitejav.weebly.com/ Raviteja Vemulapalli], raviteja @
+
| [http://bcsiriuschen.github.io/ Bor-Chun (Sirius) Chen], sirius@
| (student of [http://www.umiacs.umd.edu/~rama/ Professor Rama Chellappa])
+
| (student of [http://www.umiacs.umd.edu/~lsd/ Professor Larry Davis])
 
|}
 
|}
  
 
Gone but not forgotten.
 
Gone but not forgotten.
 
 
{| cellpadding="1"
 
{| cellpadding="1"
 +
|-
 +
| [http://www.umiacs.umd.edu/~jhchoi/ Jonghyun Choi], jhchoi@
 +
| (student of [http://www.umiacs.umd.edu/~lsd/ Professor Larry Davis])
 +
|-
 +
| Ching-Hui Chen, ching@
 +
| (student of [http://www.umiacs.umd.edu/~rama/ Professor Rama Chellappa])
 +
|
 +
|-
 +
| [http://ravitejav.weebly.com/ Raviteja Vemulapalli], raviteja @
 +
| (student of [http://www.umiacs.umd.edu/~rama/ Professor Rama Chellappa])
 +
|-
 +
| [http://www.umiacs.umd.edu/~sameh/ Sameh Khamis]
 +
|
 
|-
 
|-
 
| [http://www.umiacs.umd.edu/~ejaz/ Ejaz Ahmed]
 
| [http://www.umiacs.umd.edu/~ejaz/ Ejaz Ahmed]
Line 192: Line 121:
 
|-
 
|-
 
| [http://www.umiacs.umd.edu/~jni/ Jie Ni]
 
| [http://www.umiacs.umd.edu/~jni/ Jie Ni]
| off this semester
+
| now at Sony
 
|-
 
|-
 
| [http://www.umiacs.umd.edu/~taheri/ Sima Taheri]
 
| [http://www.umiacs.umd.edu/~taheri/ Sima Taheri]

Latest revision as of 23:40, 3 December 2015

Computer Vision Student Seminars

The Computer Vision Student Seminars at the University of Maryland College Park are a student-run series of talks given by current graduate students for current graduate students.

To receive regular information about the Computer Vision Student Seminars, subscribe to our mailing list or our talks list.

Description

The purpose of these talks is to:

  • Encourage interaction between computer vision students;
  • Provide an opportunity for computer vision students to be aware of and possibly get involved in the research their peers are conducting;
  • Provide an opportunity for computer vision students to receive feedback on their current research;
  • Provide speaking opportunities for computer vision students.

The guidelines for the format are:

  • An hour-long weekly meeting, consisting of one 20-40 minute talk followed by discussion and food.
  • The talks are meant to be casual and discussion is encouraged.
  • Topics may include current research, past research, general topic presentations, paper summaries and critiques, or anything else beneficial to the computer vision graduate student community.

Schedule Fall 2015

All talks take place on Thursdays at 3:30pm in AVW 3450.

Date          Speaker            Title
December 3    Angjoo Kanazawa    Learning 3D Deformation of Animals from 2D Images
December 10   Xintong Han        Automated Event Retrieval using Web Trained Detectors

Talk Abstracts Fall 2015

Learning 3D Deformation of Animals from 2D Images

Speaker: Angjoo Kanazawa -- Date: December 3, 2015

Abstract: Understanding how an animal can deform and articulate is essential for a realistic modification of its 3D model. In this paper, we show that such information can be learned from user-clicked 2D images and a template 3D model of the target animal. We present a volumetric deformation framework that produces a set of new 3D models by deforming a template 3D model according to a set of user-clicked images. Our framework is based on a novel locally-bounded deformation energy, where every local region has its own stiffness value that bounds how much distortion is allowed at that location. We jointly learn the local stiffness bounds as we deform the template 3D mesh to match each user-clicked image. We show that this seemingly complex task can be solved as a sequence of convex optimization problems. We demonstrate the effectiveness of our approach on cats and horses, which are highly deformable and articulated animals. Our framework produces new 3D models of animals that are significantly more plausible than methods without learned stiffness.

Link: paper

Automated Event Retrieval using Web Trained Detectors

Speaker: Xintong Han -- Date: December 10, 2015

Abstract: Complex event retrieval is a challenging research problem, especially when no training videos are available. An alternative to collecting training videos is to train a large semantic concept bank a priori. Given a text description of an event, event retrieval is performed by selecting concepts linguistically related to the event description and fusing the concept responses on unseen videos. However, defining an exhaustive concept lexicon and pre-training it requires vast computational resources. Therefore, recent approaches automate concept discovery and training by leveraging large amounts of weakly annotated web data. Compact visually salient concepts are automatically obtained by the use of concept pairs or, more generally, n-grams. However, not all visually salient n-grams are necessarily useful for an event query: some combinations of concepts may be visually compact but irrelevant, and this drastically affects performance. We propose an event retrieval algorithm that constructs pairs of automatically discovered concepts and then prunes those concepts that are unlikely to be helpful for retrieval. Pruning depends both on the query and on the specific video instance being evaluated. Our approach also addresses calibration and domain adaptation issues that arise when applying concept detectors to unseen videos. We demonstrate large improvements over other vision-based systems on the TRECVID MED 13 dataset.

Link: paper

Past Semesters

Funded By

  • Computer Vision Faculty

Current Seminar Series Coordinators

Emails are at umiacs.umd.edu.

Austin Myers, amyers@ (student of Professor Yiannis Aloimonos)
Angjoo Kanazawa, kanazawa@ (student of Professor David Jacobs)
Chenxi Ye, cxy@ (student of Professor Yiannis Aloimonos)
Xintong Han, xintong@ (student of Professor Larry Davis)
Bharat Singh, bharat@ (student of Professor Larry Davis)
Bor-Chun (Sirius) Chen, sirius@ (student of Professor Larry Davis)

Gone but not forgotten.

Jonghyun Choi, jhchoi@ (student of Professor Larry Davis)
Ching-Hui Chen, ching@ (student of Professor Rama Chellappa)
Raviteja Vemulapalli, raviteja@ (student of Professor Rama Chellappa)
Sameh Khamis
Ejaz Ahmed
Anne Jorstad, now at EPFL
Jie Ni, now at Sony
Sima Taheri
Ching Lik Teo