Difference between revisions of "Main Page"

From cvss
 
(355 intermediate revisions by 9 users not shown)
Line 1: Line 1:
<Big>'''Computer Vision Student Seminar'''</Big>
+
<Big>'''Computer Vision Student Seminars'''</Big>
  
The Computer Vision Student Seminar at the University of Maryland College Park is a student-run series of talks given by current graduate students for [http://www.cfar.umd.edu/cvl/meetthe.html#Graduate current graduate students].
+
The Computer Vision Student Seminars at the University of Maryland College Park are a student-run series of talks given by [http://www.cfar.umd.edu/cvl/meetthe.html#Graduate current graduate students] for [http://www.cfar.umd.edu/cvl/meetthe.html#Graduate current graduate students].
  
 +
To receive regular information about the Computer Vision Student Seminars, subscribe to our [https://mailman.cs.umd.edu/mailman/listinfo/cvss mailing list] or our [http://talks.cs.umd.edu/lists/12 talks list].
  
 
==Description==
 
==Description==
Line 13: Line 14:
 
* Provide speaking opportunities for computer vision students.
 
* Provide speaking opportunities for computer vision students.
  
 
+
The guidelines for the format are:
==Format==
 
  
 
* An hour-long weekly meeting, consisting of one 20-40 minute talk followed by discussion and food.   
 
* An hour-long weekly meeting, consisting of one 20-40 minute talk followed by discussion and food.   
Line 20: Line 20:
 
* Topics may include current research, past research, general topic presentations, paper summaries and critiques, or anything else beneficial to the computer vision graduate student community.
 
* Topics may include current research, past research, general topic presentations, paper summaries and critiques, or anything else beneficial to the computer vision graduate student community.
  
 +
==Schedule Fall 2015==
  
==Subscribe to the Mailing List==
+
All talks take place on Thursdays at 3:30pm in AVW 3450.
 
 
To receive regular information about the Computer Vision Student Seminar, subscribe to the mailing list by following the instructions [https://mailman.cs.umd.edu/mailman/listinfo/cvss here].
 
 
 
  
==Schedule Summer 2011==
+
{| class="wikitable" cellpadding="10" border="1" cellspacing="1"
 
 
All talks take place Thursdays at 4pm in AVW 3450.
 
 
 
{| class="wikitable" cellpadding="10" border="1" cellspacing="0"
 
 
|-
 
|-
 
! Date
 
! Date
Line 36: Line 30:
 
! Title
 
! Title
 
|-
 
|-
| June 9
+
| December 3
| Vlad Morariu
+
| Angjoo Kanazawa
| Multi-Agent Event Recognition in Structured Scenarios
+
| Learning 3D Deformation of Animals from 2D Images
|-
 
| June 16
 
| Ajay Mishra
 
| A Vision System to Extract "Simple" Objects in a Purely Bottom-Up Fashion
 
|-
 
| June 23
 
| (no meeting, CVPR)
 
|
 
|-
 
| June 30
 
| Dikpal Reddy
 
| Fast Imaging with Slow Cameras
 
|-
 
| July 7
 
| Raghuraman Gopalan
 
| Exploring Context in Unsupervised Object Identification Scenarios
 
|-
 
| July 14
 
| Behjat Siddiquie
 
| Utilizing Contextual Information for Scene Understanding and Image Retrieval
 
|-
 
| July 21
 
| Kaushik Mitra
 
| Robust Regression Using Sparse Learning
 
|-
 
| July 28
 
| Zhuolin Jiang
 
| Discriminative Dictionary Learning for Sparse Representation
 
|-
 
| August 4
 
| Carlos Castillo
 
| Dense Wide-Baseline Stereo Matching and its Application to Face Recognition
 
|-
 
| August 11
 
| Qiang Qiu
 
| Learning an Attribute Dictionary for Human Action Classification
 
|-
 
| August 18
 
| Yezhou Yang
 
| Corpus-Guided Sentence Generation of Natural Images
 
|-
 
| August 25
 
| Nazre Batool
 
|
 
 
|-
 
|-
| September 01
+
| December 10
| ----
+
| Xintong Han
|
+
| Automated Event Retrieval using Web Trained Detectors
|-
 
| September 08
 
| Vishal Patel
 
|
 
 
|}
 
|}
  
 +
==Talk Abstracts Fall 2015==
  
==Talk Abstracts==
 
 
===Multi-Agent Event Recognition in Structured Scenarios===
 
Speaker: [http://www.umiacs.umd.edu/~morariu/ Vlad Morariu] -- Date: June 9, 2011
 
 
I will present a framework for the automatic recognition of complex multi-agent events in settings where structure is imposed by rules that agents must follow while performing activities. Given semantic spatio-temporal descriptions of what generally happens (i.e., rules, event descriptions, physical constraints), and based on video analysis, the framework determines the events that occurred. Knowledge about spatio-temporal structure is encoded in first-order logic using an approach based on Allen's Interval Logic, and robustness to low-level observation uncertainty is provided by Markov Logic Networks (MLN). The main contribution is that the framework integrates interval-based temporal reasoning with probabilistic logical inference, relying on an efficient bottom-up grounding scheme to avoid combinatorial explosion. Applied to one-on-one basketball, the framework detects and tracks players, their hands and feet, and the ball, generates event observations from the resulting trajectories, and performs probabilistic logical inference to determine the most consistent sequence of events.
 
 
===A Vision System to Extract "Simple" Objects in a Purely Bottom-Up Fashion===
 
Speaker: [http://www.umiacs.umd.edu/~mishraka/ Ajay Mishra] -- Date: June 16, 2011
 
  
Human perception, being active, is inextricably linked to visual fixation. Despite the obvious importance of fixation, it has not become an integral part of computer vision/robotics algorithms so far. To incorporate fixation and attention in a computer vision framework, we have proposed a new segmentation framework that takes a fixation point (i.e., a single point) inside a "simple" object as its input and outputs the region corresponding to that object. We have also designed a new attentional mechanism that utilizes the concept of neural border-ownership to automatically select the fixation points inside different "simple" objects in the scene. All of this together creates a fully automatic system that outputs only the regions corresponding to the "simple" objects without knowing the actual number or the size of the objects in the scene.
+
===Learning 3D Deformation of Animals from 2D Images===
 +
Speaker: [http://www.umiacs.umd.edu/~kanazawa/ Angjoo Kanazawa] -- Date: December 3, 2015
  
Using these regions, instead of rectangular patches of fixed sizes, to analyze the content of a scene will result in better performance (in terms of accuracy and robustness to noise) for high-level vision algorithms such as object recognition, object manipulation, and action analysis. A variety of experimental results will conclude the talk.
+
Abstract: Understanding how an animal can deform and articulate is essential for a realistic modification of its 3D model. In this paper, we show that such information can be learned from user-clicked 2D images and a template 3D model of the target animal. We present a volumetric deformation framework that produces a set of new 3D models by deforming a template 3D model according to a set of user-clicked images. Our framework is based on a novel locally-bounded deformation energy, where every local region has its own stiffness value that bounds how much distortion is allowed at that location. We jointly learn the local stiffness bounds as we deform the template 3D mesh to match each user-clicked image. We show that this seemingly complex task can be solved as a sequence of convex optimization problems. We demonstrate the effectiveness of our approach on cats and horses, which are highly deformable and articulated animals. Our framework produces new 3D models of animals that are significantly more plausible than methods without learned stiffness.
  
Also, to understand the role of fixation in perception, Ajay recommends taking the psychophysical test available at http://www.umiacs.umd.edu/~mishraka/fixationExperiment.php
+
Link: [http://arxiv.org/pdf/1507.07646v1.pdf paper]
  
===Fast Imaging with Slow Cameras===
+
===Automated Event Retrieval using Web Trained Detectors===
Speaker: [http://www.umiacs.umd.edu/~dikpal/ Dikpal Reddy] -- Date: June 30, 2011
 
  
Over the years, the spatial resolution of cameras has steadily increased but the temporal resolution has remained the same. In this talk, I will present my work on converting a regular slow camera into a faster one. We capture and accurately reconstruct fast events using our slower prototype camera by exploiting the temporal redundancy in videos. First, I will show how by fluttering the shutter during the exposure duration of a slow 25fps camera we can capture and reconstruct a fast periodic video at 2000fps. Next, I will present its generalization where we show that per-pixel modulation during exposure, in combination with brightness constancy constraints allows us to capture a broad class of motions at 200fps using a 25fps camera. In both these techniques we borrow ideas from compressive sensing theory for acquisition and recovery.
+
Speaker: [http://www.umiacs.umd.edu/~xintong/ Xintong Han] -- Date: December 10, 2015
  
===Exploring Context in Unsupervised Object Identification Scenarios===
+
Abstract: Complex event retrieval is a challenging research problem, especially when no training videos are available. An alternative to collecting training videos is to train a large semantic concept bank a priori. Given a text description of an event, event retrieval is performed by selecting concepts linguistically related to the event description and fusing the concept responses on unseen videos. However, defining an exhaustive concept lexicon and pre-training it requires vast computational resources. Therefore, recent approaches automate concept discovery and training by leveraging large amounts of weakly annotated web data. Compact visually salient concepts are automatically obtained by the use of concept pairs or, more generally, n-grams. However, not all visually salient n-grams are necessarily useful for an event query: some combinations of concepts may be visually compact but irrelevant, and this drastically affects performance. We propose an event retrieval algorithm that constructs pairs of automatically discovered concepts and then prunes those concepts that are unlikely to be helpful for retrieval. Pruning depends both on the query and on the specific video instance being evaluated. Our approach also addresses calibration and domain adaptation issues that arise when applying concept detectors to unseen videos. We demonstrate large improvements over other vision-based systems on the TRECVID MED 13 dataset.
Speaker: [http://www.umiacs.umd.edu/~raghuram/ Raghuraman Gopalan] -- Date: July 7, 2011
 
  
The utility of context for supervised object recognition has been well acknowledged from the early seventies, and has been practically demonstrated by many systems in the last few years. The goal of this talk is to understand the role of context in unsupervised pattern identification scenarios. We consider two problems of clustering a set of unlabelled data points using maximum margin principles, and adapting a classifier trained on a specific domain to identify instances across novel domain shifting transformations, and propose contextual sources that provide pertinent information on the identity of the unlabelled data.
+
Link: [http://arxiv.org/pdf/1509.07845v1.pdf paper]
  
===Utilizing Contextual Information for Scene Understanding and Image Retrieval===
+
==Past Semesters==
Speaker: [http://www.cs.umd.edu/~behjat/ Behjat Siddiquie] -- Date: July 14, 2011
+
* [[Cvss:Spring2015| Spring 2015]]
 
+
* [[cvss fall2014|Fall 2014]]
In many vision tasks, contextual information can often help disambiguate confusions arising from appearance information. In this talk, I will discuss two different works, which deal with effective utilization of contextual information to improve the performance of active learning for scene understanding and multi-attribute based image retrieval.
+
* [[cvss_spring2014|Spring 2014]]
 
+
* [[cvss_fall2013|Fall 2013]]
First, I will propose an active learning framework to simultaneously learn appearance and contextual models for scene understanding tasks (multi-class classification). Current multi-class active learning approaches ignore the contextual interactions between different regions of an image and the fact that knowing the label for one region provides information about the labels of other regions. We explicitly model the contextual interactions between regions and select the question which leads to the maximum reduction in the combined entropy of all the regions in the image (image entropy).
+
* [[cvss_summer2013|Summer 2013]]
 
+
* [[cvss_spring2013|Spring 2013]]
Next, I will present a novel approach for ranking and retrieval of images based on multi-attribute queries. Existing image retrieval methods train separate classifiers for each word and heuristically combine their outputs for retrieving multi-word queries. Moreover, these approaches ignore the interdependencies among the query words. In contrast, we propose a principled approach for multi-attribute retrieval which explicitly models the correlations that are present between the attributes. Given a multi-attribute query, we also utilize other attributes in the vocabulary which are not present in the query, for ranking/retrieval.
+
* [[cvss_fall2012|Fall 2012]]
 
+
* [[cvss_spring2012|Spring 2012]]
===Robust Regression Using Sparse Learning===
+
* [[cvss_fall2011|Fall 2011]]
Speaker: [http://www.umiacs.umd.edu/~kmitra/ Kaushik Mitra] -- Date: July 21, 2011
+
* [[cvss_summer2011|Summer 2011]]
 
 
Robust regression is a combinatorial optimization problem. Hence, algorithms such as RANSAC and least median of squares (LMedS), which are successful in solving low-dimensional problems, cannot be used for solving high-dimensional problems. We show that under certain conditions the robust linear regression problem can be solved accurately using polynomial-time algorithms such as a modified version of basis pursuit and a sparse Bayesian algorithm. We then extend our robust formulation to the case of kernel regression, specifically to propose a robust version of relevance vector machine (RVM) regression.
 
 
 
===Discriminative Dictionary Learning for Sparse Representation===
 
Speaker: [http://www.umiacs.umd.edu/~zhuolin/ Zhuolin Jiang] -- Date: July 28, 2011
 
 
 
Sparse coding approximates an input signal by a sparse linear combination of items from an over-complete dictionary. The sparse coding-based approaches lead to state-of-the-art results for many signal or image processing tasks and advances in computer vision tasks such as object recognition. However, the performance of sparse coding relies on the quality of dictionary. How to design or learn the best dictionary adapted to natural signals has been the topic of much research in the past. In this talk I will first introduce some recent techniques that learn the dictionary from training data. Next I will present a label consistent K-SVD (LC-KSVD) algorithm to learn a discriminative dictionary for sparse representation. It yields dictionaries so that feature points with the same class labels have similar sparse codes.
 
 
 
===Dense Wide-Baseline Stereo Matching and its Application to Face Recognition===
 
Speaker: [http://www.cs.umd.edu/~carlos/ Carlos Castillo] -- Date: August 4, 2011
 
 
 
We study the problem of dense wide baseline stereo with varying illumination. We are motivated by the problem of face recognition across pose. Stereo matching allows us to compare face images based on physically valid, dense correspondences. We show that the stereo matching cost provides a very robust measure of similarity of faces that is insensitive to pose variations. We build on the observation that most illumination insensitive local comparisons require the use of relatively large windows. The size of these windows is affected by foreshortening. If we do not account for this effect, we incur misalignments that are systematic and significant and are exacerbated by wide baseline conditions.
 
 
 
We present a general formulation of dense wide baseline stereo with varying illumination and provide two methods to solve them. The first method is based on dynamic programming (DP) and fully accounts for the effect of slant. The second method is based on graph cuts (GC) and fully accounts for the effect of slant and tilt. The GC method finds a global solution using the unary function from the general formulation and a novel smoothness term that encodes surface orientation.
 
 
 
Our experiments show that the DP dense wide baseline stereo demonstrates superior performance compared to existing methods in face recognition across pose. The experiments with the GC method show that accounting for slant and tilt can improve performance in situations with wide baselines and lighting variation. Our formulation can be applied to other more sophisticated window based image comparison methods for stereo.
 
 
 
===Learning an Attribute Dictionary for Human Action Classification===
 
Speaker: [http://www.cs.umd.edu/~qiu/ Qiang Qiu] -- Date: August 11, 2011
 
 
 
We present an approach for dictionary learning of action attributes via information maximization.  We unify the class distribution and appearance information into an objective function for learning a sparse dictionary of action attributes. The objective function maximizes the mutual information between what has been learned and what remains to be learned in terms of appearance information and class distribution for each dictionary item. We propose a Gaussian Process (GP) model for sparse representation to optimize the dictionary objective function. The sparse coding property allows a kernel with a compact support in GP to realize a very efficient dictionary learning process. Hence we can describe an action video by a set of compact and discriminative action attributes.  More importantly, we can recognize modeled action categories in a sparse feature space, which can be generalized to unseen and unmodeled action categories. Experimental results demonstrate the effectiveness of our approach in action recognition applications.
 
 
 
===Corpus-Guided Sentence Generation of Natural Images===
 
Speaker: [http://www.umiacs.umd.edu/~yzyang/ Yezhou Yang] -- Date: August 18, 2011
 
 
 
We propose a sentence generation strategy that describes images by predicting the most likely nouns, verbs, scenes and prepositions that make up the core sentence structure. The inputs are initial noisy estimates of the objects and scenes detected in the image using state-of-the-art trained detectors. As predicting actions from still images directly is unreliable, we use a language model trained on the English Gigaword corpus to obtain their estimates, together with probabilities of co-located nouns, scenes and prepositions. We use these estimates as parameters of an HMM that models the sentence generation process, with hidden nodes as sentence components and image detections as the emissions. Experimental results show that our strategy of combining vision and language produces readable and descriptive sentences compared to naive strategies that use vision alone.
 
  
 +
==Funded By==
 +
* Computer Vision Faculty
 +
<!-- * '''[http://www.northropgrumman.com/ Northrop Grumman]''' -->
  
 
==Current Seminar Series Coordinators==
 
==Current Seminar Series Coordinators==
Line 164: Line 77:
 
Emails are at umiacs.umd.edu.
 
Emails are at umiacs.umd.edu.
  
{| class="wikitable" cellpadding="5"
+
{| cellpadding="1"
 
|-
 
|-
| Anne Jorstad, jorstad@
+
| [http://sites.google.com/site/austinomyers/ Austin Myers], amyers@
| (student of Professor David Jacobs)
+
| (student of [http://www.cfar.umd.edu/~yiannis/ Professor Yiannis Aloimonos])
 
|-
 
|-
| Sameh Khamis, sameh@
+
| [http://www.umiacs.umd.edu/~kanazawa/ Angjoo Kanazawa], kanazawa@
| (student of Professor Larry Davis)
+
| (student of [http://cs.umd.edu/~djacobs/ Professor David Jacobs])
 
|-
 
|-
| Sima Taheri, taheri@
+
| [http://sites.google.com/site/yechengxi/ Chenxi Ye], cxy@
| (student of Professor Rama Chellappa)
+
| (student of [http://www.cfar.umd.edu/~yiannis/ Professor Yiannis Aloimonos])
 
|-
 
|-
| Ching Lik Teo, cteo@
+
| [http://www.umiacs.umd.edu/~xintong/ Xintong Han], xintong@
| (student of Professor Yiannis Aloimonos)
+
| (student of [http://www.umiacs.umd.edu/~lsd/ Professor Larry Davis])
 +
|-
 +
| [http://www.cs.umd.edu/~bharat/ Bharat Singh], bharat@
 +
| (student of [http://www.umiacs.umd.edu/~lsd/ Professor Larry Davis])
 +
|-
 +
| [http://bcsiriuschen.github.io/ Bor-Chun (Sirius) Chen], sirius@
 +
| (student of [http://www.umiacs.umd.edu/~lsd/ Professor Larry Davis])
 
|}
 
|}
  
 
+
Gone but not forgotten.
== Wiki Editing ==
+
{| cellpadding="1"
 
+
|-
Consult the [http://meta.wikimedia.org/wiki/Help:Contents User's Guide] for information on using the wiki software.
+
| [http://www.umiacs.umd.edu/~jhchoi/ Jonghyun Choi], jhchoi@
 
+
| (student of [http://www.umiacs.umd.edu/~lsd/ Professor Larry Davis])
* [http://www.mediawiki.org/wiki/Help:Configuration_settings Configuration settings list]
+
|-
* [http://www.mediawiki.org/wiki/Help:FAQ MediaWiki FAQ]
+
| Ching-Hui Chen, ching@
* [http://mail.wikimedia.org/mailman/listinfo/mediawiki-announce MediaWiki release mailing list]
+
| (student of [http://www.umiacs.umd.edu/~rama/ Professor Rama Chellappa])
 +
|
 +
|-
 +
| [http://ravitejav.weebly.com/ Raviteja Vemulapalli], raviteja@
 +
| (student of [http://www.umiacs.umd.edu/~rama/ Professor Rama Chellappa])
 +
|-
 +
| [http://www.umiacs.umd.edu/~sameh/ Sameh Khamis]
 +
|
 +
|-
 +
| [http://www.umiacs.umd.edu/~ejaz/ Ejaz Ahmed]
 +
|
 +
|-
 +
| [http://cvlabwww.epfl.ch/~jorstad/ Anne Jorstad]
 +
| now at EPFL
 +
|-
 +
| [http://www.umiacs.umd.edu/~jni/ Jie Ni]
 +
| now at Sony
 +
|-
 +
| [http://www.umiacs.umd.edu/~taheri/ Sima Taheri]
 +
|
 +
|-
 +
| [http://www.umiacs.umd.edu/~cteo/ Ching Lik Teo]
 +
|
 +
|}

Latest revision as of 23:40, 3 December 2015

Computer Vision Student Seminars

The Computer Vision Student Seminars at the University of Maryland College Park are a student-run series of talks given by current graduate students for current graduate students.

To receive regular information about the Computer Vision Student Seminars, subscribe to our mailing list or our talks list.

Description

The purpose of these talks is to:

  • Encourage interaction between computer vision students;
  • Provide an opportunity for computer vision students to be aware of and possibly get involved in the research their peers are conducting;
  • Provide an opportunity for computer vision students to receive feedback on their current research;
  • Provide speaking opportunities for computer vision students.

The guidelines for the format are:

  • An hour-long weekly meeting, consisting of one 20-40 minute talk followed by discussion and food.
  • The talks are meant to be casual and discussion is encouraged.
  • Topics may include current research, past research, general topic presentations, paper summaries and critiques, or anything else beneficial to the computer vision graduate student community.

Schedule Fall 2015

All talks take place on Thursdays at 3:30pm in AVW 3450.

Date          Speaker           Title
December 3    Angjoo Kanazawa   Learning 3D Deformation of Animals from 2D Images
December 10   Xintong Han       Automated Event Retrieval using Web Trained Detectors

Talk Abstracts Fall 2015

Learning 3D Deformation of Animals from 2D Images

Speaker: Angjoo Kanazawa -- Date: December 3, 2015

Abstract: Understanding how an animal can deform and articulate is essential for a realistic modification of its 3D model. In this paper, we show that such information can be learned from user-clicked 2D images and a template 3D model of the target animal. We present a volumetric deformation framework that produces a set of new 3D models by deforming a template 3D model according to a set of user-clicked images. Our framework is based on a novel locally-bounded deformation energy, where every local region has its own stiffness value that bounds how much distortion is allowed at that location. We jointly learn the local stiffness bounds as we deform the template 3D mesh to match each user-clicked image. We show that this seemingly complex task can be solved as a sequence of convex optimization problems. We demonstrate the effectiveness of our approach on cats and horses, which are highly deformable and articulated animals. Our framework produces new 3D models of animals that are significantly more plausible than methods without learned stiffness.

Link: paper

Automated Event Retrieval using Web Trained Detectors

Speaker: Xintong Han -- Date: December 10, 2015

Abstract: Complex event retrieval is a challenging research problem, especially when no training videos are available. An alternative to collecting training videos is to train a large semantic concept bank a priori. Given a text description of an event, event retrieval is performed by selecting concepts linguistically related to the event description and fusing the concept responses on unseen videos. However, defining an exhaustive concept lexicon and pre-training it requires vast computational resources. Therefore, recent approaches automate concept discovery and training by leveraging large amounts of weakly annotated web data. Compact visually salient concepts are automatically obtained by the use of concept pairs or, more generally, n-grams. However, not all visually salient n-grams are necessarily useful for an event query: some combinations of concepts may be visually compact but irrelevant, and this drastically affects performance. We propose an event retrieval algorithm that constructs pairs of automatically discovered concepts and then prunes those concepts that are unlikely to be helpful for retrieval. Pruning depends both on the query and on the specific video instance being evaluated. Our approach also addresses calibration and domain adaptation issues that arise when applying concept detectors to unseen videos. We demonstrate large improvements over other vision-based systems on the TRECVID MED 13 dataset.

Link: paper

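Past Semesters

  • Spring 2015
  • Fall 2014
  • Spring 2014
  • Fall 2013
  • Summer 2013
  • Spring 2013
  • Fall 2012
  • Spring 2012
  • Fall 2011
  • Summer 2011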

Funded By

  • Computer Vision Faculty

Current Seminar Series Coordinators

Emails are at umiacs.umd.edu.

Austin Myers, amyers@ (student of Professor Yiannis Aloimonos)
Angjoo Kanazawa, kanazawa@ (student of Professor David Jacobs)
Chenxi Ye, cxy@ (student of Professor Yiannis Aloimonos)
Xintong Han, xintong@ (student of Professor Larry Davis)
Bharat Singh, bharat@ (student of Professor Larry Davis)
Bor-Chun (Sirius) Chen, sirius@ (student of Professor Larry Davis)

Gone but not forgotten.

Jonghyun Choi, jhchoi@ (student of Professor Larry Davis)
Ching-Hui Chen, ching@ (student of Professor Rama Chellappa)
Raviteja Vemulapalli, raviteja@ (student of Professor Rama Chellappa)
Sameh Khamis
Ejaz Ahmed
Anne Jorstad, now at EPFL
Jie Ni, now at Sony
Sima Taheri
Ching Lik Teo