Difference between revisions of "Main Page"

From cvss

Latest revision as of 23:40, 3 December 2015

Computer Vision Student Seminars

The Computer Vision Student Seminars at the University of Maryland, College Park are a student-run series of talks given by current graduate students for current graduate students.

To receive regular information about the Computer Vision Student Seminars, subscribe to our mailing list or our talks list.

Description

The purpose of these talks is to:

Encourage interaction between computer vision students;
Provide an opportunity for computer vision students to be aware of and possibly get involved in the research their peers are conducting;
Provide an opportunity for computer vision students to receive feedback on their current research;
Provide speaking opportunities for computer vision students.

The guidelines for the format are:

An hour-long weekly meeting, consisting of one 20-40 minute talk followed by discussion and food.
The talks are meant to be casual and discussion is encouraged.
Topics may include current research, past research, general topic presentations, paper summaries and critiques, or anything else beneficial to the computer vision graduate student community.

Schedule Fall 2015

All talks take place on Thursdays at 3:30pm in AVW 3450.

Date	Speaker	Title
December 3	Angjoo Kanazawa	Learning 3D Deformation of Animals from 2D Images
December 10	Xintong Han	Automated Event Retrieval using Web Trained Detectors

Talk Abstracts Fall 2015

Learning 3D Deformation of Animals from 2D Images

Speaker: Angjoo Kanazawa -- Date: December 3, 2015

Abstract: Understanding how an animal can deform and articulate is essential for a realistic modification of its 3D model. In this paper, we show that such information can be learned from user-clicked 2D images and a template 3D model of the target animal. We present a volumetric deformation framework that produces a set of new 3D models by deforming a template 3D model according to a set of user-clicked images. Our framework is based on a novel locally-bounded deformation energy, where every local region has its own stiffness value that bounds how much distortion is allowed at that location. We jointly learn the local stiffness bounds as we deform the template 3D mesh to match each user-clicked image. We show that this seemingly complex task can be solved as a sequence of convex optimization problems. We demonstrate the effectiveness of our approach on cats and horses, which are highly deformable and articulated animals. Our framework produces new 3D models of animals that are significantly more plausible than methods without learned stiffness.

Link: paper (http://arxiv.org/pdf/1507.07646v1.pdf)

Automated Event Retrieval using Web Trained Detectors

Speaker: Xintong Han -- Date: December 10, 2015

Abstract: Complex event retrieval is a challenging research problem, especially when no training videos are available. An alternative to collecting training videos is to train a large semantic concept bank a priori. Given a text description of an event, event retrieval is performed by selecting concepts linguistically related to the event description and fusing the concept responses on unseen videos. However, defining an exhaustive concept lexicon and pre-training it requires vast computational resources. Therefore, recent approaches automate concept discovery and training by leveraging large amounts of weakly annotated web data. Compact visually salient concepts are automatically obtained by the use of concept pairs or, more generally, n-grams. However, not all visually salient n-grams are necessarily useful for an event query: some combinations of concepts may be visually compact but irrelevant, and this drastically affects performance. We propose an event retrieval algorithm that constructs pairs of automatically discovered concepts and then prunes those concepts that are unlikely to be helpful for retrieval. Pruning depends both on the query and on the specific video instance being evaluated. Our approach also addresses calibration and domain adaptation issues that arise when applying concept detectors to unseen videos. We demonstrate large improvements over other vision-based systems on the TRECVID MED 13 dataset.

Link: paper (http://arxiv.org/pdf/1509.07845v1.pdf)

Past Semesters

Funded By

Computer Vision Faculty

Current Seminar Series Coordinators

Emails are at umiacs.umd.edu.

Austin Myers, amyers@	(student of Professor Yiannis Aloimonos)
Angjoo Kanazawa, kanazawa@	(student of Professor David Jacobs)
Chenxi Ye, cxy@	(student of Professor Yiannis Aloimonos)
Xintong Han, xintong@	(student of Professor Larry Davis)
Bharat Singh, bharat@	(student of Professor Larry Davis)
Bor-Chun (Sirius) Chen, sirius@	(student of Professor Larry Davis)

Gone but not forgotten.

Jonghyun Choi, jhchoi@	(student of Professor Larry Davis)
Ching-Hui Chen, ching@	(student of Professor Rama Chellappa)
Raviteja Vemulapalli, raviteja@	(student of Professor Rama Chellappa)
Sameh Khamis
Ejaz Ahmed
Anne Jorstad	now at EPFL
Jie Ni	now at Sony
Sima Taheri
Ching Lik Teo

Retrieved from "https://wiki.cs.umd.edu/cvss/w/index.php?title=Main_Page&oldid=1868"

@@ Line 20: / Line 20: @@
 * Topics may include current research, past research, general topic presentations, paper summaries and critiques, or anything else beneficial to the computer vision graduate student community.
-==Schedule Fall 2013==
+==Schedule Fall 2015==
-All talks take place on Thursdays at 4:30pm in AVW 3450.
+All talks take place on Thursdays at 3:30pm in AVW 3450.
 {| class="wikitable" cellpadding="10" border="1" cellspacing="1"
@@ Line 30: / Line 30: @@
 ! Title
 |-
-| September 19
+| December 3
-| Mohammad Rastegari
+| Angjoo Kanazawa
-| Fast Image Prior
+| Learning 3D Deformation of Animals from 2D Images
 |-
-| September 26
+| December 10
-| ''(no meeting)''
+| Xintong Han
-|
+| Automated Event Retrieval using Web Trained Detectors
-|-
-| October 3
-| ''(MSR talk, no meeting)''
-|
-|-
-| October 10
-| Yezhou Yang
-| A Context-free Manipulation Action Grammar and Manipulation Action Consequences Detection
-|-
-| October 17
-| Garrett Warnell
-| Ray Saliency: Bottom-Up Saliency for a Rotating and Zooming Camera
-|-
-| October 24
-| Abhishek Sharma
-| A Sentence is Worth a Thousand Pixels
-|-
-| October 31
-| ''(CVPR deadline, no meeting)''
-|
-|-
-| November 7
-| Jingjing Zheng
-| Cross-View Action Recognition Via a Transferable Dictionary Pair
-|-
-| November 14
-| Sumit Shekhar
-| Joint Sparse Representation for Multimodal Biometric Recognition
-|-
-| November 21
-| Jonghyun Choi
-| Renaissance of Convolutional Neural Network - what, why and so?
-|-
-| November 28
-| ''(Thanksgiving, no meeting)''
-|
-|-
-| December 5
-| Arijit Biswas
-| Distance Learning Using the Triangle Inequality for Semi-supervised Clustering
 |}
-==Talk Abstracts Fall 2013==
+==Talk Abstracts Spring 2015==
-===Fast Image Prior===
-Speaker: [http://www.umiacs.umd.edu/~mrastega/ Mohammad Rastegari] -- Date: September 19, 2013
-In this project we introduce a new method for learning an image prior that can be used for many applications in image reconstruction. We learn a generative model on natural image patches. Our generative model is similar to a Gaussian Mixture Model (GMM). The key idea of our approach is to force each component of our generative model to share the same set of basis vectors. This leads to much faster inference at test time. We used image denoising as our test bed for this image prior learning. Our experimental results show that we reached about a 30x speedup over the state-of-the-art method while getting a slight improvement in denoising accuracy.
-===A Context-free Manipulation Action Grammar and Manipulation Action Consequences Detection===
-Speaker: [http://www.umiacs.umd.edu/~yzyang/ Yezhou Yang] -- Date: October 10, 2013
-Humanoid robots will need to learn the actions that humans perform. They will need to recognize these actions when they see them and they will need to perform these actions themselves. In this presentation I will introduce a manipulation grammar to perform this learning task. Context-free grammars  in linguistics provide a simple and precise mechanism for describing the methods by which phrases in some natural language are built from smaller blocks. Also, the basic recursive structure of natural languages is described exactly. Similarly, for manipulation actions, every complex activity is built from smaller blocks involving hands and their movements, as well as objects, tools and the monitoring of their state. Thus, interpreting a seen action is like understanding language, and executing an action from knowledge in memory is like producing language. Associated with the grammar, a parsing algorithm is proposed, which can be used  bottom-up to interpret videos by dynamically creating a semantic tree structure, and top-down to create the motor commands for a robot to execute  manipulation actions. Experiments on both tasks, i.e. a robot observing people performing manipulation actions, and a robot executing manipulation actions on a simulation platform, validate the proposed formalism.
-===Ray Saliency: Bottom-Up Saliency for a Rotating and Zooming Camera===
-Speaker: [http://garrettwarnell.com/ Garrett Warnell] -- Date: October 17, 2013
-We extend the classical notion of visual saliency to multi-image data collected using a stationary pan-tilt-zoom (PTZ) camera. We show why existing saliency methods are not effective for this type of data, and propose ray saliency: a modified notion of visual saliency that utilizes knowledge of the imaging process in order to appropriately incorporate the context provided by multiple images. We present a practical, mosaic-free method by which to quantify and calculate ray saliency, and demonstrate its usefulness on PTZ imagery.
-===A Sentence is Worth a Thousand Pixels===
-Speaker: [http://www.umiacs.umd.edu/~bhokaal/ Abhishek Sharma] -- Date: October 24, 2013
-We are interested in holistic scene understanding where images are accompanied with text in the form of complex sentential descriptions. We propose a holistic conditional random field model for semantic parsing which reasons jointly about which objects are present in the scene, their spatial extent as well as semantic segmentation, and employs text as well as image information as input. We automatically parse the sentences and extract objects and their relationships, and incorporate them into the model, both via potentials as well as by re-ranking candidate detections. We demonstrate the effectiveness of our approach in the challenging UIUC sentences dataset and show segmentation improvements of 12.5% over the visual only model and detection improvements of 5% AP over deformable part-based models.
-===Cross-View Action Recognition Via a Transferable Dictionary Pair===
+===Learning 3D Deformation of Animals from 2D Images===
-Speaker: [https://sites.google.com/site/jingjingzhengumd/ Jingjing Zheng] -- Date: November 7, 2013
+Speaker: [http://www.umiacs.umd.edu/~kanazawa/ Angjoo Kanazawa] -- Date: December 3, 2015
-Discriminative appearance features are effective for recognizing actions in a fixed view, but generalize poorly to changes in viewpoint. We present a method for view-invariant action recognition based on sparse representations using a transferable dictionary pair. A transferable dictionary pair consists of two dictionaries that correspond to the source and target views respectively. The two dictionaries are learned simultaneously from pairs of videos taken at different views and aim to encourage each video in the pair to have the same sparse representation. Thus, the transferable dictionary pair links features between the two views that are useful for action recognition. Both unsupervised and supervised algorithms are presented for learning transferable dictionary pairs. Using the sparse representation as features, a classifier built in the source view can be directly transferred to the target view. We extend our approach to transferring an action model learned from multiple source views to one target view. We demonstrate the effectiveness of our approach on the multi-view IXMAS data set. Our results compare favorably to the state of the art.
+Abstract: Understanding how an animal can deform and articulate is essential for a realistic modification of its 3D model. In this paper, we show that such information can be learned from user-clicked 2D images and a template 3D model of the target animal. We present a volumetric deformation framework that produces a set of new 3D models by deforming a template 3D model according to a set of user-clicked images. Our framework is based on a novel locally-bounded deformation energy, where every local region has its own stiffness value that bounds how much distortion is allowed at that location. We jointly learn the local stiffness bounds as we deform the template 3D mesh to match each user-clicked image. We show that this seemingly complex task can be solved as a sequence of convex optimization problems. We demonstrate the effectiveness of our approach on cats and horses, which are highly deformable and articulated animals. Our framework produces new 3D models of animals that are significantly more plausible than methods without learned stiffness.
-===Joint Sparse Representation for Multimodal Biometric Recognition===
+Link: [http://arxiv.org/pdf/1507.07646v1.pdf paper]
-Speaker: [http://www.umiacs.umd.edu/~sshekha/ Sumit Shekhar] -- Date: November 14, 2013
-In this talk, I will present work on a feature-level fusion method for multimodal biometric recognition. Traditional methods for combining outputs from different modalities are based on score-level or decision-level fusion. Feature-level fusion can be more discriminative, but has hardly been explored due to the challenges of different feature outputs and high feature dimensions. Here, I will present a framework using joint sparsity to combine information, and show its application to multimodal biometric recognition, face recognition and video-based recognition.
+===Automated Event Retrieval using Web Trained Detectors===
-===Renaissance of Convolutional Neural Network - what, why and so?===
+Speaker: [http://www.umiacs.umd.edu/~xintong/ Xintong Han] -- Date: December 10, 2015
-Speaker: [http://www.umiacs.umd.edu/~jhchoi/ Jonghyun Choi] -- Date: November 21, 2013
-Deep networks based on convolutional neural networks have recently improved image classification accuracy significantly over state-of-the-art vision approaches. I will go through what a successful deep convolutional neural net looks like, why it is popular again now, and ongoing deep net research in other research groups. I will mostly go through the successful instance of a deep convolutional neural net tuned by Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton, published in NIPS 2012.
+Abstract: Complex event retrieval is a challenging research problem, especially when no training videos are available. An alternative to collecting training videos is to train a large semantic concept bank a priori. Given a text description of an event, event retrieval is performed by selecting concepts linguistically related to the event description and fusing the concept responses on unseen videos. However, defining an exhaustive concept lexicon and pre-training it requires vast computational resources. Therefore, recent approaches automate concept discovery and training by leveraging large amounts of weakly annotated web data. Compact visually salient concepts are automatically obtained by the use of concept pairs or, more generally, n-grams. However, not all visually salient n-grams are necessarily useful for an event query - some combinations of concepts may be visually compact but irrelevant--and this drastically affects performance. We propose an event retrieval algorithm that constructs pairs of automatically discovered concepts and then prunes those concepts that are unlikely to be helpful for retrieval. Pruning depends both on the query and on the specific video instance being evaluated. Our approach also addresses calibration and domain adaptation issues that arise when applying concept detectors to unseen videos. We demonstrate large improvements over other vision based systems on the TRECVID MED 13 dataset.
-===Distance Learning Using the Triangle Inequality for Semi-supervised Clustering===
-Speaker: [http://www.umiacs.umd.edu/~arijit/ Arijit Biswas] -- Date: December 5, 2013
-Success of semi-supervised clustering algorithms depends on how effectively supervision can be propagated to the unsupervised data. We propose a method for modifying all pairwise image distances when must-link or can't-link pairwise constraints are provided for only a few image pairs. These distances are used for clustering images. First, we formulate a brute-force Quadratic Programming (QP) method that modifies the distances such that the total change in distances is minimized but the final distances obey the triangle inequality. Then we propose a much faster version of the QP that can be applied to large datasets by enforcing only a selected subset of the inequalities. We prove that this still ensures that key qualitative properties of the distances are correctly computed. We run experiments on face, leaf and video image clustering and show that our proposed approach outperforms state-of-the-art methods for constrained clustering.
+Link: [http://arxiv.org/pdf/1509.07845v1.pdf paper]
 ==Past Semesters==
+* [[Cvss:Spring2015| Spring 2015]]
+* [[cvss fall2014|Fall 2014]]
+* [[cvss_spring2014|Spring 2014]]
 * [[cvss_fall2013|Fall 2013]]
 * [[cvss_summer2013|Summer 2013]]
@@ Line 133: / Line 71: @@
 ==Funded By==
 * Computer Vision Faculty
-* [http://www.northropgrumman.com/ Northrop Grumman]
+<!-- * '''[http://www.northropgrumman.com/ Northrop Grumman]''' -->
 ==Current Seminar Series Coordinators==
@@ Line 140: / Line 78: @@
 {| cellpadding="1"
+|-
+| [http://sites.google.com/site/austinomyers/ Austin Myers], amyers@
+| (student of [http://www.cfar.umd.edu/~yiannis/ Professor Yiannis Aloimonos])
 |-
 | [http://www.umiacs.umd.edu/~kanazawa/ Angjoo Kanazawa], kanazawa@
-| (student of [http://www.cs.umd.edu/~djacobs/ Professor David Jacobs])
+| (student of [http://cs.umd.edu/~djacobs/ Professor David Jacobs])
+|-
+| [http://sites.google.com/site/yechengxi/ Chenxi Ye] cxy@
+| (student of [http://www.cfar.umd.edu/~yiannis/ Professor Yiannis Aloimonos])
 |-
-| [http://www.umiacs.umd.edu/~sameh/ Sameh Khamis], sameh@
+| [http://www.umiacs.umd.edu/~xintong/ Xintong Han], xintong@
 | (student of [http://www.umiacs.umd.edu/~lsd/ Professor Larry Davis])
 |-
-| [https://sites.google.com/site/austinomyers/ Austin Myers], amyers@
+| [http://www.cs.umd.edu/~bharat/ Bharat Singh], bharat@
-| (student of [http://www.cfar.umd.edu/~yiannis/ Professor Yiannis Aloimonos])
+| (student of [http://www.umiacs.umd.edu/~lsd/ Professor Larry Davis])
 |-
-| Raviteja Vemulapalli, raviteja @
+| [http://bcsiriuschen.github.io/ Bor-Chun (Sirius) Chen], sirius@
-| (student of [http://www.umiacs.umd.edu/~rama/ Professor Rama Chellappa])
+| (student of [http://www.umiacs.umd.edu/~lsd/ Professor Larry Davis])
 |}
 Gone but not forgotten.
 {| cellpadding="1"
+|-
+| [http://www.umiacs.umd.edu/~jhchoi/ Jonghyun Choi], jhchoi@
+| (student of [http://www.umiacs.umd.edu/~lsd/ Professor Larry Davis])
+|-
+| Ching-Hui Chen, ching@
+| (student of [http://www.umiacs.umd.edu/~rama/ Professor Rama Chellappa])
+|
+|-
+| [http://ravitejav.weebly.com/ Raviteja Vemulapalli], raviteja @
+| (student of [http://www.umiacs.umd.edu/~rama/ Professor Rama Chellappa])
+|-
+| [http://www.umiacs.umd.edu/~sameh/ Sameh Khamis]
+|
 |-
 | [http://www.umiacs.umd.edu/~ejaz/ Ejaz Ahmed]
@@ Line 165: / Line 121: @@
 |-
 | [http://www.umiacs.umd.edu/~jni/ Jie Ni]
-| off this semester
+| now at Sony
 |-
 | [http://www.umiacs.umd.edu/~taheri/ Sima Taheri]
