Revision as of 21:39, 5 November 2013
Computer Vision Student Seminars
The Computer Vision Student Seminars at the University of Maryland College Park are a student-run series of talks given by current graduate students for current graduate students.
To receive regular information about the Computer Vision Student Seminars, subscribe to our mailing list or our talks list.
Description
The purpose of these talks is to:
- Encourage interaction between computer vision students;
- Provide an opportunity for computer vision students to learn about, and possibly get involved in, the research their peers are conducting;
- Provide an opportunity for computer vision students to receive feedback on their current research;
- Provide speaking opportunities for computer vision students.
The guidelines for the format are:
- An hour-long weekly meeting, consisting of one 20-40 minute talk followed by discussion and food.
- The talks are meant to be casual and discussion is encouraged.
- Topics may include current research, past research, general topic presentations, paper summaries and critiques, or anything else beneficial to the computer vision graduate student community.
Schedule Fall 2013
All talks take place on Thursdays at 4:30pm in AVW 3450.
Date | Speaker | Title |
---|---|---|
September 19 | Mohammad Rastegari | Fast Image Prior |
September 26 | (no meeting) | |
October 3 | (MSR talk, no meeting) | |
October 10 | Yezhou Yang | A Context-free Manipulation Action Grammar and Manipulation Action Consequences Detection |
October 17 | Garrett Warnell | Ray Saliency: Bottom-Up Saliency for a Rotating and Zooming Camera |
October 24 | Abhishek Sharma | A Sentence is Worth a Thousand Pixels |
October 31 | (CVPR deadline, no meeting) | |
November 7 | Jingjing Zheng | Cross-View Action Recognition Via a Transferable Dictionary Pair |
November 14 | Sumit Shekhar | TBA |
November 21 | Arunkumar Mohananchettiar | TBA |
November 28 | (Thanksgiving, no meeting) | |
December 5 | Arijit Biswas | TBA |
Talk Abstracts Fall 2013
Fast Image Prior
Speaker: Mohammad Rastegari -- Date: September 19, 2013
In this project we introduce a new method for learning an image prior that can be used for many applications in image reconstruction. We learn a generative model on natural image patches. Our generative model is similar to a Gaussian Mixture Model (GMM). The key idea of our approach is to force each component of our generative model to share the same set of basis vectors. This leads to much faster inference at test time. We used image denoising as our test bed for this image prior learning. Our experimental results show that we reached about a 30x speed-up over the state-of-the-art method while getting a slight improvement in denoising accuracy.
A Context-free Manipulation Action Grammar and Manipulation Action Consequences Detection
Speaker: Yezhou Yang -- Date: October 10, 2013
Humanoid robots will need to learn the actions that humans perform. They will need to recognize these actions when they see them and they will need to perform these actions themselves. In this presentation I will introduce a manipulation grammar to perform this learning task. Context-free grammars in linguistics provide a simple and precise mechanism for describing the methods by which phrases in some natural language are built from smaller blocks. They also describe the basic recursive structure of natural languages exactly. Similarly, for manipulation actions, every complex activity is built from smaller blocks involving hands and their movements, as well as objects, tools and the monitoring of their state. Thus, interpreting a seen action is like understanding language, and executing an action from knowledge in memory is like producing language. Associated with the grammar, a parsing algorithm is proposed, which can be used bottom-up to interpret videos by dynamically creating a semantic tree structure, and top-down to create the motor commands for a robot to execute manipulation actions. Experiments on both tasks, i.e. a robot observing people performing manipulation actions, and a robot executing manipulation actions on a simulation platform, validate the proposed formalism.
Ray Saliency: Bottom-Up Saliency for a Rotating and Zooming Camera
Speaker: Garrett Warnell -- Date: October 17, 2013
We extend the classical notion of visual saliency to multi-image data collected using a stationary pan-tilt-zoom (PTZ) camera. We show why existing saliency methods are not effective for this type of data, and propose ray saliency: a modified notion of visual saliency that utilizes knowledge of the imaging process in order to appropriately incorporate the context provided by multiple images. We present a practical, mosaic-free method by which to quantify and calculate ray saliency, and demonstrate its usefulness on PTZ imagery.
A Sentence is Worth a Thousand Pixels
Speaker: Abhishek Sharma -- Date: October 24, 2013
We are interested in holistic scene understanding where images are accompanied by text in the form of complex sentential descriptions. We propose a holistic conditional random field model for semantic parsing which reasons jointly about which objects are present in the scene, their spatial extent as well as semantic segmentation, and employs text as well as image information as input. We automatically parse the sentences and extract objects and their relationships, and incorporate them into the model, both via potentials and by re-ranking candidate detections. We demonstrate the effectiveness of our approach on the challenging UIUC sentences dataset and show segmentation improvements of 12.5% over the visual-only model and detection improvements of 5% AP over deformable part-based models.
Cross-View Action Recognition Via a Transferable Dictionary Pair
Speaker: Jingjing Zheng -- Date: November 7, 2013
Discriminative appearance features are effective for recognizing actions in a fixed view, but generalize poorly to changes in viewpoint. We present a method for view-invariant action recognition based on sparse representations using a transferable dictionary pair. A transferable dictionary pair consists of two dictionaries that correspond to the source and target views respectively. The two dictionaries are learned simultaneously from pairs of videos taken at different views and aim to encourage each video in the pair to have the same sparse representation. Thus, the transferable dictionary pair links features between the two views that are useful for action recognition. Both unsupervised and supervised algorithms are presented for learning transferable dictionary pairs. Using the sparse representation as features, a classifier built in the source view can be directly transferred to the target view. We extend our approach to transferring an action model learned from multiple source views to one target view. We demonstrate the effectiveness of our approach on the multi-view IXMAS data set. Our results compare favorably to the state of the art.
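As a rough sketch of the idea in this abstract (not necessarily the speaker's exact formulation), learning two dictionaries so that paired videos share one sparse representation could be written as the joint objective below. All symbols are assumptions introduced here for illustration: Y_s and Y_t hold features of the same actions seen from the source and target views, column by column; D_s and D_t are the two dictionaries; X is the shared sparse code matrix; and T is a sparsity budget.

```latex
\min_{D_s,\, D_t,\, X} \; \|Y_s - D_s X\|_F^2 + \|Y_t - D_t X\|_F^2
\quad \text{s.t.} \quad \|x_i\|_0 \le T \;\; \text{for every column } x_i \text{ of } X
```

Because both reconstruction terms use the same X, a classifier trained on codes from the source view could, under these assumptions, be applied to codes computed with D_t in the target view.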
Past Semesters
Funded By
- Computer Vision Faculty
- Northrop Grumman
Current Seminar Series Coordinators
Emails are at umiacs.umd.edu.
Angjoo Kanazawa, kanazawa@ | (student of Professor David Jacobs) |
Sameh Khamis, sameh@ | (student of Professor Larry Davis) |
Austin Myers, amyers@ | (student of Professor Yiannis Aloimonos) |
Raviteja Vemulapalli, raviteja@ | (student of Professor Rama Chellappa) |
Gone but not forgotten.
Ejaz Ahmed | |
Anne Jorstad | now at EPFL |
Jie Ni | off this semester |
Sima Taheri | |
Ching Lik Teo |