cvss

Computer Vision Student Seminars

The Computer Vision Student Seminars at the University of Maryland College Park are a student-run series of talks given by current graduate students for current graduate students.

To receive regular information about the Computer Vision Student Seminars, subscribe to our mailing list or our talks list.

Description

The purpose of these talks is to:

Encourage interaction between computer vision students;
Provide an opportunity for computer vision students to be aware of and possibly get involved in the research their peers are conducting;
Provide an opportunity for computer vision students to receive feedback on their current research;
Provide speaking opportunities for computer vision students.

The guidelines for the format are:

An hour-long weekly meeting, consisting of one 20-40 minute talk followed by discussion and food.
The talks are meant to be casual and discussion is encouraged.
Topics may include current research, past research, general topic presentations, paper summaries and critiques, or anything else beneficial to the computer vision graduate student community.

Schedule Spring 2014

All talks take place on Thursdays at 3:30pm in AVW 3450.

Date	Speaker	Title
January 30	Arpit Jain	Scene and Video Understanding
February 6	Raviteja Vemulapalli	Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group
February 13	Google DC PhD Summit, no meeting
February 20	Varun Nagaraja	TBA
February 27	Mohammad Rastegari	TBA
March 6	ECCV deadline, no meeting
March 13	Xavier Gibert Serra	TBA
March 20	Spring break, no meeting
March 27	Swaminathan Sankaranarayanan	TBA
April 3	Austin Myers	TBA
April 10	Jing Jing	TBA
April 17	Kota Hara	TBA
April 24	Ejaz Ahmed	TBA
May 1	Garrett Warnell	TBA
May 8	Sameh Khamis	TBA
May 15	Sumit Sekhar	TBA

Talk Abstracts Spring 2014

Scene and Video Understanding

Speaker: Arpit Jain -- Date: January 30, 2014

There have been significant improvements in the accuracy of scene understanding due to a shift from recognizing objects "in isolation" to context-based recognition systems. Such systems improve recognition rates by augmenting appearance-based models of individual objects with contextual information based on pairwise relationships between objects. These pairwise relations incorporate common world knowledge such as co-occurrences and spatial arrangements of objects, scene layout, etc. However, these relations, although consistent in the 3D world, change with the viewpoint of the scene. In this thesis, we will look into the problem of incorporating contextual information for scene understanding from two different perspectives: (a) "what" contextual relations are useful and "how" they should be incorporated into a Markov network during inference, and (b) how to jointly solve the segmentation and recognition problem using a multiple segmentation framework based on contextual information in conjunction with appearance matching. In the later part of the thesis, we will investigate different representations for video understanding and propose a discriminative patch-based representation for videos.

Our work departs from the traditional view of incorporating context into scene understanding, in which a fixed model for context is learned. We argue that context is scene dependent and propose a data-driven approach to predict the importance of edges and construct a Markov network for image analysis based on statistical models of global and local image features. Since not all contextual information is equally important, we also address the coupled problem of predicting the feature weights associated with each edge of a Markov network for evaluation of context. We then address the problem of fixed segmentation while modelling context by using a multiple segmentation framework and formulating the problem as "a jigsaw puzzle": we select segments from a pool of segments (jigsaws) and assign each selected segment a class label. Previous multiple segmentation approaches used local appearance matching to select segments in a greedy manner. In contrast, our approach formulates a cost function based on contextual information in conjunction with appearance matching. A relaxed version of this cost function is minimized using an efficient quadratic programming solver, and an approximate solution is obtained by discretizing the relaxed solution.

Lastly, we propose a new representation for videos based on mid-level discriminative spatio-temporal patches. These spatio-temporal patches might correspond to a primitive human action, a semantic object, or perhaps a random but informative spatio-temporal patch in the video. What defines these spatio-temporal patches is their discriminative and representative properties. We automatically mine these patches from hundreds of training videos and experimentally demonstrate that these patches establish correspondence across videos and align the videos for label transfer techniques. Furthermore, these patches can be used as a discriminative vocabulary for action classification.

Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group

Speaker: Raviteja Vemulapalli -- Date: February 6, 2014

Recently introduced cost-effective depth sensors coupled with the real-time skeleton estimation algorithm of Shotton et al. [16] have resulted in a renewed interest in skeleton-based human action recognition. Most of the earlier skeleton-based approaches used either the joint locations or the joint angles to represent a human skeleton. In this paper, we propose a new skeletal representation that explicitly models the 3D geometric relationships between various body parts using rotations and translations in 3D space. Since 3D rigid body motions are members of the special Euclidean group SE(3), the proposed skeletal representation lies in the Lie group SE(3) × ... × SE(3), which is a curved manifold. With the proposed representation, human actions can be modeled as curves in this Lie group. Since classification of curves in this Lie group is not an easy task, we map the action curves from the Lie group to its Lie algebra, which is a vector space. We then perform classification using a combination of dynamic time warping, Fourier temporal pyramid representation and linear SVM. Experimental results on three action datasets show that the proposed representation performs better than various other commonly-used skeletal representations. The proposed approach also outperforms various state-of-the-art skeleton-based human action recognition approaches.
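
To make the "Lie group to Lie algebra" step concrete, here is a minimal sketch of the standard logarithm map from SE(3) to its Lie algebra se(3), written in plain Python. It is an illustration under stated assumptions, not the speaker's implementation: the function name `se3_log`, the helper names, and the ordering of the 6-vector (rotation part first, then translation part) are all hypothetical choices made for this example. The sketch takes one relative rigid motion between two body parts, given as a rotation matrix `R` and a translation `t`, and returns a 6-dimensional vector. Stacking such vectors over all pairs of body parts would give a vector-space description of one skeleton frame.

```python
import math


def _hat(w):
    """Skew-symmetric matrix [w]x such that [w]x @ u equals the cross product w x u."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def _matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def _matvec(a, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def se3_log(R, t):
    """Logarithm map of a rigid motion (R, t) in SE(3).

    Returns [wx, wy, wz, vx, vy, vz] (hypothetical ordering): the
    axis-angle rotation vector followed by the translational part.
    Assumes the rotation angle is not close to pi, where this simple
    formula becomes numerically unstable.
    """
    # Rotation angle from the trace, clamped to guard against round-off.
    cos_theta = max(-1.0, min(1.0, (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0))
    theta = math.acos(cos_theta)

    # Unnormalized rotation axis from the antisymmetric part of R.
    axis_raw = [R[2][1] - R[1][2], R[0][2] - R[2][0], R[1][0] - R[0][1]]

    if theta < 1e-8:
        # Small-angle limits of the two coefficients used below.
        scale = 0.5
        coeff = 1.0 / 12.0
    else:
        scale = theta / (2.0 * math.sin(theta))
        coeff = (1.0 - theta * math.sin(theta)
                 / (2.0 * (1.0 - math.cos(theta)))) / theta ** 2

    w = [scale * a for a in axis_raw]

    # v = V^{-1} t with V^{-1} = I - 0.5 [w]x + coeff [w]x^2.
    W = _hat(w)
    W2 = _matmul(W, W)
    V_inv = [[(1.0 if i == j else 0.0) - 0.5 * W[i][j] + coeff * W2[i][j]
              for j in range(3)] for i in range(3)]
    v = _matvec(V_inv, t)
    return w + v


# Example: a quarter turn about the z-axis combined with a unit shift along x.
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t = [1.0, 0.0, 0.0]
xi = se3_log(R, t)  # 6-vector in the Lie algebra se(3)
```

Because the output lives in an ordinary vector space, a sequence of such vectors over time can be compared with tools that expect Euclidean inputs, which is the role the abstract assigns to dynamic time warping and the linear SVM.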

Past Semesters

Funded By

Computer Vision Faculty
Northrop Grumman

Current Seminar Series Coordinators

Emails are at umiacs.umd.edu.

Angjoo Kanazawa, kanazawa@	(student of Professor David Jacobs)
Sameh Khamis, sameh@	(student of Professor Larry Davis)
Austin Myers, amyers@	(student of Professor Yiannis Aloimonos)
Raviteja Vemulapalli, raviteja@	(student of Professor Rama Chellappa)

Gone but not forgotten.

Ejaz Ahmed
Anne Jorstad	now at EPFL
Jie Ni	off this semester
Sima Taheri
Ching Lik Teo
