| March 5
| Yezhou Yang
| Grasp Type Revisited: A Modern Perspective on A Classical Feature for Vision and Robotics
|-
| March 12

Abstract: We present an approach to jointly learn a set of view-specific dictionaries and a common dictionary for cross-view action recognition. The set of view-specific dictionaries is learned for specific views while the common dictionary is shared across different views. Our approach represents videos in each view using both the corresponding view-specific dictionary and the common dictionary. More importantly, it encourages the set of videos taken from different views of the same action to have similar sparse representations. In this way, we can align view-specific features in the sparse feature spaces spanned by the view-specific dictionary set and transfer the view-shared features in the sparse feature space spanned by the common dictionary. Meanwhile, the incoherence between the common dictionary and the view-specific dictionary set enables us to exploit the discrimination information encoded in view-specific features and view-shared features separately. In addition, the learned common dictionary not only has the capability to represent actions from unseen views, but also makes our approach effective in a semi-supervised setting where no correspondence videos exist and only a few labels exist in the target view. Extensive experiments using the multi-view IXMAS dataset demonstrate that our approach outperforms many recent approaches for cross-view action recognition.
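As a rough illustration of the encoding step described above (not the authors' code: the descriptor dimension, dictionary sizes, and the random stand-in dictionaries are placeholders), the sketch below sparsely codes one video descriptor over the concatenation of a common dictionary and one view-specific dictionary, so the leading block of coefficients is the view-shared part that can be aligned across views.

<pre>
# Minimal sketch (assumed setup, not the paper's code): sparse coding over a
# common dictionary stacked with a view-specific dictionary.
import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(0)
n_features = 128                 # dimensionality of the video descriptor (placeholder)
n_common, n_specific = 32, 32    # atoms in the common / view-specific dictionaries (placeholder)

# Random stand-ins; in the approach these are learned jointly so that videos of
# the same action seen from different views get similar sparse codes.
D_common = rng.standard_normal((n_common, n_features))
D_view = rng.standard_normal((n_specific, n_features))
D_common /= np.linalg.norm(D_common, axis=1, keepdims=True)
D_view /= np.linalg.norm(D_view, axis=1, keepdims=True)

# A video from one view is represented over the concatenated dictionary.
coder = SparseCoder(dictionary=np.vstack([D_common, D_view]),
                    transform_algorithm="lasso_lars", transform_alpha=0.1)
x = rng.standard_normal((1, n_features))   # one video descriptor
code = coder.transform(x)[0]
shared_part, view_part = code[:n_common], code[n_common:]
print("non-zeros (shared / view-specific):",
      np.count_nonzero(shared_part), "/", np.count_nonzero(view_part))
</pre>

In the actual approach the two dictionaries are learned jointly with an incoherence constraint, which is what keeps the view-shared and view-specific parts of the code separable.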
===Grasp Type Revisited: A Modern Perspective on A Classical Feature for Vision and Robotics===

Speaker: [http://www.umiacs.umd.edu/~yzyang/ Yezhou Yang] -- Date: March 5, 2015

Abstract: Our ability to interpret other people's actions hinges crucially on predictions about their intentionality. The grasp type provides crucial information about human action. However, recognizing the grasp type from unconstrained scenes is challenging because of the large variations in appearance, occlusions, and geometric distortions. In this paper, we first present a convolutional neural network to classify functional hand grasp types. Experiments on a public static-scene hand data set validate the good performance of the presented method. We then present two applications utilizing grasp type classification: (a) inference of human action intention and (b) fine-level manipulation action segmentation.

Experiments on both tasks demonstrate the usefulness of the grasp type as a cognitive feature for computer vision. Furthermore, we will present a system that learns manipulation action plans by processing YouTube cooking instructional videos with the grasp type feature. Its goal is to robustly generate the sequence of atomic actions underlying longer actions seen in video, in order to acquire knowledge for robots and to further guide the robot in executing the task.
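As a rough illustration of the first component above (a CNN that classifies functional grasp types), here is a minimal sketch assuming PyTorch; the layer sizes, the 64x64 hand-crop resolution, and the six-way grasp taxonomy are placeholders rather than the network used in the talk.

<pre>
# Minimal sketch (assumed architecture, not the talk's model): a small CNN that
# maps a cropped hand region to scores over functional grasp-type classes.
import torch
import torch.nn as nn

NUM_GRASP_TYPES = 6   # placeholder; the grasp taxonomy in the paper may differ

class GraspTypeCNN(nn.Module):
    def __init__(self, num_classes: int = NUM_GRASP_TYPES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global pooling -> fixed-size descriptor
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: batch of hand crops, shape (N, 3, H, W)
        feats = self.features(x).flatten(1)
        return self.classifier(feats)         # unnormalized class scores

# Usage: classify a dummy 64x64 hand crop.
model = GraspTypeCNN()
scores = model(torch.randn(1, 3, 64, 64))
print("predicted grasp type id:", scores.argmax(dim=1).item())
</pre>

The predicted grasp type then serves as the cognitive feature feeding the two applications described above, intention inference and fine-level action segmentation.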
Related Papers:

[http://www.umiacs.umd.edu/~yzyang/paper/CVPR2015Grasp_draft.pdf Grasp Type Revisited: A Modern Perspective on A Classical Feature for Vision (To appear CVPR'15)]

[http://www.umiacs.umd.edu/~yzyang/paper/YouCookMani_CameraReady.pdf Robot Learning Manipulation Action Plans by “Watching” Unconstrained Videos from the World Wide Web (AAAI'15)]

[http://www.umiacs.umd.edu/~yzyang/paper/VSS_action_intention.pdf Does the grasp type reveal action intention? (To appear VSS'15)]
 
==Past Semesters==

* [[cvss fall2014|Fall 2014]]