Changes

752 bytes added, 22:03, 14 October 2013
| October 17
| Garrett Warnell
| Ray Saliency: Bottom-Up Saliency for a Rotating and Zooming Camera
 
|-
 
| October 24
Humanoid robots will need to learn the actions that humans perform. They will need to recognize these actions when they see them, and they will need to perform these actions themselves. In this presentation I will introduce a manipulation grammar to perform this learning task. Context-free grammars in linguistics provide a simple and precise mechanism for describing how phrases in a natural language are built from smaller blocks, and they capture exactly the basic recursive structure of natural language. Similarly, for manipulation actions, every complex activity is built from smaller blocks involving hands and their movements, as well as objects, tools, and the monitoring of their state. Thus, interpreting an observed action is like understanding language, and executing an action from knowledge in memory is like producing language. Associated with the grammar, a parsing algorithm is proposed, which can be used bottom-up to interpret videos by dynamically creating a semantic tree structure, and top-down to create the motor commands for a robot to execute manipulation actions. Experiments on both tasks, i.e., a robot observing people performing manipulation actions and a robot executing manipulation actions on a simulation platform, validate the proposed formalism.
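The grammar analogy above can be made concrete with a toy sketch. The rules, lexicon, and parser below are illustrative assumptions, not the speaker's actual grammar: a hypothetical `Action` is built from a hand, a movement, and an object, and complex activities compose actions, parsed bottom-up in shift-reduce style.

```python
# Hypothetical production rules (illustrative only): an Action is a hand,
# a movement, and an object; complex activities compose Actions.
GRAMMAR = {
    ("Hand", "Movement", "Object"): "Action",
    ("Action", "Action"): "Activity",
    ("Activity", "Action"): "Activity",
}

# Toy lexicon mapping observed tokens to grammar categories.
LEXICON = {
    "left-hand": "Hand", "right-hand": "Hand",
    "grasp": "Movement", "move": "Movement",
    "cup": "Object", "knife": "Object",
}

def parse(tokens):
    """Naive bottom-up (shift-reduce style) parse of a token sequence."""
    stack = []
    for tok in tokens:
        stack.append(LEXICON[tok])
        # Greedily reduce whenever the top of the stack matches a rule.
        reduced = True
        while reduced:
            reduced = False
            for rhs, lhs in GRAMMAR.items():
                n = len(rhs)
                if tuple(stack[-n:]) == rhs:
                    del stack[-n:]
                    stack.append(lhs)
                    reduced = True
                    break
    return stack

print(parse(["left-hand", "grasp", "cup"]))    # ['Action']
print(parse(["left-hand", "grasp", "cup",
             "right-hand", "move", "knife"]))  # ['Activity']
```

Run top-down, the same rules could expand an `Activity` goal into a sequence of primitive hand movements, mirroring the interpretation/execution duality described in the abstract.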
===Ray Saliency: Bottom-Up Saliency for a Rotating and Zooming Camera===
Speaker: [http://garrettwarnell.com/ Garrett Warnell] -- Date: October 17, 2013
We extend the classical notion of visual saliency to multi-image data collected using a stationary pan-tilt-zoom (PTZ) camera. We show why existing saliency methods are not effective for this type of data, and propose ray saliency: a modified notion of visual saliency that utilizes knowledge of the imaging process in order to appropriately incorporate the context provided by multiple images. We present a practical, mosaic-free method by which to quantify and calculate ray saliency, and demonstrate its usefulness on PTZ imagery.
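The geometric idea underlying this setting can be sketched as follows (this is an illustrative assumption about the setup, not the paper's algorithm): for a stationary PTZ camera, every pixel at every pan/tilt/zoom setting corresponds to a viewing ray through the fixed camera center, so observations from different settings can be compared in a shared ray coordinate system. The function and parameter names below are hypothetical.

```python
import numpy as np

def pixel_to_ray(u, v, focal_px, cx, cy, pan_rad, tilt_rad):
    """Map a pixel (u, v) to a unit ray direction in the fixed camera frame.

    focal_px encodes zoom; pan/tilt rotate the camera about its center.
    All parameter names here are illustrative assumptions.
    """
    # Direction in the current camera frame (pinhole model).
    d = np.array([u - cx, v - cy, focal_px], dtype=float)
    d /= np.linalg.norm(d)
    # Rotate by tilt (about x), then pan (about y), into the fixed frame.
    cp, sp = np.cos(pan_rad), np.sin(pan_rad)
    ct, st = np.cos(tilt_rad), np.sin(tilt_rad)
    R_pan = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    R_tilt = np.array([[1, 0, 0], [0, ct, -st], [0, st, ct]])
    return R_pan @ R_tilt @ d

# The principal point of an un-rotated view maps to the optical axis:
print(pixel_to_ray(320, 240, 800, 320, 240, 0.0, 0.0))  # -> [0. 0. 1.]
```

Because zoom only rescales `focal_px` and rotation only reorients the rays, statistics accumulated per ray remain comparable across images, which is the kind of shared context the abstract exploits without building a mosaic.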
    
===A Sentence is Worth a Thousand Pixels===