Line 43: |
Line 43: |
| |- | | |- |
| | October 10 | | | October 10 |
− | | Abhishek Sharma | + | | Yezhou Yang |
− | | A Sentence is Worth a Thousand Pixels | + | | A Context-free Manipulation Action Grammar and Manipulation Action Consequences Detection |
| |- | | |- |
| | October 17 | | | October 17 |
Line 51: |
Line 51: |
| |- | | |- |
| | October 24 | | | October 24 |
− | | ''(CVPR deadline, no meeting)'' | + | | Abhishek Sharma |
− | | | + | | A Sentence is Worth a Thousand Pixels |
| |- | | |- |
| | October 31 | | | October 31 |
Line 85: |
Line 85: |
| | | |
| In this project we introduce a new method for learning image prior that can be used for many applications in image reconstruction. We learn a generative model on natural image patches. Our generative model is similar to one in Gausian Mixture Model (GMM). The key idea of our approach is to force each component of our generative model to share the same set of basis vectors. This leads to a much faster inference at test time. We used image denoising as our test bed for this image prior learning. Our experimental results shows that we reached about 30x speed up over state-of-the-art method while getting slightly improvement in denoising accuracy. | | In this project we introduce a new method for learning image prior that can be used for many applications in image reconstruction. We learn a generative model on natural image patches. Our generative model is similar to one in Gausian Mixture Model (GMM). The key idea of our approach is to force each component of our generative model to share the same set of basis vectors. This leads to a much faster inference at test time. We used image denoising as our test bed for this image prior learning. Our experimental results shows that we reached about 30x speed up over state-of-the-art method while getting slightly improvement in denoising accuracy. |
| + | |
| + | ===A Context-free Manipulation Action Grammar and Manipulation Action Consequences Detection=== |
| + | Speaker: [http://www.umiacs.umd.edu/~yzyang/ Yezhou Yang] -- Date October 10, 2013 |
| + | |
| + | Humanoid robots will need to learn the actions that humans perform. They will need to recognize these actions when they see them and they will need to perform these actions themselves. In this presentation I will introduce a manipulation grammar to perform this learning task. Context-free grammars in linguistics provide a simple and precise mechanism for describing the methods by which phrases in some natural language are built from smaller blocks. Also, the basic recursive structure of natural languages is described exactly. Similarly, for manipulation actions, every complex activity is built from smaller blocks involving hands and their movements, as well as objects, tools and the monitoring of their state. Thus, interpreting a seen action is like understanding language, and executing an action from knowledge in memory is like producing language. Associated with the grammar, a parsing algorithm is proposed, which can be used bottom-up to interpret videos by dynamically creating a semantic tree structure, and top-down to create the motor commands for a robot to execute manipulation actions. Experiments on both tasks, i.e. a robot observing people performing manipulation actions, and a robot executing manipulation actions on a simulation platform, validate the proposed formalism. |
| | | |
| ===A Sentence is Worth a Thousand Pixels=== | | ===A Sentence is Worth a Thousand Pixels=== |
− | Speaker: [http://www.umiacs.umd.edu/~bhokaal/ Abhishek Sharma] -- Date: October 10, 2013 | + | Speaker: [http://www.umiacs.umd.edu/~bhokaal/ Abhishek Sharma] -- Date: October 24, 2013 |
| | | |
| We are interested in holistic scene understanding where images are accompanied with text in the form of complex sentential descriptions. We propose a holistic conditional random field model for semantic parsing which reasons jointly about which objects are present in the scene, their spatial extent as well as semantic segmentation, and employs text as well as image information as input. We automatically parse the sentences and extract objects and their relationships, and incorporate them into the model, both via potentials as well as by re-ranking candidate detections. We demonstrate the effectiveness of our approach in the challenging UIUC sentences dataset and show segmentation improvements of 12.5% over the visual only model and detection improvements of 5% AP over deformable part-based models. | | We are interested in holistic scene understanding where images are accompanied with text in the form of complex sentential descriptions. We propose a holistic conditional random field model for semantic parsing which reasons jointly about which objects are present in the scene, their spatial extent as well as semantic segmentation, and employs text as well as image information as input. We automatically parse the sentences and extract objects and their relationships, and incorporate them into the model, both via potentials as well as by re-ranking candidate detections. We demonstrate the effectiveness of our approach in the challenging UIUC sentences dataset and show segmentation improvements of 12.5% over the visual only model and detection improvements of 5% AP over deformable part-based models. |