Changes

Main Page (view source)

Revision as of 17:55, 14 August 2011

1,082 bytes added, 17:55, 14 August 2011

no edit summary

Line 77: Line 77:

|-

| August 18

− |

+ | Yezhou Yang

− |

+ | Corpus-Guided Sentence Generation of Natural Images

|-

| August 25

Line 144: Line 144:

We present an approach for dictionary learning of action attributes via information maximization. We unify the class distribution and appearance information into an objective function for learning a sparse dictionary of action attributes. The objective function maximizes the mutual information between what has been learned and what remains to be learned in terms of appearance information and class distribution for each dictionary item. We propose a Gaussian Process (GP) model for sparse representation to optimize the dictionary objective function. The sparse coding property allows a kernel with a compact support in GP to realize a very efficient dictionary learning process. Hence we can describe an action video by a set of compact and discriminative action attributes. More importantly, we can recognize modeled action categories in a sparse feature space, which can be generalized to unseen and unmodeled action categories. Experimental results demonstrate the effectiveness of our approach in action recognition applications.

+ ===Corpus-Guided Sentence Generation of Natural Images===

+ Speaker: [http://www.umiacs.umd.edu/~yzyang/ Yezhou Yang] -- Date: August 18, 2011

+ We propose a sentence generation strategy that describes images by predicting the most likely nouns, verbs, scenes and prepositions that make up the core sentence structure. The inputs are initial noisy estimates of the objects and scenes detected in the image using state-of-the-art trained detectors. Because predicting actions directly from still images is unreliable, we use a language model trained on the English Gigaword corpus to estimate them, together with the probabilities of co-located nouns, scenes and prepositions. We use these estimates as parameters of an HMM that models the sentence generation process, with hidden nodes as sentence components and image detections as the emissions. Experimental results show that our strategy of combining vision and language produces readable and descriptive sentences compared to naive strategies that use vision alone.
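The abstract does not say how the HMM is decoded, so the following is only a minimal sketch of one standard option, Viterbi decoding, applied to a toy model. Everything in it is an assumption made for illustration: the three candidate words, the detector labels (`person_det`, `none`, `horse_det`), and all probability values are invented, and the talk's actual state space and parameters may differ. The transition table stands in for language-model estimates and the emission table for detector confidences.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path and its log-probability.

    All probabilities must be strictly positive, since the computation
    is done in log space.
    """
    # best[t][s] = (log-prob of the best path ending in state s at step t,
    #               predecessor state on that path)
    first = observations[0]
    best = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
        for s in states
    }]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)

    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Hypothetical toy model: hidden states are candidate words.
states = ["person", "ride", "horse"]
start_p = {"person": 0.6, "ride": 0.1, "horse": 0.3}
# Stand-in for language-model (corpus) estimates of word-to-word transitions.
trans_p = {
    "person": {"person": 0.1, "ride": 0.7, "horse": 0.2},
    "ride": {"person": 0.2, "ride": 0.1, "horse": 0.7},
    "horse": {"person": 0.3, "ride": 0.5, "horse": 0.2},
}
# Stand-in for detector confidences; "none" marks a slot with no detection.
emit_p = {
    "person": {"person_det": 0.7, "none": 0.2, "horse_det": 0.1},
    "ride": {"person_det": 0.1, "none": 0.8, "horse_det": 0.1},
    "horse": {"person_det": 0.1, "none": 0.2, "horse_det": 0.7},
}

observations = ["person_det", "none", "horse_det"]
path, log_prob = viterbi(observations, states, start_p, trans_p, emit_p)
print(path)  # ['person', 'ride', 'horse']
```

In this toy setup the middle slot has no detection, so the word chosen there is driven mostly by the transition table, which mirrors the abstract's point that actions are estimated from language statistics rather than from the image.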

Sameh (199 edits)
