Active learning provides useful tools to reduce annotation costs without compromising classifier performance. However, it traditionally views the supervisor simply as a labeling machine. Recently, a new interactive learning paradigm was introduced that allows the supervisor to additionally convey useful domain knowledge using attributes. The learner first conveys its belief about an actively chosen image, e.g. "I think this is a forest, what do you think?". If the learner is wrong, the supervisor provides an explanation, e.g. "No, this is too open to be a forest". With access to a pre-trained set of relative attribute predictors, the learner fetches all unlabeled images more open than the query image and uses them as negative examples of forests to update its classifier. This rich human-machine communication leads to better classification performance. In this talk, we discuss three improvements over this set-up. First, we incorporate a weighting scheme that, instead of making a hard decision, reasons about the likelihood of an image being a negative example. Second, we do away with pre-trained attributes and instead learn the attribute models on the fly, alleviating the overhead and restrictions of a pre-determined attribute vocabulary. Finally, we propose an active learning framework that accounts for not just the label-based but also the attribute-based feedback when selecting the next query image. We demonstrate significant improvements in classification accuracy on faces and shoes. We also collect and make available the largest relative attributes dataset, containing 29 attributes of faces from 60 categories.
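As a rough illustration of the soft weighting idea, the sketch below scores each unlabeled image with a linear relative-attribute ranker and converts its score margin against the query into a likelihood of being a negative example, rather than applying a hard cutoff. This is a minimal sketch, not the authors' implementation; the names (<code>attribute_feedback_weights</code>, <code>w_attr</code>) and the sigmoid form of the weighting are assumptions.

<syntaxhighlight lang="python">
# Sketch (not the authors' code) of likelihood-weighted attribute feedback.
# Assumes a hypothetical linear relative-attribute ranker w_attr, so that
# w_attr @ x scores the "openness" of image features x.
import numpy as np

def attribute_feedback_weights(x_query, X_unlabeled, w_attr, scale=1.0):
    """Given the feedback "the true class is less open than the query",
    weight each unlabeled image by the likelihood that it is a negative
    example, instead of hard-labeling everything above the query score."""
    s_query = w_attr @ x_query            # attribute score of the query image
    s_unlab = X_unlabeled @ w_attr        # scores of the unlabeled pool
    margins = s_unlab - s_query           # > 0 means "more open" than the query
    return 1.0 / (1.0 + np.exp(-margins / scale))  # soft negative weights

# Illustrative usage with random features.
rng = np.random.default_rng(0)
w_attr = rng.normal(size=128)
x_query = rng.normal(size=128)
X_pool = rng.normal(size=(500, 128))
weights = attribute_feedback_weights(x_query, X_pool, w_attr)
</syntaxhighlight>

The resulting weights could then serve as per-sample weights in an otherwise standard classifier update.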
 
===Large Scale Image and Video Geo-localization Using Street View Imagery===
Speaker: [http://www.cs.ucf.edu/~aroshan/ Amir R. Zamir], March 26, 2013
There has been growing interest in large-scale visual geo-localization methods that utilize structured reference datasets such as Street View. In this talk, we present two frameworks for geo-locating images and videos at city scale with an accuracy comparable to hand-held GPS devices.
We extract SIFT features from the reference dataset and index them in a tree data structure to make a timely search feasible. To geo-locate a query image, we match its SIFT features against the reference tree and remove unreliable correspondences using our geo-spatial pruning, which incorporates GPS information into the pruning process. We then smooth the distribution formed by the locations of the SIFT matches with a two-dimensional Gaussian, so that correspondences from nearby places reinforce one another, and recover the accurate location of the query image.
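The pipeline can be sketched as follows, under simplifying assumptions: a kd-tree stands in for the reference index, Lowe's ratio test stands in for the geo-spatial pruning (which additionally uses GPS information), and matches vote on a location grid that is smoothed with a 2D Gaussian. All names, grid bounds, and parameters are hypothetical, not the paper's values.

<syntaxhighlight lang="python">
# Sketch of tree-based matching + vote-map smoothing for geo-localization.
# ref_desc: (N, 128) SIFT descriptors; ref_gps: (N, 2) (x, y) locations.
import numpy as np
from scipy.spatial import cKDTree
from scipy.ndimage import gaussian_filter

def localize(query_desc, ref_desc, ref_gps,
             grid_shape=(200, 200), bounds=((0., 1000.), (0., 1000.)),
             sigma=2.0):
    tree = cKDTree(ref_desc)                    # index reference descriptors
    dists, idx = tree.query(query_desc, k=2)    # two nearest neighbors each
    # Lowe's ratio test as a simple stand-in for the geo-spatial pruning.
    reliable = dists[:, 0] < 0.8 * dists[:, 1]
    matches = idx[reliable, 0]

    # Accumulate votes at the GPS locations of the surviving matches.
    votes, xe, ye = np.histogram2d(ref_gps[matches, 0], ref_gps[matches, 1],
                                   bins=grid_shape, range=bounds)
    # Smooth with a 2D Gaussian so votes from nearby places reinforce each other.
    smoothed = gaussian_filter(votes, sigma=sigma)
    i, j = np.unravel_index(np.argmax(smoothed), smoothed.shape)
    return (xe[i] + xe[i + 1]) / 2, (ye[j] + ye[j + 1]) / 2
</syntaxhighlight>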
Additionally, we present a method for extracting the geo-spatial trajectory of a moving camera in a city from videos in the wild, such as typical YouTube clips. First, we divide the video into smaller segments and localize each one individually. Then, we fuse the information from the different segments using a Bayesian formulation to obtain a temporally consistent trajectory. Lastly, we perform a post-processing step using a novel non-model-based trajectory reconstruction method built on minimum spanning trees; we argue that such post-processing is essential for addressing the problems the basic Bayesian formulation faces due to its predefined underlying motion model, since the camera motion in wild videos does not necessarily follow any pattern.
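The Bayesian fusion step can be sketched as Viterbi decoding over per-segment candidate locations, with a Gaussian motion prior penalizing large jumps between consecutive segments. This is an illustrative simplification, not the paper's exact formulation, and all names and the motion prior are assumptions.

<syntaxhighlight lang="python">
# Sketch of temporally consistent fusion of per-segment localizations.
# cand_locs[t]: (k_t, 2) candidate (x, y) locations for segment t.
# cand_conf[t]: (k_t,) observation likelihoods for those candidates.
import numpy as np

def fuse_trajectory(cand_locs, cand_conf, motion_sigma=50.0):
    """Return one location per segment, chosen by Viterbi decoding with a
    Gaussian motion prior linking consecutive segments."""
    n = len(cand_locs)
    log_obs = [np.log(c + 1e-12) for c in cand_conf]
    scores, back = [log_obs[0]], []
    for t in range(1, n):
        # Transition log-prob: penalize large jumps between segments t-1 and t.
        d2 = ((cand_locs[t][:, None, :] - cand_locs[t - 1][None, :, :]) ** 2).sum(-1)
        total = scores[-1][None, :] - d2 / (2 * motion_sigma ** 2)
        back.append(total.argmax(axis=1))          # best predecessor per candidate
        scores.append(total.max(axis=1) + log_obs[t])
    # Backtrack the highest-scoring path.
    path = [int(scores[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t - 1][path[-1]]))
    path.reverse()
    return np.array([cand_locs[t][p] for t, p in enumerate(path)])
</syntaxhighlight>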
    
==Past Semesters==