Interactive Object Recognition and Search over Mobile Video
Tiffany Yu-Han Chen, CSAIL Ph.D. candidate
Date: Wednesday, May 10, 2017
Time: 1:00 PM to 2:00 PM (Note: all times are in the Eastern Time Zone)
Location: Star (32-G463)
Host: Hari Balakrishnan, CSAIL/EECS
Contact: Sheila M. Marian, 617-253-1996, firstname.lastname@example.org
Speaker URL: None
TALK: Interactive Object Recognition and Search over Mobile Video
Cameras of good quality are now available on handheld and wearable mobile devices. The high resolution of these cameras, coupled with pervasive wireless connectivity and advanced computer vision algorithms, makes it feasible to develop new ways to interact with mobile video. Two important examples are interactive object recognition and search-by-content. Interactive recognition continuously locates objects in a video stream, recognizes them, and labels them with information associated with the objects in the user's view. Example use cases include an augmented shopping application that recognizes products or brands to inform customers about the items they buy, and a driver assistance application that recognizes vehicles and signs to improve driver safety. Interactive search-by-content allows users to discover videos using textual queries (e.g., "child dog play"). Instead of requiring broadcasters to manually annotate videos with metadata tags, our search system uses vision algorithms to automatically produce textual tags.
These two services must be highly interactive because users expect timely feedback for their interactions and for changes in content. However, achieving high interactivity without sacrificing accuracy or efficiency is challenging. The required computer vision algorithms use computationally intensive deep neural networks and must keep up with a frame rate of 30 frames per second. The cost of recognizing an object scales with the size of the corpus of objects, which makes recognition infeasible on a mobile device. Offloading recognition operations to servers introduces network and processing delay; when this delay exceeds one frame time, the results are stale by the time they arrive and recognition accuracy degrades.
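To make the frame-time constraint concrete, here is a minimal sketch of the arithmetic. The function names and the example delays are illustrative assumptions, not values or code from the dissertation; only the 30 frames-per-second rate comes from the abstract.

```python
def frame_time_ms(fps=30.0):
    """Duration of one frame in milliseconds at the given frame rate."""
    return 1000.0 / fps


def frames_stale(delay_ms, fps=30.0):
    """Whole frames that elapse while an offloaded result is in flight.

    delay_ms is a hypothetical end-to-end delay (network plus server
    processing); the abstract does not state a specific value.
    """
    return int(delay_ms * fps // 1000)
```

At 30 frames per second one frame lasts about 33 ms, so under these assumptions a 100 ms round trip returns a result that is three frames old, while a 20 ms round trip fits within a single frame.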
This dissertation presents two systems that study the trade-off between accuracy and efficiency for interactive recognition and search, and demonstrates how to achieve both goals.
Glimpse enables interactive object recognition for camera-equipped mobile devices. Because the algorithms for object recognition entail significant computation, Glimpse runs them on servers across the network. To hide latency, Glimpse uses an active cache of video frames on the device and performs tracking on a subset of frames to correct the stale results obtained from the processing pipeline. Our results show that Glimpse achieves a precision of 90% for face recognition, a 2.8× improvement over a scheme that performs server-side recognition without an active cache. For fast-moving objects such as road signs, Glimpse achieves precision up to 80%; without the active cache, interactive recognition is non-functional (1.9% precision).
Panorama enables search on live video streams. It introduces three new mechanisms: (1) an intelligent frame selector that reduces the number of frames on which expensive recognition must be run, (2) a distributed scheduler that uses feedback from the vision algorithms to dynamically determine the order in which streams must be processed, and (3) a search-ranking method that uses visual features to improve search relevance. Our experimental results show that incorporating visual features doubles search relevance from 45% to 90%. To achieve 90% search accuracy, with current pricing from Amazon Web Services, Panorama incurs 24× lower cost than a scheme that recognizes every frame.
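The active-cache idea described for Glimpse can be illustrated with a small sketch. This is not Glimpse's implementation: the class name, the `track` callback, the `stride` parameter, and the toy frame representation are all hypothetical, chosen only to show how frames buffered on the device let a tracker bring a stale server result up to date by replaying a subset of the cached frames.

```python
from collections import deque


class ActiveCache:
    """Hypothetical sketch of an on-device frame cache.

    Frames captured while a recognition request is in flight are buffered.
    When the (stale) server result arrives, a tracker is replayed over a
    subset of the buffered frames to move the result to the newest frame.
    """

    def __init__(self, track, stride=2):
        self.track = track      # assumed tracker: (box, prev_frame, next_frame) -> box
        self.stride = stride    # replay only every `stride`-th cached frame
        self.frames = deque()   # (frame_id, frame) pairs, oldest first

    def add(self, frame_id, frame):
        self.frames.append((frame_id, frame))

    def catch_up(self, request_id, stale_box):
        # Discard frames captured before the frame the server processed.
        while self.frames and self.frames[0][0] < request_id:
            self.frames.popleft()
        cached = list(self.frames)
        if not cached:
            return stale_box
        # Pick a subset of frames, always ending on the newest one.
        subset = cached[::self.stride]
        if subset[-1][0] != cached[-1][0]:
            subset.append(cached[-1])
        box = stale_box
        for (_, prev), (_, nxt) in zip(subset, subset[1:]):
            box = self.track(box, prev, nxt)
        return box


def shift_tracker(box, prev_frame, next_frame):
    """Toy tracker: each frame is just the object's x position."""
    x, y, w, h = box
    return (x + (next_frame - prev_frame), y, w, h)


cache = ActiveCache(shift_tracker, stride=2)
for i in range(7):
    cache.add(i, 10 * i)          # object moves 10 px per frame
# The server answers for frame 2 while the device is already at frame 6.
current_box = cache.catch_up(2, (20, 5, 8, 8))
```

In this toy setup the tracker runs on frames 2, 4, and 6 only, which mirrors the abstract's point that tracking on a subset of cached frames is enough to correct a stale result without processing every frame.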
Tiffany received a B.S. in Computer Science from National Taiwan University in 2010. She has worked as an intern at Microsoft Research, Cambridge Mobile Telematics, and Qualcomm. She is the recipient of the SIGMOBILE Research Highlight, the Jacobs Presidential Fellowship, the Google Anita Borg Memorial Scholarship, and the SenSys Best Presentation Award.
Committee members: Hari Balakrishnan (CSAIL), Dina Katabi (CSAIL), Victor Bahl (Microsoft Research).
Created by Sheila M. Marian on Thursday, May 04, 2017 at 5:55 PM.