Thesis Defense: Video Content Analysis: Learning Spatio-Temporal and Multimodal Structures

Speaker: Yale Song, MIT CSAIL

Date: Wednesday, April 30, 2014

Time: 10:30 AM to 11:30 AM (Eastern Time)

Refreshments: 10:15 AM

Public: Yes

Location: 32-D463 (Star)

Host: Professor Randall Davis, MIT CSAIL

Contact: Niranjala Manokharan, nira@csail.mit.edu

Recognizing human actions and affect from video requires algorithms that can detect and track people and recognize their behaviors from audio, visual, and textual information. However, existing algorithms struggle, in part because of the spatio-temporal variability of human behaviors and the complex dependency structure of multimodal data.

In this talk, I introduce two algorithms that address these challenges. The first constructs a hierarchical representation of video by iteratively summarizing its contents in a fine-to-coarse manner, and uses that representation to learn the spatio-temporal structure of human actions. The second obtains a sparse representation of a multimodal signal by hierarchically decomposing the signal into parts that are shared across modalities and parts that are private to each modality. I show that these two algorithms achieve state-of-the-art performance on human action recognition and affect analysis tasks.

Committee: Randall Davis, William T. Freeman, John W. Fisher III, Louis-Philippe Morency

This event is not part of a series.

Created by Yale Song on Friday, April 25, 2014 at 4:11 PM.