EECS Special Seminar: Vincent Sitzmann "Self-supervised Scene Representation Learning"

Speaker: Vincent Sitzmann, CSAIL

Date: Wednesday, March 03, 2021

Time: 11:00 AM to 12:00 PM Note: all times are in the Eastern Time Zone

Public: No

Location: Virtual TBD

Event Type: Seminar

Room Description:

Host: Aleksander Madry

Contact: Fern D Keniston

Relevant URL:

Speaker URL: None

Speaker Photo:

Reminders to:

Reminder Subject: TALK: EECS Special Seminar: Vincent Sitzmann "Self-supervised Scene Representation Learning"

Given only a single picture, people are capable of inferring a mental representation that encodes rich information about the underlying 3D scene. We acquire this skill not through massive labeled datasets of 3D scenes, but through self-supervised observation and interaction. Building machines that can infer similarly rich neural scene representations is critical if they are to one day parallel people’s ability to understand, navigate, and interact with their surroundings. This poses a unique set of challenges that sets neural scene representations apart from conventional representations of 3D scenes: Rendering and processing operations need to be differentiable, and the type of information they encode is unknown a priori, requiring them to be extraordinarily flexible. At the same time, training them without ground-truth 3D supervision is a highly underdetermined problem, highlighting the need for structure and inductive biases without which models converge to spurious explanations.
Focusing on 3D structure, a fundamental feature of natural scenes, I will demonstrate how we can equip neural networks with inductive biases that enable them to learn 3D geometry, appearance, and even semantic information, self-supervised only from posed images. I will show how this approach unlocks the learning of priors, enabling 3D reconstruction from only a single posed 2D image, and how we may extend these representations to other modalities such as sound.
I will then discuss how these efforts advance us towards a unified scene representation learning backbone for applications across computer vision, computer graphics, robotics, and other areas of computer science, and what key challenges remain.

Vincent Sitzmann is a postdoc with Joshua Tenenbaum, William Freeman, and Fredo Durand at MIT CSAIL. Previously, he finished his PhD at Stanford University. His primary research interests lie in the self-supervised learning of neural representations of 3D scenes and their applications in computer graphics, computer vision, and robotics.

Research Areas:

Impact Areas:

This event is not part of a series.

Created by Fern D Keniston on Wednesday, February 24, 2021 at 1:38 PM.