Visual Computing Seminar | Tim Brooks - Sora: Video Generation Models as World Simulators

Speaker: Tim Brooks, OpenAI

Date: Tuesday, April 23, 2024

Time: 12:00 PM to 1:00 PM (note: all times are in the Eastern Time Zone)

Public: Yes

Location: https://mit.zoom.us/j/95167636032?pwd=U0dyaEx1a3A3QkZrbmIvMkcvUFkyUT09 (password: mitvc)

Event Type: Seminar

Host: Yang Liu, MIT EECS & CSAIL

Contact: Yang Liu, yliu@csail.mit.edu

Speaker URL: https://www.timothybrooks.com/

Reminders to: seminars@csail.mit.edu, vc-all@csail.mit.edu, graphics@csail.mit.edu, vgn@csail.mit.edu, vision-meeting@csail.mit.edu

Reminder Subject: TALK: Visual Computing Seminar | Tim Brooks - Sora: Video Generation Models as World Simulators

Virtual session of the MIT Visual Computing Seminar, Spring 2024, featuring remote invited speaker Tim Brooks from OpenAI.

The format is ~25 min of talk followed by Q&A. Because we expect a large audience, we will use Slido for live Q&A and answer the top questions from the upvote queue. Live Q&A link: https://tinyurl.com/TimBrooksMIT

Please DO NOT record this talk by any means. Thanks for your understanding.

Title
Sora: Video Generation Models as World Simulators

Abstract
We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a minute of high fidelity video. Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world.
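
For readers unfamiliar with the term, the idea of "spacetime patches" can be illustrated with a minimal sketch. This is not Sora's implementation: the helper names (`spacetime_patches`, `make_latent`), the single-channel latent grid, and the patch size are all assumptions chosen for illustration. It only shows how inputs of different durations and aspect ratios become token sequences of different lengths but identical token width, which is what lets one transformer consume both videos and images.

```python
from typing import List

# A latent grid indexed as [frame][row][column]; one channel for simplicity (assumption).
Latent = List[List[List[float]]]


def spacetime_patches(latent: Latent, pt: int, ph: int, pw: int) -> List[List[float]]:
    """Cut a latent grid into non-overlapping pt x ph x pw blocks, one flat token per block."""
    frames, height, width = len(latent), len(latent[0]), len(latent[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    tokens = []
    for t0 in range(0, frames, pt):
        for y0 in range(0, height, ph):
            for x0 in range(0, width, pw):
                # Flatten the block in (time, row, column) order.
                tokens.append([
                    latent[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ])
    return tokens


def make_latent(frames: int, height: int, width: int) -> Latent:
    """Hypothetical stand-in for an encoder output: each cell holds its own flat index."""
    return [
        [[float(t * height * width + y * width + x) for x in range(width)]
         for y in range(height)]
        for t in range(frames)
    ]


# A 4-frame, 4x6 "video" and a 1-frame, 4x4 "image" share the same patch size.
video_tokens = spacetime_patches(make_latent(4, 4, 6), pt=1, ph=2, pw=2)
image_tokens = spacetime_patches(make_latent(1, 4, 4), pt=1, ph=2, pw=2)
print(len(video_tokens), len(image_tokens))
```

In this sketch an image is simply a one-frame video, so both inputs yield tokens of four values each and differ only in how many tokens they produce.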

Bio
Tim Brooks is a research scientist at OpenAI, where he co-leads Sora, the company's video generation model. His research investigates large-scale generative models that simulate the physical world. Tim received his PhD from Berkeley AI Research, advised by Alyosha Efros, where he invented InstructPix2Pix. He previously worked on AI that powers the Pixel phone's camera at Google and on video generation models at NVIDIA.

Research Areas:
AI & Machine Learning, Graphics & Vision, Human-Computer Interaction, Robotics

Impact Areas:
Big Data, Education, Entertainment

This event is not part of a series.

Created by Yang Liu on Tuesday, April 09, 2024 at 1:56 PM.