Monocular 3D Pose, Shape and Appearance Modeling
Cristian Sminchisescu, Lund University and Google
Date: Friday, April 26, 2019
Time: 2:00 PM to 3:00 PM
Event Type: Seminar
Room Description: 32-D507
Host: Bill Freeman, MIT CSAIL
Contact: William T. Freeman, 617 253-8828, firstname.lastname@example.org
Speaker URL: None
TALK: Monocular 3D Pose, Shape and Appearance Modeling
Human sensing has greatly benefited from recent advances in deep learning, parametric human modeling, and the availability of large-scale 2D and 3D datasets. However, existing models make strong assumptions about the scene, considering either a single person per image, full views of the person, a simple background, or many cameras. In this talk I will describe approaches that combine deep multi-task neural networks with parametric human and scene models, working towards an automatic monocular visual sensing system that infers the 2D and 3D pose and shape of multiple people from a single image, automatically integrates scene constraints, and extends to video by optimally solving the temporal assignment and pose-smoothness problem while preserving image-alignment fidelity.
Besides human sensing, the framework supports several modeling capabilities, including realistic human appearance transfer and human behavior understanding. In this context I will introduce a new fine-grained action classification and localization task defined on non-staged videos recorded during robot-assisted therapy sessions of children with autism, aimed at estimating the children's valence and arousal, automating therapy, and quantitatively measuring progress.
Time permitting, I may briefly review ongoing work on deep reinforcement learning for visual recognition, as well as our matrix backpropagation methodology, which allows the training of deep structured models with layers implementing global operations such as SVD, eigendecomposition, projectors, or relaxations, and supports the end-to-end refinement of deep spectral clustering, higher-order pooling, or graph-matching models.
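To give a flavor of what backpropagating through a spectral layer involves, here is a minimal NumPy sketch, not the speaker's implementation: the layer is an eigendecomposition, the loss is (as an illustrative assumption) the largest eigenvalue of a symmetric input, and matrix calculus gives the gradient dL/dA = u uᵀ for the corresponding unit eigenvector u. The gradient is checked against a finite difference along a random symmetric direction.

```python
import numpy as np

def top_eig_with_grad(A):
    """Forward: L(A) = lambda_max(A); backward: dL/dA = u u^T,
    with u the unit eigenvector of the largest eigenvalue.
    (Illustrative loss; a real structured layer would chain this
    gradient into the rest of the network.)"""
    w, V = np.linalg.eigh(A)          # eigenvalues in ascending order
    u = V[:, -1]                      # eigenvector of the largest eigenvalue
    return w[-1], np.outer(u, u)

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                     # random symmetric input

loss, grad = top_eig_with_grad(A)

# Check: d/dt lambda_max(A + t*E) at t=0 should equal <grad, E>
# for a symmetric direction E (central finite difference).
C = rng.standard_normal((5, 5))
E = (C + C.T) / 2
eps = 1e-6
numeric = (np.linalg.eigh(A + eps * E)[0][-1]
           - np.linalg.eigh(A - eps * E)[0][-1]) / (2 * eps)
analytic = np.sum(grad * E)
print(abs(analytic - numeric))        # should be very small
```

The same pattern, an analytic matrix derivative for the global operation plugged into the chain rule, is what makes SVD or eigendecomposition layers trainable end to end.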
This is joint work with C. Ionescu, E. Marinoiu, D. Nilsson, M. Zanfir, A. Popa, A. Pirinen, A. Zanfir.
Cristian Sminchisescu is a Professor at Lund University and a Research Scientist leading a team at Google Research. He obtained a doctorate in computer science and applied mathematics from INRIA and did postdoctoral research in the AI Lab at the University of Toronto. Prior to Lund, he was a faculty member in Chicago and Bonn, and has held affiliations with the University of Toronto and the Romanian Academy. He has been an associate editor of IEEE PAMI (since 2010) and IJCV (since 2018), and a program chair for ECCV 2018 in Munich. His work has been funded by the NSF, the Swedish and German Science Foundations, the European Commission under a Marie Curie Excellence Grant, and the ERC under a Consolidator Grant. His work (with A. Zanfir) on deep learning for graph matching received a best paper award honorable mention at IEEE CVPR 2018.
AI & Machine Learning, Graphics & Vision, Human-Computer Interaction
Created by William T. Freeman on Thursday, April 18, 2019 at 9:59 AM.