Thesis Defense: Towards Object-based SLAM

Speaker: Yihao Zhang, MIT MechE

Date: Monday, May 06, 2024

Time: 10:00 AM to 11:30 AM (Eastern Time)

Public: Yes

Location: 32-G882

Event Type: Thesis Defense

Host: John J. Leonard, MIT MechE

Contact: Yihao Zhang

Simultaneous localization and mapping (SLAM) is a fundamental capability that lets a robot perceive its surrounding environment. The research area has evolved over more than two decades, from the original sparse landmark-based SLAM to dense SLAM, and there is now a demand for semantic understanding of the environment beyond purely geometric understanding. This thesis makes a number of contributions toward realizing object-based SLAM, in which the map consists of a set of objects with their semantic categories recognized and their poses and shapes estimated. Such a map provides vital object-level semantic and geometric perception for applications such as augmented reality (AR), mixed reality (MR), mobile manipulation, and autonomous driving.

To perform object-based SLAM, the sensor measurements must pass through a series of processing steps. First, objects are semantically segmented in the sensor measurements, typically by a neural network. Because robots often have to bootstrap from an initial labeled dataset and then adapt to different environments where labeled data are unavailable, it is important to enable semi-supervised learning so that the robot's performance improves with the unlabeled data it collects itself. Second, after the objects are segmented, the measurements of each object across different viewpoints have to be associated with one another for downstream processing. Third, the robot must be able to extract object pose and shape information from the measurements without access to detailed CAD models of the objects. This thesis studies these three aspects of object-based SLAM: semi-supervised learning of semantic segmentation in a robotics context, data association for object-based SLAM, and category-level object pose and shape estimation.

For category-level object pose and shape estimation, we developed ShapeICP (ICP: iterative closest point), an algorithm that does not use pose-annotated data and generates meshes as the object shape representation. For data association, we developed DAF-SLAM (DAF: data association free), which estimates the associations in the back-end instead of relying on sensor-dependent front-end methods. For semi-supervised learning, we applied temporal semantic consistency, inspired by the photometric consistency technique used in traditional SLAM methods. Each contribution is evaluated on experimental datasets, demonstrating improvements over previous techniques.

Committee Members:
John J. Leonard (Advisor), Department of Mechanical Engineering
Faez Ahmed, Department of Mechanical Engineering
Nicholas Roy, Department of Aeronautics and Astronautics

Research Areas:
AI & Machine Learning, Graphics & Vision, Robotics

Created by Yihao Zhang on Tuesday, April 16, 2024 at 11:55 PM.