Analyzing Programs in the Era of Software 2.0
Xin Zhang, CSAIL MIT
Date: Wednesday, February 12, 2020
Time: 3:00 PM to 4:00 PM (all times are in the Eastern Time Zone)
Event Type: Seminar
Room Description: 32-G882
Host: Michael Carbin, CSAIL MIT
Contact: Nathan Higgins, firstname.lastname@example.org
Speaker URL: None
TALK: Analyzing Programs in the Era of Software 2.0
With the software industry experiencing a major shift to machine learning, the programming systems community is facing both opportunities and challenges. On one hand, advances in machine learning provide new toolkits to build better programming systems to ensure software quality. On the other hand, as machine learning programs are increasingly being used in critical applications, it is now paramount to ensure their quality as well. In this talk, I will describe a set of new analysis techniques that address these opportunities and challenges.
First, I will talk about a data-driven framework for improving program analyses. It enables both online and offline learning by incorporating probabilities into the analysis representation, which is conventionally purely logical. The logical part still encodes the analysis designer's expert knowledge and ensures correctness, while the probabilistic part adds the ability to handle uncertainty. Our approach reduces the number of false positives by 70% for foundational program analyses such as data race detection and pointer analysis. In addition, our inference engine can solve problems containing up to 10^30 clauses from various domains, including program analysis, statistical AI, and Big Data analytics.
While existing program analyses work well with conventional programs, they cannot be applied to the novel properties that arise in machine learning. To address this challenge, we have developed program analyses for emerging properties such as interpretability and fairness. Our interpretability analysis is the first that uses corrections as actionable feedback on judgments made by a neural network. Our fairness analysis can scale to models that are more than five orders of magnitude larger than the largest previously verified model. To enable building machine learning programs that satisfy these properties by construction, we have also developed a probabilistic programming language that supports distributional inference and causal inference.
Finally, I will talk about future work in these two directions and how we can combine them to develop machine-learning-augmented program reasoning techniques that ensure the quality of machine learning programs.
Xin Zhang is a postdoctoral associate at the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. His research areas are programming languages and software engineering, with a focus on the interplay between programming systems and machine learning. On one hand, he leverages machine learning ideas to improve the usability of programming systems. On the other hand, he develops new analyses and languages to ensure the quality of machine learning programs. His work has received Distinguished Paper Awards from PLDI'14 and FSE'15. Xin received his Ph.D. from Georgia Tech in 2017; his doctoral work was partly supported by a Facebook Fellowship.
Created by Nathan Higgins on Wednesday, February 12, 2020 at 10:49 AM.