New Problems and Perspectives on Learning, Testing, and Sampling in the Small Data Regime

Speaker: Greg Valiant

Date: Thursday, May 16, 2019

Time: 4:00 PM to 5:00 PM

Public: Yes

Location: 32-G463 (Star)

Event Type: Seminar

Host: Constantinos Daskalakis, MIT CSAIL

Contact: Rebecca Yadegar

Abstract: I will discuss several new problems related to the general challenge of understanding what conclusions can be drawn from a dataset that is relatively small in comparison to the complexity or dimensionality of the underlying distribution from which it is drawn. In the first setting, we consider the problem of learning a population of Bernoulli (or multinomial) parameters. This is motivated by the "federated learning" setting, where we have data from a large number of heterogeneous individuals, each of whom supplies a very modest amount of data, and we ask to what extent the number of data sources can compensate for the lack of data from each source. Second, we will discuss the problem of estimating the "learnability" of a dataset: given too little labeled data to train an accurate model, we show that it is often possible to estimate the extent to which a good model exists. Specifically, given labeled data pairs (x, y) drawn from some unknown distribution over such pairs, it is possible to estimate how much of the variance of y can be explained by the best linear function of x, even in the regime where it is impossible to approximate that linear function. Finally, I will introduce the problem of data "amplification": given n independent draws from a distribution D, to what extent is it possible to output a set of m > n datapoints that are indistinguishable from m i.i.d. draws from D? Curiously, we show that nontrivial amplification is often possible in the regime where n is too small to learn D to any nontrivial accuracy. We also discuss connections between this setting and the challenge of interpreting the behavior of GANs and other ML/AI systems. The talk will also highlight a number of concrete and more conceptual open directions in all three veins.

This work is based on several papers, with Weihao Kong, and with Brian Axelrod, Shivam Garg, and Vatsal Sharan.
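
Illustrative sketch (not taken from the talk or the papers): for the first problem in the abstract, one standard observation is that even when each individual contributes only t coin flips, the first t moments of the population's parameter distribution can be estimated without bias, because E[C(X, k)] / C(t, k) = p^k when X ~ Binomial(t, p). The Python below simulates this under assumed conditions. The function names, the uniform population of parameters, and the sample sizes are all hypothetical choices made for the example.

```python
import random
from math import comb


def estimate_moments(counts, t, max_k):
    """Estimate E[p^k] for k = 1..max_k from per-individual success counts.

    counts[i] is the number of successes among t flips for individual i.
    Uses the identity E[C(X, k)] = C(t, k) * p^k for X ~ Binomial(t, p),
    which is only valid for k <= t.
    """
    if max_k > t:
        raise ValueError("only the first t moments are identifiable this way")
    n = len(counts)
    return [
        sum(comb(x, k) for x in counts) / (n * comb(t, k))
        for k in range(1, max_k + 1)
    ]


def simulate_counts(n, t, seed=0):
    """Hypothetical population: each individual's p is uniform on [0, 1]."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        p = rng.random()
        counts.append(sum(rng.random() < p for _ in range(t)))
    return counts


if __name__ == "__main__":
    t = 3  # very little data per individual
    counts = simulate_counts(n=50_000, t=t)
    # For a uniform population the true moments are 1/2, 1/3, 1/4.
    print(estimate_moments(counts, t, max_k=t))
```

With only three flips per person, no individual's parameter can be pinned down, yet the population-level moments are recovered increasingly well as the number of individuals grows. That is the sense in which many data sources can partly compensate for little data from each source.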

Research Areas:
Algorithms & Theory

Part of the Theory of Computation Seminar (ToC) 2019.

Created by Rebecca Yadegar on Wednesday, May 08, 2019 at 6:24 PM.