- From Large to Small Dataset...
- Edit Event
- Cancel Event
- Preview Reminder
- Send Reminder
- Other events happening in March 2024
From Large to Small Datasets: Size Generalization for Clustering Algorithm Selection
Speaker:
Ellen Vitercik
, Stanford University
Date: Tuesday, March 19, 2024
Time: 4:15 PM to 5:15 PM Note: all times are in the Eastern Time Zone
Public: Yes
Location: 32-G449
Event Type: Seminar
Room Description: Refreshments at 4:00pm
Host: Costis Daskalakis, CSAIL MIT
Contact: Joanne Talbot Hanley, 617-253-6054, joanne@csail.mit.edu
Relevant URL:
Speaker URL: None
Speaker Photo:
None
Reminders to:
seminars@csail.mit.edu, theory-seminars@csail.mit.edu
Reminder Subject:
TALK: TOC Seminar: Ellen Vitercik "From Large to Small Datasets: Size Generalization for Clustering Algorithm Selection," Tuesday, March 19 at 4:15pm in 32-G449 (Refreshments at 4:00pm)
Abstract: In clustering algorithm selection, we are given a massive dataset and must efficiently select which clustering algorithm to use. We study this problem in a semi-supervised setting, with an unknown ground-truth clustering that we can only access through expensive oracle queries. Ideally, the clustering algorithm's output will be structurally close to the ground truth. We approach this problem by introducing a notion of size generalization for clustering algorithm accuracy. We identify conditions under which we can (1) subsample the massive clustering instance, (2) evaluate a set of candidate algorithms on the smaller instance, and (3) guarantee that the algorithm with the best accuracy on the small instance will have the best accuracy on the original big instance. We verify these findings both theoretically and empirically.
This is joint work with Vaggos Chatziafratis and Ishani Karmarkar.
Research Areas:
Impact Areas:
Created by Joanne Talbot Hanley at Monday, March 11, 2024 at 9:27 AM.