What is the information content of an algorithm?
Joachim M. Buhmann
Machine Learning Laboratory, Department of Computer Science, ETH Zurich
Date: Thursday, November 07, 2013
Time: 3:00 PM to 4:00 PM (all times are in the Eastern Time Zone)
Location: Star Seminar Room - Bldg 32 "Stata" - MIT
Host: Tomaso Poggio, Lorenzo Rosasco, CBCL, BCS, CSAIL; LCSL, MIT-IIT
Contact: Kathleen Sullivan, email@example.com
Relevant URL: http://www.ml.inf.ethz.ch/people/professors/jbuhmann
TALK: What is the information content of an algorithm?
Algorithms are exposed to randomness in the input and to noise during the computation. How well can they preserve the information in the data with respect to the output space? Algorithms, especially in machine learning, are required to generalize over input fluctuations or randomization during execution. This talk elaborates a new framework to measure the "informativeness" of algorithmic procedures and their "stability" against noise. An algorithm is treated as a noisy channel characterized by a generalization capacity (GC). The GC objectively ranks different algorithms for the same data-processing task by the bit rate of their respective capacities. The problem of grouping data is used to demonstrate this validation principle for clustering algorithms, e.g., k-means, pairwise clustering, normalized cut, adaptive ratio cut, and dominant set clustering. Our new validation approach selects the most informative clustering algorithm, i.e., the one that filters out the maximal number of stable, task-related bits relative to the underlying hypothesis class. The concept also enables us to measure how many bits are extracted by sorting algorithms when the input, and thereby the pairwise comparisons, are subject to fluctuations.
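To make the stability notion in the abstract concrete, here is a minimal Python sketch. It is not the speaker's generalization-capacity computation; it only illustrates the underlying idea of validating a clustering by checking how reproducible its output is under input noise. All data, noise levels, and helper names below are illustrative assumptions.

```python
# Toy illustration of stability-based clustering validation (an assumption,
# not the GC framework itself): cluster two independently perturbed copies
# of the data and measure how often pairs of points are grouped consistently.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Three well-separated Gaussian blobs as ground-truth structure.
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
data = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

def pairwise_agreement(a, b):
    """Fraction of point pairs on which two labelings agree:
    co-clustered in both, or separated in both."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    off_diag = ~np.eye(len(a), dtype=bool)
    return np.mean(same_a[off_diag] == same_b[off_diag])

for k in range(2, 7):
    # Two independent noisy realizations of the same underlying input.
    noisy1 = data + rng.normal(scale=0.3, size=data.shape)
    noisy2 = data + rng.normal(scale=0.3, size=data.shape)
    labels1 = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(noisy1)
    labels2 = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(noisy2)
    print(f"k={k}: pairwise agreement = {pairwise_agreement(labels1, labels2):.3f}")
```

Note that raw stability alone tends to favor overly coarse solutions (k=2 is often maximally reproducible); the framework described in the abstract avoids this by trading stability against informativeness, selecting the solution that extracts the maximal number of stable, task-related bits.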