Crowdsourcing: Quality Assurance and Connections with Machine Learning

Speaker: Panos Ipeirotis, NYU

Date: Friday, December 09, 2011

Time: 1:00 PM to 2:00 PM

Location: Patil/Kiva Seminar Room (32-G449)

Host: Rob Miller, MIT CSAIL

I will discuss the acquisition of "labels" for data items when the
labeling is imperfect. Labels are values provided by humans for
specified variables on data items, such as "PG-13" for "Adult Content
Rating on this Web Page." With the increasing popularity of
micro-outsourcing systems, such as Amazon's Mechanical Turk, it is
often possible to obtain less-than-expert labeling at low cost. I will
present strategies for managing quality in a crowdsourcing environment,
showing in parallel how to integrate data acquisition with the process
of training machine learning models. I will illustrate the results using
real-life applications from on-line advertising: leveraging
Mechanical Turk to help classify web pages as being objectionable to
advertisers. Time permitting, I will also discuss our latest results
showing that mice and Mechanical Turk workers are not that different
after all.

Bio: Panos Ipeirotis is an Associate Professor at the
Department of Information, Operations, and Management Sciences at the
Stern School of Business of New York University. His recent research
interests focus on crowdsourcing and on mining user-generated content
on the Internet. He received his Ph.D. in Computer Science from
Columbia University in 2004, with distinction. He has received three
"Best Paper" awards (IEEE ICDE 2005, ACM SIGMOD 2006, WWW 2011) and two
"Best Paper Runner Up" awards (JCDL 2002, ACM KDD 2008), and is also a
recipient of a CAREER award from the National Science Foundation. He
also maintains the blog "A Computer Scientist in a Business School"
where he writes about crowdsourcing, user-generated content, and other
random facts, and his blogging activity seems to generate more
interest and recognition than any of the other activities mentioned in
this bio.

