On-Demand Evaluation for Information Extraction

Speaker: Arun Chaganty, Stanford University

Date: Tuesday, May 30, 2017

Time: 2:00 PM to 3:00 PM (Eastern Time)

Public: Yes

Location: 32-D463 (Stata Center - Star Conference Room)

Host: Regina Barzilay, MIT CSAIL

Contact: Marcia G. Davidson, 617-253-3049, marcia@csail.mit.edu

In the last few years we've seen renewed interest in tackling hard language understanding problems, e.g., question answering, entailment, and summarization. However, the subjective nature of these tasks makes them challenging to evaluate. Unfortunately, as a community, we have adopted heuristic evaluation methodologies that can be misleading. In this talk, I'll argue that the current evaluation methodology for knowledge base population, an information extraction task that incorporates elements of all of the above language understanding problems, is biased, and I'll present a new evaluation paradigm that corrects this bias through statistical estimation and crowdsourcing.

In more detail:
Knowledge base population (KBP) systems take in a large document corpus and extract entities and their relations. Thus far, KBP evaluation has relied on judgements over the pooled predictions of existing systems. We show that this evaluation is problematic: when a new system predicts a previously unseen relation, it is penalized even if the prediction is correct. This leads to significant bias against new systems, which counterproductively discourages innovation in the field. Our first contribution is a new importance-sampling based evaluation that corrects for this bias by annotating a new system's predictions on demand via crowdsourcing. Using data from the 2015 TAC KBP task, we show that this eliminates bias and reduces variance. Our second contribution is a KBP evaluation service and its annotations, which we make available to the community. We pilot the service by testing diverse state-of-the-art systems on the TAC KBP 2016 corpus and obtain accurate scores in a cost-effective manner.
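The core idea above can be sketched in a few lines. This is a simplified illustration under my own assumptions, not the speaker's actual estimator: the function names and the `annotate` callback (standing in for a crowdsourcing call returning a 0/1 judgement) are hypothetical, and the real method also reuses pooled annotations to reduce variance. The first function scores a new system by sampling its own predictions rather than a fixed pool; the second reweights draws from a non-uniform proposal distribution so the precision estimate stays unbiased.

```python
import random

def sampled_precision(predictions, annotate, budget, seed=0):
    """Estimate a system's precision by annotating a uniform random
    sample of its own predictions, instead of scoring only against a
    pool of prior systems' judgements (which penalizes correct but
    previously unseen relations).  `annotate` is a hypothetical
    crowdsourcing callback returning 0 or 1."""
    rng = random.Random(seed)
    sample = rng.sample(predictions, min(budget, len(predictions)))
    labels = [annotate(p) for p in sample]  # crowdsourced 0/1 judgements
    return sum(labels) / len(labels)

def importance_weighted_precision(predictions, proposal, annotate, budget, seed=0):
    """Importance-sampling variant: draw annotations from a proposal
    distribution `proposal` (e.g. one favouring predictions already
    covered by existing annotations, so fewer new judgements are
    bought) and reweight so the estimate of precision -- the mean of
    annotate(p) under the uniform distribution 1/n -- stays unbiased."""
    rng = random.Random(seed)
    n = len(predictions)
    weights = [proposal[p] for p in predictions]
    sample = rng.choices(predictions, weights=weights, k=budget)
    # Importance-weighting correction: target density 1/n, proposal q(p).
    terms = [annotate(p) / (n * proposal[p]) for p in sample]
    return sum(terms) / budget
```

With a uniform proposal the second estimator reduces to the first; a skewed proposal trades annotation cost against variance, which is the trade-off the talk's evaluation service manages.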

This event is not part of a series.

Created by Marcia G. Davidson on Monday, May 22, 2017 at 5:29 PM.