On-Demand Evaluation for Information Extraction
Stanford University
Date: Tuesday, May 30, 2017
Time: 2:00 PM to 3:00 PM
Location: 32-D463 (Stata Center - Star Conference Room)
Host: Regina Barzilay, MIT CSAIL
Contact: Marcia G. Davidson, 617-253-3049, firstname.lastname@example.org
Speaker URL: None
TALK: On-Demand Evaluation for Information Extraction
In the last few years we've seen renewed interest in tackling hard language understanding problems, e.g., question answering, entailment, and summarization. However, the subjective nature of these tasks makes evaluating them challenging. Unfortunately, as a community, we have adopted heuristic evaluation methodologies that may be misleading. In this talk, I'll argue that the current evaluation methodology for knowledge base population, an information extraction task that incorporates elements of all the above language understanding problems, is biased, and I'll present a new evaluation paradigm that corrects this bias through statistical estimation and crowdsourcing.
In more detail:
Knowledge base population (KBP) systems take in a large document corpus and extract entities and their relations. Thus far, KBP evaluation has relied on judgements of the pooled predictions of existing systems. We show that this evaluation is problematic: when a new system predicts a previously unseen relation, it is penalized even if it is correct. This leads to a significant bias against new systems, which counterproductively discourages innovation in the field. Our first contribution is a new importance-sampling-based evaluation that corrects for this bias by annotating a new system's predictions on demand via crowdsourcing. We show that this eliminates bias and reduces variance using data from the 2015 TAC KBP task. Our second contribution is a KBP evaluation service and its annotations, which we make available to the community. We pilot the service by testing diverse state-of-the-art systems on the TAC KBP 2016 corpus and obtain accurate scores in a cost-effective manner.
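To make the idea concrete, here is a minimal sketch of how an importance-sampling estimate of a new system's precision might work. This is an illustrative toy, not the speakers' actual estimator: the function names (`estimate_precision`, `annotate`) and the choice of a uniform target distribution over the system's predictions are assumptions for the example, and `annotate` stands in for a crowdsourced correctness judgement.

```python
import random

def estimate_precision(predictions, proposal, annotate, n_samples=100, seed=0):
    """Unbiased importance-sampling estimate of a system's precision.

    predictions: the new system's extracted relation instances
    proposal:    dict mapping each prediction to its sampling probability
                 (the q(x) we actually draw annotation candidates from;
                 must be > 0 wherever the target is, and sum to 1)
    annotate:    callable returning 1.0 if a prediction is judged correct,
                 else 0.0 (a stand-in for a crowdsourced judgement)
    """
    rng = random.Random(seed)
    items = list(predictions)
    n = len(items)
    weights = [proposal[x] for x in items]
    total = 0.0
    for _ in range(n_samples):
        # Draw one prediction to annotate, according to the proposal q.
        (x,) = rng.choices(items, weights=weights, k=1)
        # Importance weight: target p(x) = 1/n (uniform over the system's
        # predictions) divided by the proposal probability q(x).
        total += annotate(x) * (1.0 / n) / proposal[x]
    return total / n_samples
```

A proposal that concentrates sampling budget on predictions not already covered by the existing judgement pool is what lets the evaluation stay unbiased for a new system while keeping annotation cost low; with a uniform proposal the estimator reduces to the ordinary sample mean of the judgements.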
Created by Marcia G. Davidson on Monday, May 22, 2017 at 5:29 PM.