Grounding Language Acquisition by Training Semantic Parsers using Captioned Videos

Speaker: Candace Ross, MIT CSAIL

Date: Thursday, April 04, 2019

We develop a semantic parser that is trained in a grounded setting using videos paired with sentence captions. This setting is both data-efficient, requiring little annotation, and similar to the experience of children, who observe their environment and listen to speakers. The semantic parser recovers the meaning of English sentences despite not having access to any annotated sentences. It does so despite the ambiguity inherent in vision, where a sentence may refer to any combination of objects, object properties, relations, or actions taken by any agent in a video. For this task, we collected a new dataset for grounded language acquisition. Learning a grounded semantic parser, one that turns sentences into logical forms using captioned videos, can significantly expand the range of data that parsers can be trained on, lower the effort of training a semantic parser, and ultimately lead to a better understanding of child language acquisition.


About CompLang:
CompLang is a student-run discussion group on language and computation. The aim of the group is to bring together the language community at MIT and nearby, learn about each other's research, and foster cross-laboratory collaborations. The broad topic of the meetings is using computational models to study scientific questions about language. We will discuss work from computational linguistics, psycholinguistics, cognitive science, natural language processing and formal linguistics. Please visit for future events.

