RECURSION IN RATIONAL AGENTS: FOUNDATIONS FOR SELF-MODIFYING AI

Speaker: ELIEZER YUDKOWSKY, RESEARCH FELLOW AT MACHINE INTELLIGENCE RESEARCH INSTITUTE

Date: Thursday, October 17, 2013

Time: 4:00 PM to 5:30 PM Note: all times are in the Eastern Time Zone

Refreshments: 3:45 PM

Public: Yes

Location: 32-123

Event Type:

Room Description:

Host: Scott Aaronson

Contact: Holly A Jones, hjones01@csail.mit.edu

Relevant URL:

Speaker URL: None

Speaker Photo:
None

Reminders to: toc@csail.mit.edu, theory-seminars@csail.mit.edu

Reminder Subject: TALK: RECURSION IN RATIONAL AGENTS: FOUNDATIONS FOR SELF-MODIFYING AI

ABSTRACT: Reflective reasoning is a familiar but formally elusive aspect of human cognition. This issue comes to the forefront when we consider building AIs which model other sophisticated reasoners, or which might design other AIs as sophisticated as themselves. Mathematical logic, the best-developed contender for a formal language capable of reflecting on itself, is beset by impossibility results. Similarly, standard decision theories begin to produce counterintuitive or incoherent results when applied to agents with detailed self-knowledge. In this talk I will present some early results from workshops held by the Machine Intelligence Research Institute to confront these challenges.

The first is a formalization and significant refinement of Hofstadter's "superrationality," the (informal) idea that ideal rational agents can achieve mutual cooperation in games like the prisoner's dilemma by exploiting the logical connection between their own actions and their opponents' actions. We show how to implement an agent which, given mutual knowledge of source code, reliably outperforms classical game theory and achieves mutual cooperation in the one-shot prisoner's dilemma via a general procedure. Using a fast algorithm for finding fixed points, we write implementations of agents that perform the logical interactions necessary for our formalization, and we describe empirical results.
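
As a concrete illustration of the fixed-point idea, here is a minimal sketch in Python (my own illustration, not the speakers' implementation; the agent names FairBot and DefectBot and the Kripke-chain evaluation are borrowed from related work on "modal" agents and are not taken from the abstract). Agents whose actions are defined in terms of what is provable about one another can be evaluated world by world on a finite chain for the provability logic GL, where "provable at a world" means "true at every earlier world"; the actions stabilize to the Löbian fixed point.

    # FairBot cooperates exactly when it can prove its opponent cooperates with it.
    # True means "cooperate", False means "defect".

    def provable(history):
        # Provability at the current world: the claim held at all earlier worlds
        # (vacuously true at the bottom world, where there is no history yet).
        return all(history)

    def fairbot(opponent_history):
        # Cooperate iff it is provable that the opponent cooperates.
        return provable(opponent_history)

    def defectbot(opponent_history):
        # Always defect, regardless of the opponent.
        return False

    def play(agent_a, agent_b, worlds=10):
        # Evaluate both agents world by world on a finite chain; the actions
        # stabilize once the number of worlds exceeds the agents' modal depth.
        hist_a, hist_b = [], []
        for _ in range(worlds):
            a, b = agent_a(hist_b), agent_b(hist_a)
            hist_a.append(a)
            hist_b.append(b)
        return hist_a[-1], hist_b[-1]

    print(play(fairbot, fairbot))    # (True, True): mutual cooperation
    print(play(fairbot, defectbot))  # (False, False): FairBot is not exploited

The role of a fast fixed-point algorithm in this setting, presumably, is to make this kind of evaluation efficient for agents whose definitions are far more complicated than FairBot's.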

It has been claimed that Gödel's second incompleteness theorem presents a serious obstruction to any AI understanding why its own reasoning works, or even trusting that it does work [Bringsjord, Mahoney]. We exhibit a simple model of this situation and show that straightforward solutions to the problem are indeed unsatisfactory, yielding agents that are willing to trust weaker peers but not their own reasoning. We show how to circumvent this difficulty without compromising logical expressiveness.
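
To make the obstruction concrete (a standard illustration in my own notation, not necessarily the formulation used in the talk): by Löb's theorem, a theory T extending basic arithmetic cannot adopt blanket trust in its own proofs, since

    \[
      T \vdash \bigl(\Box_T\,\varphi \rightarrow \varphi\bigr)
      \;\Longrightarrow\;
      T \vdash \varphi ,
    \]

so a theory T that includes every instance of the self-trust schema \Box_T\varphi \rightarrow \varphi proves every sentence, including \bot. The straightforward escape is to trust only a strictly weaker theory, which is exactly the unsatisfying situation described above: such an agent trusts weaker peers or successors, but never reasoning as strong as its own.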

Time permitting, we will also describe a more general agenda for averting self-referential difficulties by replacing logical deduction with a suitable form of probabilistic inference. The goal of this program is to convert logical unprovability or undefinability into very small probabilistic errors which can be safely ignored (and may even be philosophically justified).
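
One way this program could be made precise (an illustrative formulation on my part; the exact schema used in the speaker's work may differ) is to ask for a coherent assignment \mathbb{P} of probabilities to the sentences of a language containing its own probability symbol, reflecting its own values up to the open endpoints of any rational interval:

    \[
      a < \mathbb{P}(\varphi) < b
      \;\Longrightarrow\;
      \mathbb{P}\bigl(\, a < \mathbb{P}(\ulcorner\varphi\urcorner) < b \,\bigr) = 1
      \qquad\text{for all sentences } \varphi \text{ and rationals } a < b .
    \]

Roughly speaking, liar-like sentences then pin \mathbb{P} to a boundary value rather than producing the outright contradiction delivered by Tarski's undefinability theorem, so the residual error in self-knowledge can be made as small as one likes.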

Research Areas:

Impact Areas:

This event is not part of a series.

Created by Holly A Jones at Wednesday, October 09, 2013 at 1:32 PM.