RECURSION IN RATIONAL AGENTS: FOUNDATIONS FOR SELF-MODIFYING AI
RESEARCH FELLOW AT MACHINE INTELLIGENCE RESEARCH INSTITUTE
Date: Thursday, October 17, 2013
Time: 4:00 PM to 5:30 PM Note: all times are in the Eastern Time Zone
Refreshments: 3:45 PM
Host: Scott Aaronson
Contact: Holly A Jones, firstname.lastname@example.org
Speaker URL: None
TALK: RECURSION IN RATIONAL AGENTS: FOUNDATIONS FOR SELF-MODIFYING AI
ABSTRACT: Reflective reasoning is a familiar but formally elusive aspect of human cognition. This issue comes to the forefront when we consider building AIs that model other sophisticated reasoners, or that might design other AIs as sophisticated as themselves. Mathematical logic, the best-developed contender for a formal language capable of reflecting on itself, is beset by impossibility results. Similarly, standard decision theories begin to produce counterintuitive or incoherent results when applied to agents with detailed self-knowledge. In this talk I will present some early results from workshops held by the Machine Intelligence Research Institute to confront these challenges.
The first is a formalization and significant refinement of Hofstadter's "superrationality," the (informal) idea that ideal rational agents can achieve mutual cooperation in games like the prisoner's dilemma by exploiting the logical connection between their own actions and their opponents' actions. We show how to implement an agent that reliably outperforms classical game theory given mutual knowledge of source code, and that achieves mutual cooperation in the one-shot prisoner's dilemma using a general procedure. Using a fast algorithm for finding fixed points, we are able to write implementations of agents that perform the logical interactions necessary for our formalization, and we describe empirical results.
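As a toy illustration of the idea (and only that: this substitutes bounded mutual simulation for the provability-based fixed-point construction the abstract describes, and the agent names are hypothetical), one can write a FairBot-style agent that cooperates exactly when simulating its opponent against itself yields cooperation:

```python
# Toy sketch of "superrational" cooperation via bounded mutual simulation.
# An illustrative stand-in for the logical formalization, not the talk's
# actual construction; fair_bot, defect_bot, cooperate_bot are hypothetical.

def cooperate_bot(opponent, depth):
    """Cooperates unconditionally."""
    return "C"

def defect_bot(opponent, depth):
    """Defects unconditionally."""
    return "D"

def fair_bot(opponent, depth):
    """Cooperate iff a depth-limited simulation shows the opponent
    cooperating against fair_bot itself.  The optimistic base case at
    depth 0 plays the role of the logical fixed point: two fair_bots
    bottom out in cooperation, which then propagates back up."""
    if depth == 0:
        return "C"
    return "C" if opponent(fair_bot, depth - 1) == "C" else "D"

if __name__ == "__main__":
    print(fair_bot(fair_bot, 10))       # prints C  (mutual cooperation)
    print(fair_bot(defect_bot, 10))     # prints D  (not exploited)
    print(fair_bot(cooperate_bot, 10))  # prints C
```

Note how fair_bot is unexploitable by defect_bot yet still reaches mutual cooperation with a copy of itself, which is exactly the combination classical game theory rules out.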
It has been claimed that Gödel's second incompleteness theorem presents a serious obstruction to any AI understanding why its own reasoning works, or even trusting that it does work [Bringsjord, Mahoney]. We exhibit a simple model of this situation and show that straightforward solutions to this problem are indeed unsatisfactory, resulting in agents that are willing to trust weaker peers but not their own reasoning. We show how to circumvent this difficulty without compromising logical expressiveness.
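The obstruction in question is usually cast via Löb's theorem; the following is a one-line gloss of the standard statement (my summary, not the talk's own formalization):

```latex
% Löb's theorem: for any sufficiently strong theory T and sentence φ,
% if T proves "provability of φ implies φ", then T already proves φ:
\[
  T \vdash \Box_T \varphi \rightarrow \varphi
  \quad\Longrightarrow\quad
  T \vdash \varphi .
\]
% Hence T cannot prove the soundness schema (□φ → φ) for all φ without
% proving every φ.  An agent whose criterion for accepting a successor is
% "the successor's proofs are trustworthy" therefore cannot, naively,
% license a successor reasoning in its own logic.
```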
Time permitting, we also describe a more general agenda for averting self-referential difficulties by replacing logical deduction with a suitable form of probabilistic inference. The goal of this program is to convert logical unprovability or undefinability into very small probabilistic errors which can be safely ignored (and may even be philosophically justified).
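One candidate shape for such a principle (a sketch only; the exact schema presented in the talk may differ) is a coherent probability assignment over sentences that is almost surely accurate about its own values up to any open interval:

```latex
% Probabilistic reflection (sketch): for every sentence φ and
% rationals a < b, the assignment P satisfies
\[
  a < \mathbb{P}(\varphi) < b
  \;\Longrightarrow\;
  \mathbb{P}\bigl(\, a < \mathbb{P}(\ulcorner \varphi \urcorner) < b \,\bigr) = 1 .
\]
% Because P refers to its own values only through open intervals,
% Tarski-style undefinability is traded for arbitrarily small
% probabilistic slack rather than outright contradiction.
```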
Created by Holly A Jones on Wednesday, October 09, 2013 at 1:32 PM.