Task-directed AGI

https://arbital.com/p/task_agi

by Eliezer Yudkowsky Jun 9 2015 updated Mar 25 2017

An advanced AI that's meant to pursue a series of limited-scope goals given it by the user. In Bostrom's terminology, a Genie.


[summary: A task-based AGI or "genie" is an AGI intended to follow a series of human orders, rather than autonomously pursuing long-term goals. A Task AGI might be easier to render safe than an autonomous AGI.

The obvious disadvantage of a Task AGI is moral hazard - it may tempt the users in ways that an autonomous AI would not.

The problem of making a safe Task AGI invokes numerous subtopics such as low impact, mild optimization, and conservatism as well as numerous standard AGI safety problems like goal identification and reflective stability.]

A task-based AGI is an AGI intended to follow a series of human-originated orders, each of limited scope - "satisficing" in the sense that each order can be accomplished using a bounded amount of effort and resources (as opposed to goals that can be fulfilled ever more thoroughly by expending ever more effort).
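As a purely illustrative sketch (not from the original page; the function names and numbers are made up), the difference between a limited-scope task objective and an open-ended objective can be gestured at as a score that saturates once the task is adequately done, versus a score that keeps rewarding additional effort:

```python
# Illustrative sketch only: contrasting an open-ended objective, which rewards
# ever more effort and resources, with a limited-scope "satisficing" task
# objective, which saturates once the task is done well enough.

def open_ended_utility(paperclips_made: float) -> float:
    """Unbounded: every additional unit of effort or resources buys more utility."""
    return paperclips_made  # grows without limit

def task_utility(cauldron_fill_fraction: float, target: float = 0.9) -> float:
    """Bounded: full marks once the task is accomplished to the target level."""
    return min(cauldron_fill_fraction, target) / target  # saturates at 1.0
```

An agent maximizing the first score always prefers to acquire more resources; an agent maximizing the second has no further incentive to keep optimizing once the target is reached.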

In Bostrom's typology, this is termed a "Genie". It contrasts with a "Sovereign" AGI that acts autonomously in the pursuit of long-term real-world goals.

Building a safe Task AGI might be easier than building a safe Sovereign, for several reasons.

Relative to the problem of building a Sovereign, trying to build a Task AGI instead might step down the problem from "impossibly difficult" to "insanely difficult", while still maintaining enough power in the AI to perform pivotal acts.

The obvious disadvantage of a Task AGI is moral hazard - it may tempt the users in ways that a Sovereign would not. A Sovereign has moral hazard chiefly during the development phase, when the programmers and users are perhaps not yet in a position of special relative power. A Task AGI has ongoing moral hazard as it is used.

Eliezer Yudkowsky has suggested that many important problems in value alignment only come into view when thinking about Sovereigns, but that at the same time, Sovereigns may be impossibly hard to build in practice. Yudkowsky therefore advocates thinking about Sovereigns first and listing out all the associated issues before stepping down to Task AGIs: starting with Task AGIs risks prematurely pruning the list of problems, whereas starting with Sovereigns is more likely to generate a complete list that can then be checked against particular Task AGI approaches to see which problems have become easier.

Three distinguished subtypes of Task AGI are Oracles (whose tasks are to answer questions), known-algorithm non-self-improving ("KANSI") agents, and Behaviorist genies.

Subproblems

The problem of making a safe genie invokes numerous subtopics such as low impact, mild optimization, and conservatism as well as numerous standard AGI safety problems like reflective stability and safe identification of intended goals.
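As a rough sketch (not from the original page, and much simplified relative to the actual proposals), two of these subproblems can be gestured at in code: "mild optimization" as sampling from the better-scoring tail of a set of candidate plans rather than taking the argmax, and "low impact" as trading task performance off against an estimate of how much a plan perturbs the rest of the world. The functions below are hypothetical stand-ins, not a real API.

```python
# Simplified sketch only: gesturing at "mild optimization" and "low impact".
# task_score and impact_measure are hypothetical stand-ins supplied by the caller.
import random

def quantilize(plans, task_score, q=0.1, rng=random):
    """Mild optimization sketch: instead of returning the argmax plan, sample
    uniformly from the top q-fraction of plans under task_score."""
    ranked = sorted(plans, key=task_score, reverse=True)
    top = ranked[:max(1, int(len(ranked) * q))]
    return rng.choice(top)

def penalized_score(plan, task_score, impact_measure, weight=10.0):
    """Low impact sketch: penalize a plan by an estimate of how much it changes
    the world relative to doing nothing, weighted against task performance."""
    return task_score(plan) - weight * impact_measure(plan)
```

The real proposals differ in important ways (e.g., a quantilizer samples from a base distribution conditioned on the top quantile, and defining a workable impact measure is itself an open problem); the sketch is only meant to show the shape of the idea.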

(See here for a separate page on open problems in Task AGI safety that might be ready for current research.)

There are further problems beyond those appearing in the page above.


Comments

Paul Christiano

If the distinguishing characteristic of a genie is "primarily relying on the human ability to discern short-term strategies that achieve long-term value," then I guess that includes all act-based agents. I don't especially like this terminology.

Note that, logically speaking, "human ability" in the above sentence should refer to the ability of humans working in concert with other genies. This really seems like a key fact to me (it also doesn't seem like it should be controversial).