[summary: A task-based AGI or "genie" is an AGI intended to follow a series of human orders, rather than autonomously pursuing long-term goals. A Task AGI might be easier to render safe, since:
- It's possible to query the user before and during a Task.
- Tasks are satisficing - they're of limited scope and can be fully accomplished using a limited amount of effort. (In other words, Tasks should not become more and more accomplished as more and more effort is put into them.)
- Adequately identifying what it means to safely "cure cancer" might be simpler than adequately identifying all normative value.
- Task AGIs can be limited in various ways, rather than self-improving as far as possible, so long as they can still carry out at least some pivotal Tasks.
The obvious disadvantage of a Task AGI is moral hazard - it may tempt the users in ways that an autonomous AI would not.
The problem of making a safe Task AGI invokes numerous subtopics such as low impact, mild optimization, and conservatism, as well as numerous standard AGI safety problems like goal identification and reflective stability.]
A task-based AGI is an AGI intended to follow a series of human-originated orders, with each order being of limited scope - "satisficing" in the sense that it can be accomplished using a bounded amount of effort and resources (as opposed to a goal that becomes more and more fulfilled as more and more effort is put into it).
In Bostrom's typology, this is termed a "Genie". It contrasts with a "Sovereign" AGI that acts autonomously in the pursuit of long-term real-world goals.
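One minimal way to formalize the "satisficing" aspect of this definition (an illustrative sketch only, not part of the original definition; the accomplishment measure $f$ and threshold $\theta$ are placeholder symbols) is to cap the Task preference at a threshold:

$$U_{\text{task}}(x) \;=\; \min\!\big(f(x),\; \theta\big)$$

where $f(x)$ measures how thoroughly an outcome $x$ accomplishes the Task. Once $f(x) \ge \theta$, the Task counts as fully accomplished and additional effort buys no additional preference. Note that this capped utility alone does not settle the issue: an expected utility maximizer with a capped utility may still expend unbounded effort driving the probability of reaching the cap toward 1, which is one reason mild optimization (listed under Subproblems below) is treated as a separate open problem.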
Building a safe Task AGI might be easier than building a safe Sovereign for the following reasons:
- A Task AGI can be "online": the AGI can potentially query the user before and during Task performance, assuming an ambiguous situation arises and is successfully identified as ambiguous. (A toy sketch of this query-on-ambiguity loop appears after this list.)
- A Task AGI can potentially be limited in various ways, since a Task AGI doesn't need to be as powerful as possible in order to accomplish its limited-scope Tasks, whereas a Sovereign would presumably engage in all-out self-improvement. (This isn't to say that a Task AGI would automatically refrain from self-improving, only that it's possible in principle to limit the power of a Task AGI to only the level required to do the targeted Tasks, if the associated safety problems can be solved.)
- Tasks, by assumption, are limited in scope - they can be accomplished and finished, inside some limited region of space and time, using some limited amount of effort, after which the Task is complete. (To gain this advantage, a state of Task accomplishment should not go higher and higher in preference as more and more effort is expended on it open-endedly.)
- Assuming that users can figure out intended goals for the AGI that are valuable and pivotal, the identification problem of describing what constitutes safe performance of that Task might be simpler than giving the AGI a complete description of normativity in general. That is, the problem of communicating to an AGI an adequate description of "cure cancer" (without killing patients or causing other side effects), while still difficult, might be simpler than communicating an adequate description of all normative value. Task AGIs fall on the narrow side of Ambitious vs. narrow value learning.
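As a concrete (and deliberately toy) illustration of the "online" property in the first bullet above, the sketch below shows a control loop that consults the user whenever a step has been flagged as ambiguous, rather than acting on a guess. Every name here (`Step`, `run_task`, `ask_user`) is a hypothetical placeholder, and the hard part - actually detecting that a step is ambiguous - is simply assumed to have happened upstream.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    description: str
    ambiguous: bool  # assumed to have been set by some (unspecified) ambiguity detector

def run_task(steps: List[Step], ask_user: Callable[[str], str]) -> List[str]:
    """Walk through a Task step by step, pausing to query the user whenever
    a step is flagged as ambiguous instead of guessing and proceeding."""
    log = []
    for step in steps:
        if step.ambiguous:
            clarification = ask_user(step.description)
            log.append(f"clarified '{step.description}': {clarification}")
        else:
            log.append(f"executed '{step.description}'")
    return log

# Toy usage: one step of a "cure cancer" Task is flagged as needing clarification.
steps = [
    Step("identify drug candidates", ambiguous=False),
    Step("choose an acceptable side-effect threshold", ambiguous=True),
    Step("run simulated trials", ambiguous=False),
]
print(run_task(steps, ask_user=lambda q: input(f"Clarify: {q} ")))
```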
Relative to the problem of building a Sovereign, trying to build a Task AGI instead might step down the problem from "impossibly difficult" to "insanely difficult", while still maintaining enough power in the AI to perform pivotal acts.
The obvious disadvantage of a Task AGI is moral hazard - it may tempt the users in ways that a Sovereign would not. A Sovereign presents moral hazard chiefly during the development phase, when the programmers and users are perhaps not yet in a position of special relative power; a Task AGI presents ongoing moral hazard as it is used.
Eliezer Yudkowsky has suggested that people only confront many of the important problems in value alignment when they think about Sovereigns, but that at the same time, Sovereigns may be impossibly hard in practice. Yudkowsky therefore advocates thinking about Sovereigns first and listing out all the associated issues before stepping down to Task AGIs: starting with Task AGIs risks premature pruning, whereas starting with Sovereigns is more likely to generate a complete list of problems, which can then be checked against particular Task AGI approaches to see whether those problems have become any easier.
Three distinguished subtypes of Task AGI are these:
- Oracles, AIs that are intended only to answer questions, possibly from some restricted question set.
- Known-algorithm AIs, which are not self-modifying or very weakly self-modifying, such that their algorithms and representations are mostly known and mostly stable.
- Behaviorist Genies, which are meant to not model human minds, or to model them only in very limited ways, while still having great material understanding (e.g., potentially the ability to invent and deploy nanotechnology).
Subproblems
The problem of making a safe genie invokes numerous subtopics such as low impact, mild optimization, and conservatism, as well as numerous standard AGI safety problems like reflective stability and safe identification of intended goals.
Some further problems beyond those appearing in the page above are:
- Oracle utility functions (that make the Oracle not wish to leave its box or optimize its programmers)
- Effable optimization (the opposite of cognitive uncontainability)
- Online checkability
- Explaining things to programmers without putting the programmers inside an argmax for how well you are 'explaining' things to them
- Transparency
- Do What I Mean
Comments
Paul Christiano
If the distinguishing characteristic of a genie is "primarily relying on the human ability to discern short-term strategies that achieve long-term value," then I guess that includes all act-based agents. I don't especially like this terminology.
Note that, logically speaking, "human ability" in the above sentence should refer to the ability of humans working in concert with other genies. This really seems like a key fact to me (it also doesn't seem like it should be controversial).