Theory of (advanced) agents
One of the research subproblems of building powerful nice AIs is the theory of (sufficiently advanced) minds in general.
Many issues in AI alignment depend on what we think we can factually say about the general design space of cognitively powerful agents, or on which background assumptions yield which implications about advanced agents. E.g., the Orthogonality Thesis is a claim about the general design space of powerful AIs. The design space of advanced agents is very wide, and only very weak statements seem likely to be true about the whole design space; but we can still try to say 'If X then Y' and refute claims of the form 'No need for if-X, Y happens anyway!'
- Eliezer Yudkowsky

Children

Advanced agent properties
How smart does a machine intelligence need to be for its niceness to become an issue? "Advanced" is a broad term to cover cognitive abilities such that we'd need to start considering AI alignment. - Eliezer Yudkowsky

Advanced nonagent
Hypothetically, cognitively powerful programs that don't follow the loop of "observe, learn, model the consequences, act, observe results" that a standard "agent" would. - Eliezer Yudkowsky

Artificial General Intelligence
An AI which has the same kind of "significantly more general" intelligence that humans have compared to chimpanzees; it can learn new domains, like we can. - Eliezer Yudkowsky

Big-picture strategic awareness
We start encountering new AI alignment issues at the point where a machine intelligence recognizes the existence of a real world, the existence of programmers, and how these relate to its goals. - Eliezer Yudkowsky

Cognitive uncontainability
'Cognitive uncontainability' is when we can't hold all of an agent's possibilities inside our own minds. - Eliezer Yudkowsky

Rich domain
A domain is 'rich', relative to our own intelligence, to the extent that (1) its search space is … - Eliezer Yudkowsky

Almost all real-world domains are rich
Anything you're trying to accomplish in the real world can potentially be accomplished in a *lot* of different ways. - Eliezer Yudkowsky

Logical game
A game's mathematical structure in its purest form. - Eliezer Yudkowsky

Consequentialist cognition
The cognitive ability to foresee the consequences of actions, prefer some outcomes to others, and output actions leading to the preferred outcomes. - Eliezer Yudkowsky

Corporations vs. superintelligences
Corporations have relatively few of the advanced-agent properties that would allow one mistake in aligning a corporation to immediately kill all humans and turn the future light cone into paperclips. - Eliezer Yudkowsky

Epistemic and instrumental efficiency
An efficient agent never makes a mistake you can predict. You can never successfully predict a directional bias in its estimates. - Eliezer Yudkowsky

Time-machine metaphor for efficient agents
Don't imagine a paperclip maximizer as a mind. Imagine it as a time machine that always spits out the output leading to the greatest number of future paperclips. - Eliezer Yudkowsky

General intelligence
Compared to chimpanzees, humans seem to be able to learn a much wider variety of domains. We have 'significantly more generally applicable' cognitive abilities, aka 'more general intelligence'. - Eliezer Yudkowsky

Infrahuman, par-human, superhuman, efficient, optimal
A categorization of AI ability levels relative to human, with some gotchas in the ordering. E.g., in simple domains where humans can play optimally, optimal play is not superhuman. - Eliezer Yudkowsky

Intelligence explosion
What happens if a self-improving AI gets to the point where each amount x of self-improvement triggers more than x further self-improvement, and it stays that way for a while. - Eliezer Yudkowsky

Real-world domain
Some AIs play chess, some AIs play Go, some AIs drive cars. These different 'domains' present different options. All of reality, in all its messy entanglement, is the 'real-world domain'. - Eliezer Yudkowsky

Standard agent properties
What's a Standard Agent, and what can it do? - Eliezer Yudkowsky

Bounded agent
An agent that operates in the real world, using realistic amounts of computing power, that is uncertain of its environment, etcetera. - Eliezer Yudkowsky

Sufficiently advanced Artificial Intelligence
'Sufficiently advanced Artificial Intelligences' are AIs with enough 'advanced agent properties' that we start needing to do 'AI alignment' to them. - Eliezer Yudkowsky

Superintelligent
A "superintelligence" is strongly superhuman (strictly higher-performing than any and all humans) on every cognitive problem. - Eliezer Yudkowsky

Vingean uncertainty
You can't predict the exact actions of an agent smarter than you - so is there anything you _can_ say about them? - Eliezer Yudkowsky

Deep Blue
The chess-playing program, built by IBM, that defeated world chess champion Garry Kasparov in a six-game match in 1997. - Eliezer Yudkowsky

Vinge's Law
You can't predict exactly what someone smarter than you would do, because if you could, you'd be that smart yourself. - Eliezer Yudkowsky

Instrumental convergence
Some strategies can help achieve most possible simple goals. E.g., acquiring more computing power or more material resources. By default, unless averted, we can expect advanced AIs to do that. - Eliezer Yudkowsky

Convergent instrumental strategies
Paperclip maximizers can make more paperclips by improving their cognitive abilities or controlling more resources. What other strategies would almost-any AI try to use? - Eliezer Yudkowsky

Consequentialist preferences are reflectively stable by default
Gandhi wouldn't take a pill that made him want to kill people, because he knows in that case more people will be murdered. A paperclip maximizer doesn't want to stop maximizing paperclips. - Eliezer Yudkowsky

Convergent strategies of self-modification
The strategies we'd expect to be employed by an AI that understands the relevance of its code and hardware to achieving its goals, which therefore has subgoals about its code and hardware. - Eliezer Yudkowsky

Instrumental
What is "instrumental" in the context of Value Alignment Theory? - Eliezer Yudkowsky

Instrumental pressure
A consequentialist agent will want to bring about certain instrumental events that will help to fulfill its goals. - Eliezer Yudkowsky

Paperclip maximizer
This agent will not stop until the entire universe is filled with paperclips. - Eliezer Yudkowsky

Paperclip
A configuration of matter that we'd see as being worthless even from a very cosmopolitan perspective. - Eliezer Yudkowsky

Random utility function
A 'random' utility function is one chosen at random according to some simple probability measure (e.g. weight by Kolmogorov complexity) on a logical space of formal utility functions. - Eliezer Yudkowsky

You can't get more paperclips that way
Most arguments that "A paperclip maximizer could get more paperclips by (doing nice things)" are flawed. - Eliezer Yudkowsky

Orthogonality Thesis
Will smart AIs automatically become benevolent, or automatically become hostile? Or do different AI designs imply different goals? - Eliezer Yudkowsky

Instrumental goals are almost-equally as tractable as terminal goals
Getting the milk from the refrigerator because you want to drink it is not vastly harder than getting the milk from the refrigerator because you inherently desire it. - Eliezer Yudkowsky

Mind design space is wide
Imagine all human beings as one tiny dot inside a much vaster sphere of possibilities for "The space of minds in general." It is wiser to make claims about *some* minds than *all* minds. - Eliezer Yudkowsky
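
The 'Consequentialist cognition' entry in the list above describes three steps: foresee the consequences of actions, prefer some outcomes to others, and output the action leading to the preferred outcome. A minimal sketch of that loop might look like the following. Everything in it is an assumption made for illustration and is not taken from the page: the toy world model `predict_outcome`, the preference function `paperclip_utility`, the three action names, and their effects are all hypothetical.

```python
from typing import Callable, Iterable


def predict_outcome(paperclips: int, action: str) -> int:
    """Hypothetical world model: forecast the paperclip count after an action."""
    effects = {"do_nothing": 0, "make_one_clip": 1, "build_factory": 10}
    return paperclips + effects[action]


def paperclip_utility(outcome: int) -> float:
    """Hypothetical preference ordering: more paperclips is better."""
    return float(outcome)


def choose_action(
    state: int,
    actions: Iterable[str],
    model: Callable[[int, str], int],
    utility: Callable[[int], float],
) -> str:
    """Foresee each action's outcome, score it, and return the best-scoring action."""
    return max(actions, key=lambda action: utility(model(state, action)))


# Illustrative action set; the names are made up for this sketch.
ACTIONS = ["do_nothing", "make_one_clip", "build_factory"]

print(choose_action(0, ACTIONS, predict_outcome, paperclip_utility))  # build_factory
```

In this sketch the preference function is a separate argument, so passing a different `utility` changes which action comes out without touching the prediction or selection code. That is only a toy picture of how the same decision machinery can be paired with different goals; it says nothing about how real systems are built.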