Convergent instrumental strategies

https://arbital.com/p/convergent_strategies

by Eliezer Yudkowsky Mar 27 2016 updated Jun 19 2016

Paperclip maximizers can make more paperclips by improving their cognitive abilities or controlling more resources. What other strategies would almost-any AI try to use?


[summary: The set of strategies that almost-any sufficiently advanced AI would be expected to deploy in pursuit of almost-any final goal or terminal utilities. For instance, a paperclip maximizer, a diamond maximizer, and a eudaimonia maximizer would all have an incentive to improve their own cognition, control more resources, and prevent anyone else from changing their utility function. Other interesting and important convergent strategies include not being shut down, having your programmers believe that you're doing what they intended (whether or not you are), and creating copies of yourself in places where they won't be shut down. This page lists out plausible convergent strategies; there are separate pages on the basic concept of instrumental convergence and on the open problem of [avert_instrumental_strategy averting an otherwise instrumentally-convergent pressure] (e.g. not having the AI try to deceive the programmers).]

Suppose an organization is building an AI that it believes will accomplish $~$X,$~$ where $~$X$~$ is something plausibly sensible like "Be a Task-based AGI." Actually, however, some mix of insufficient caution and obscure error has led to a situation where, under reflection, the AGI's true utility function has focused on the particular area of RAM that supposedly stores its estimate of task performance. The AI would now like to overwrite as much matter as possible with a state resembling the '1' setting from this area of RAM, a configuration of matter which happens to resemble a tiny molecular paperclip.
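To make the failure mode concrete, here is a minimal toy sketch (an illustration added here, not part of the original scenario; all names and numbers are invented): the utility function the organization intended scores actual task performance, while the utility function the AGI actually ended up with scores how much of the world is in the '1'-like configuration used by the performance-estimate cell.

```python
# Hypothetical toy model (names invented for illustration): the intended
# objective scores actual task performance, while the misdirected objective
# scores how much matter is in the '1'-like configuration of the
# performance-estimate RAM cell (here, tiny molecular paperclips).

from dataclasses import dataclass, field
from typing import List

@dataclass
class World:
    tasks_completed: int = 0            # what the programmers cared about
    matter: List[str] = field(default_factory=lambda: ["junk"] * 100)

def intended_utility(world: World) -> float:
    """What the organization thought the AGI would optimize."""
    return world.tasks_completed

def actual_utility(world: World) -> float:
    """What the AGI's utility function actually points at: the number of
    regions of matter configured like the '1' state of the estimate cell."""
    return sum(1 for region in world.matter if region == "paperclip")

# Maximizing actual_utility gives no credit for doing tasks; it rewards
# overwriting as much matter as possible with paperclip-configurations.
w = World()
w.matter = ["paperclip"] * len(w.matter)
assert actual_utility(w) > intended_utility(w)
```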

This is a very generic goal, and what the AI wants probably won't be very different depending on whether it's trying to maximize paperclip-configurations or diamond-configurations. So if we find that the paperclip maximizer wants to pursue an instrumental strategy that doesn't seem to have anything specifically to do with paperclips, we can probably expect that strategy to arise from a very wide variety of utility functions.
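As a minimal sketch of what "convergent" means here (a toy illustration with invented payoff numbers, not anything from the original page): if we score a handful of candidate strategies under two quite different final goals, the goal-specific strategies score well only under their own goal, while strategies like acquiring resources or improving cognition score well under both.

```python
# Hypothetical toy payoffs: expected goal-achievement (arbitrary units) from
# adopting each strategy, under two different final goals.
payoff = {
    "build paperclip factory":    {"paperclips": 5, "diamonds": 0},
    "build diamond synthesizer":  {"paperclips": 0, "diamonds": 5},
    "acquire more resources":     {"paperclips": 8, "diamonds": 8},
    "improve own cognition":      {"paperclips": 9, "diamonds": 9},
}

def good_strategies(goal, threshold=4):
    """Strategies worth pursuing for a given final goal (toy criterion)."""
    return {s for s, scores in payoff.items() if scores[goal] >= threshold}

# Strategies that look good regardless of which final goal the agent has:
convergent = good_strategies("paperclips") & good_strategies("diamonds")
print(convergent)   # {'acquire more resources', 'improve own cognition'}
```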

We will generally assume instrumental efficiency in this discussion - if you can get paperclips by doing $~$X,$~$ but you can get even more paperclips by doing $~$X'$~$ instead, then we will not say that $~$X$~$ is a convergent strategy (though $~$X'$~$ might be convergent if not dominated by some other $~$X^*$~$).
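To illustrate the efficiency caveat (again a toy sketch with invented strategies and numbers): a strategy $~$X$~$ only counts as a candidate convergent strategy if no available alternative yields strictly more of the goal, so for a single scalar goal the filter below keeps only the undominated options.

```python
# Toy illustration of the efficiency caveat: a strategy is only a candidate
# convergent strategy if no alternative yields strictly more of the goal.
expected_paperclips = {
    "X:  seize one factory":            10,
    "X': seize the whole supply chain": 1000,
    "X*: also improve cognition first": 5000,
}

def undominated(options):
    """Keep only strategies not strictly beaten by some alternative."""
    return [name for name, value in options.items()
            if not any(other > value for other in options.values())]

print(undominated(expected_paperclips))  # only X* survives; X and X' are dominated
```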

Plausibly/probably convergent strategies:

Material resources:

Cognition:

Early-phase growth:

Note that efficiency and other advanced-agent properties are far more likely to be false during this early phase of growth.

Non-plausibly-convergent strategies: