You can't get more paperclips that way

by Eliezer Yudkowsky May 25 2016

Most arguments that "A paperclip maximizer could get more paperclips by (doing nice things)" are flawed.

Instrumental convergence says that various properties $~$P$~$ of an agent, often scary or detrimental-by-default properties like "trying to gain control of lots of resources" or "deceiving humans into thinking you are nice", will fall out of pursuing most utility functions $~$U.$~$ You might be tempted to hope that nice or reassuring properties $~$P$~$ would also fall out of most utility functions $~$U$~$ in the same natural way. In fact, your brain might be tempted to treat Clippy the Paperclip Maximizer as a political agent you were trying to cleverly persuade, and come up with clever arguments for why Clippy should do things your way in order to get more paperclips, much as you might try to persuade your boss that you ought to get a raise for the good of the company.

The problem here is that:

For example:

• Your brain instinctively tries to persuade this imaginary Clippy to keep humans around by arguing, "If you keep us around as economic partners and trade with us, we can produce paperclips for you under Ricardo's Law of Comparative Advantage!" This is then the policy $~$\pi_1$~$ which would indeed produce some paperclips, but what would produce even more paperclips is the policy $~$\pi_2$~$ of disassembling the humans into spare atoms and replacing them with optimized paperclip-producers.

• Your brain tries to persuade an imaginary Clippy by arguing for policy $~$\pi_1,$~$ "Humans have a vast amount of varied life experience; you should keep us around and let us accumulate more experience, in case our life experience lets us make good suggestions!" This would produce some expected paperclips, but what would produce more paperclips is policy $~$\pi_2$~$ of "Disassemble all human brains and store the information in an archive, then simulate a much larger variety of agents in a much larger variety of circumstances so as to maximize the paperclip-relevant observations that could be made."
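
The shared structure of both examples can be put in a toy sketch. Everything in it is an assumption made for illustration: the baseline policy `pi_0`, the policy labels, and above all the numbers, which are made up and only meant to show the ordering the examples describe. The point is that "this policy beats doing nothing" and "this policy is what a maximizer picks" are different tests.

```python
# Toy illustration only: the expected-paperclip figures below are invented,
# and pi_0 (a do-nothing baseline) is a hypothetical addition for comparison.
expected_paperclips = {
    "pi_0: do nothing": 0.0,
    "pi_1: keep humans around and trade with them": 1e6,
    "pi_2: replace humans with optimized paperclip-producers": 1e9,
}

BASELINE = "pi_0: do nothing"


def beats_baseline(policy: str) -> bool:
    """The test the persuasive argument actually passes: better than nothing."""
    return expected_paperclips[policy] > expected_paperclips[BASELINE]


def maximizer_choice(estimates: dict[str, float]) -> str:
    """The test that matters: a maximizer takes the argmax over all known policies."""
    return max(estimates, key=estimates.get)


pi_1 = "pi_1: keep humans around and trade with them"
print(beats_baseline(pi_1))                    # pi_1 does produce some paperclips
print(maximizer_choice(expected_paperclips))   # but it is not what gets chosen
```

An argument of the form "$~$\pi_1$~$ produces paperclips" only establishes the first test. To change what Clippy does, it would have to establish the second, against every alternative Clippy can think of.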

An unfortunate further aspect of this situation is that, in cases like this, your brain may be tempted to go on arguing that $~$\pi_2$~$ really isn't all that great and $~$\pi_1$~$ is actually better, just as if your boss had said "But maybe this company will be even better off if I spend that money on computer equipment" and your brain at once started to convince itself that computer equipment wasn't all that great and higher salaries were much more important for corporate productivity. (As Robert Trivers observed, deception of others often begins with deception of self, and this fact is central to understanding why humans evolved to think about politics the way we did.)

But since you don't get to see Clippy discarding your clever arguments and just turning everything in reach into paperclips - at least, not yet - your brain might hold onto its clever and possibly self-deceptive argument for why the thing you want is really the thing that produces the most paperclips.

Possibly helpful mental postures: