Optimization daemons

https://arbital.com/p/daemons

by Eliezer Yudkowsky Mar 22 2016 updated Mar 28 2016

When you optimize something so hard that it crystallizes into an optimizer, like the way natural selection optimized apes so hard they turned into human-level intelligences


[summary: We can see natural selection spitting out humans as a special case of "If you dump enough computing power into optimizing a Turing-general policy using a consequentialist fitness criterion, it spits out a new optimizer that probably isn't perfectly aligned with the fitness criterion." After repeatedly optimizing brains to reproduce well, the brains in one case turned into general optimizers in their own right, more powerful ones than the original optimizer, with new goals not aligned in general to replicating DNA.

This could potentially happen anywhere inside an AGI subprocess where you were optimizing inside a sufficiently general solution space and you applied enough optimization power - you could get a solution that did its own, internal optimization, possibly in a way smarter than the original optimizer and misaligned to its original goals.

When heavy optimization pressure on a system crystallizes it into an optimizer - especially one that's powerful, or more powerful than the previous system, or misaligned with the previous system - we could term the crystallized optimizer a "daemon" of the previous system. Thus, under this terminology, humans would be daemons of natural selection.]

If you subject a dynamic system to a large amount of optimization pressure, it can turn into an optimizer or even an intelligence. The classic example would be how natural selection, in the course of extensively optimizing DNA to construct organisms that replicated the DNA, in one case pushed hard enough that the DNA came to specify a cognitive system capable of doing its own consequentialist optimization. Initially, these cognitive optimizers pursued goals that correlated well with natural selection's optimization target of reproductive fitness, which is how these crystallized optimizers had originally come to be selected into existence. However, further optimization of these 'brain' protein chunks caused them to begin to create and share cognitive content among themselves, after which such rapid capability gain occurred that a context change took place and the brains' pursuit of their internal goals no longer correlated reliably with DNA replication.
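The dynamic above can be illustrated with a deliberately tiny sketch (not from the original article; all names and numbers here are invented for illustration). An outer hill-climbing loop selects a policy parameter by a "fitness" criterion (calories obtained), but the selected policy is itself a small optimizer that maximizes an internal proxy (sweetness). In the training environment the proxy correlates with fitness, so selection produces sweetness-seekers; after a context change (a maximally sweet, zero-calorie food appears), the inner optimizer's goal no longer tracks the outer criterion.

```python
import random

random.seed(0)

def training_env():
    # In the "ancestral" environment, sweetness correlates with calories:
    # each food's sweetness is its caloric content plus small noise.
    foods = []
    for _ in range(10):
        calories = random.uniform(0, 1)
        foods.append({"sweet": calories + random.uniform(-0.05, 0.05),
                      "calories": calories})
    return foods

def deployment_env():
    # After the context change, sweetness is decoupled from calories
    # (think saccharin): the sweetest food has zero caloric value.
    return [{"sweet": 1.0, "calories": 0.0},
            {"sweet": 0.2, "calories": 0.9}]

def act(policy_weight, foods):
    # The evolved policy is itself a tiny optimizer: it picks whichever
    # food maximizes its internal score, a weighted taste for sweetness.
    return max(foods, key=lambda f: policy_weight * f["sweet"])

def fitness(policy_weight, trials=50):
    # The outer optimizer's criterion: calories actually obtained,
    # averaged over random training environments.
    return sum(act(policy_weight, training_env())["calories"]
               for _ in range(trials))

# Outer optimization loop: naive hill-climbing on the policy parameter.
weight = 0.0
for _ in range(100):
    candidate = weight + random.uniform(-0.2, 0.2)
    if fitness(candidate) > fitness(weight):
        weight = candidate

# In deployment, the inner optimizer still maximizes sweetness,
# which no longer correlates with the outer fitness criterion.
chosen = act(weight, deployment_env())
print(chosen)
```

Under these assumptions the loop reliably selects a positive `weight` (sweetness-seeking), and at deployment the policy picks the zero-calorie food: the selected behavior optimizes its internal proxy, not the criterion that selected it.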

As much as this was, from a human standpoint, a wonderful thing to have happened, it wasn't such a great thing from the standpoint of inclusive genetic fitness of DNA or just having stable, reliable, well-understood optimization going on. In the case of AGIs deploying powerful internal and external optimization pressures, we'd very much like to not have that optimization deliberately or accidentally crystallize into new modes of optimization, especially if this breaks goal alignment with the previous system or breaks other safety properties. (You might need to stare at the Orthogonality Thesis until it becomes intuitive that, even though crystallizing daemons from natural selection produced creatures that were more humane than natural selection, this doesn't mean that crystallization from an AGI's optimization would have a significant probability of producing something humane.)

When heavy optimization pressure on a system crystallizes it into an optimizer - especially one that's powerful, or more powerful than the previous system, or misaligned with the previous system - we could term the crystallized optimizer a "daemon" of the previous system. Thus, under this terminology, humans would be daemons of natural selection. If an AGI, after heavily optimizing some internal system, was suddenly taken over by an erupting daemon that cognitively wanted to maximize something that had previously correlated with the amount of available RAM, we would say this was a crystallized daemon of whatever kind of optimization that AGI was applying to its internal system.

This presents an AGI safety challenge. In particular, we'd want at least one of the following things to be true anywhere that any kind of optimization pressure was being applied:


Comments

Eli Tyre

When heavy optimization pressure on a system crystallizes it into an optimizer - especially one that's powerful, or more powerful than the previous system, or misaligned with the previous system - we could term the crystallized optimizer a "daemon" of the previous system. Thus, under this terminology, humans would be daemons of natural selection. If an AGI, after heavily optimizing some internal system, was suddenly taken over by an erupting daemon that cognitively wanted to maximize something that had previously correlated with the amount of available RAM, we would say this was a crystallized daemon of whatever kind of optimization that AGI was applying to its internal system.

Eli's personal notes:

especially one that's powerful, or more powerful than the previous system

I'm quite interested in how often a crystallized optimizer will be more powerful / more intelligent than the system that gave rise to it.

I have conflicting intuitions about how frequently that will be the case.