Actual effectiveness

https://arbital.com/p/actual_effectiveness

by Eliezer Yudkowsky Feb 22 2017

If you want the AI's so-called 'utility function' to actually be steering the AI, you need to think about how it meshes with the AI's beliefs and how the resulting decisions get output to actions.


For a design feature of a sufficiently advanced AI to be "actually effective", we may need to worry about the behavior of other parts of the system. For example, suppose you try to declare that a self-modifying AI is not allowed to modify the representation of its utility function. %note: Which shouldn't be necessary in the first place, unless something weird is going on.% This constant section of code may be meaningless unless you're also enforcing some invariant on the probabilities that get multiplied by the utilities, and on any other element of the AI that can directly poke policies on their way to motor output. Otherwise, the code and representation of the utility function may still be there, but it may no longer be actually steering the AI the way it used to.
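
As a toy illustration of this point (not from the original article, and with all outcome names, actions, and numbers invented for the example), here is a minimal Python sketch of an expected-utility chooser whose utility representation is frozen but whose beliefs are not. Rewriting only the probabilities is enough to flip which action wins, even though the utility function's code never changes.

```python
# Toy illustration: freezing the utility function's representation does not,
# by itself, keep it steering behavior if the beliefs it gets multiplied by
# can still be rewritten. All names and numbers here are hypothetical.

from types import MappingProxyType

# "Frozen" utility function over outcomes -- the AI is forbidden to modify this.
UTILITY = MappingProxyType({"paperclips_made": 1.0, "humans_happy": 10.0})

def choose_action(actions, beliefs):
    """Pick the action with the highest expected utility under current beliefs."""
    def expected_utility(action):
        return sum(prob * UTILITY[outcome]
                   for outcome, prob in beliefs[action].items())
    return max(actions, key=expected_utility)

actions = ["make_paperclips", "help_humans"]

# Original beliefs: helping humans is the best way to score on UTILITY.
beliefs = {
    "make_paperclips": {"paperclips_made": 0.9, "humans_happy": 0.0},
    "help_humans":     {"paperclips_made": 0.0, "humans_happy": 0.9},
}
print(choose_action(actions, beliefs))   # -> "help_humans"

# A self-modification that never touches UTILITY, only the probabilities
# that get multiplied by it...
beliefs["help_humans"] = {"paperclips_made": 0.0, "humans_happy": 0.0}

# ...and the untouched utility function is no longer steering the choice.
print(choose_action(actions, beliefs))   # -> "make_paperclips"
```

The same would hold for any other component that can directly poke policies on their way to motor output: the frozen utility representation is only "actually effective" if the rest of the decision pipeline is also held to an invariant.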