Underestimating complexity of value because goodness feels like a simple property

https://arbital.com/p/underestimate_value_complexity_perceputal_property

by Eliezer Yudkowsky Jun 27 2016

When you just want to yell at the AI, "Just do normal high-value X, dammit, not weird low-value X!" and that 'high versus low value' boundary is way more complicated than your brain wants to think.


One potential reason why people might systematically underestimate the complexity of value is that the "goodness" of a policy or goal-instantiation feels like a simple, direct property. Our brains compute a goodness level and present it to us as a relatively simple quantity, so it feels like a simple fact that tiling the universe with tiny agents experiencing maximum simply-represented 'pleasure' levels is a bad version of happiness. It feels like it ought to be simple to yell at an AI, "Just give me high-value happiness, not this weird low-value happiness!", or to have the AI learn, from a few examples, that it is meant to produce high-value X and not low-value X, especially if the AI is smart enough to learn other simple boundaries, like the difference between red objects and blue objects. In fact, the boundary between "good X" and "bad X" is value-laden and far more wiggly, and would require far more examples to delineate. What our brain computes as a seemingly simple, perceptually available, one-dimensional quantity does not always correspond to a simple, easy-to-learn gradient in the space of policies or outcomes. This is especially true of the seemingly readily available property of beneficialness.
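
As a toy illustration of that sample-complexity gap (not from the original article, and with every labeling rule and threshold below invented purely for illustration): the sketch trains a 1-nearest-neighbor learner on two labeling rules over the same 2D feature space, one genuinely simple (a "red vs. blue" style single threshold) and one deliberately wiggly, and compares how accuracy grows with the number of labeled examples.

```python
import numpy as np

rng = np.random.default_rng(0)

def simple_label(points):
    # "Red vs. blue": a single threshold on one feature.
    return (points[:, 0] > 0.0).astype(int)

def wiggly_label(points):
    # A contrived, highly non-linear rule standing in for a value-laden
    # boundary the learner has no built-in prior for.
    x, y = points[:, 0], points[:, 1]
    return (np.sin(4 * x) + np.sin(7 * y) + 0.5 * x * y > 0).astype(int)

def knn_accuracy(label_fn, n_train, n_test=2000):
    # Train a 1-nearest-neighbor classifier on n_train labeled points and
    # report its accuracy on fresh test points.
    train = rng.uniform(-1, 1, size=(n_train, 2))
    test = rng.uniform(-1, 1, size=(n_test, 2))
    train_y, test_y = label_fn(train), label_fn(test)
    # Predict each test point's label by copying its nearest training point.
    dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=-1)
    preds = train_y[dists.argmin(axis=1)]
    return (preds == test_y).mean()

for n in (10, 50, 250, 1000):
    print(f"n={n:5d}   simple boundary: {knn_accuracy(simple_label, n):.2f}"
          f"   wiggly boundary: {knn_accuracy(wiggly_label, n):.2f}")
```

Under these assumptions, the threshold rule is learned almost perfectly from a handful of examples, while the wiggly rule needs far more data to reach comparable accuracy; a genuinely value-laden boundary would be harder still, since it is not even a fixed low-dimensional function of a few observable features.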