"As I see it, there are two ..."


by Paul Christiano Jun 18 2015

As I see it, there are two cases that are meaningfully distinct:

(1) what we want is so simple, and we are so confident in what it is, that we are prepared to irrevocably commit to a particular concrete specification of "what we want" in the near future (of course, it's also fine to have a good enough approximation with high enough probability, etc.)

(2) it's not, or we aren't

It is more or less obvious that we are in (2). For example, even if every human were certain that the only thing they wanted was to produce as much diamond as possible (to use your example), we'd still be deep into case (2). And that's just about the easiest imaginable case. (The only exception I can see is some sort of extropian complexity-maximizing view.)

Are there meaningful policy differences between different shades of case (2)? I'm not yet convinced.


Eliezer Yudkowsky

Are there meaningful policy differences between different shades of case (2)?

If all of our uncertainty were about the best long-term destiny of humanity, and there were simple and robust ways to discriminate good outcomes from catastrophic outcomes when asking a behaviorist genie to do simple-seeming things, then building a behaviorist genie would avert Edge Instantiation, Unforeseen Maximums, and all the other value identification problems. If we still have a thorny value identification problem even for questions like "How do we get the AI to just paint all the cars pink, without tiling the galaxies with pink cars?" or "How can we safely tell the AI to 'pause' when somebody hits the pause button?", then a whole host of questions remains relevant even if somebody 'just' wants to build a behaviorist genie.