"> So you see the difference..."

https://arbital.com/p/1hg

by Eliezer Yudkowsky Dec 30 2015 updated Dec 30 2015


> So you see the difference as whether the programmers have to actually supply the short-term objective, or whether the AI learns the short-term objective they would have defined / which they would accept/prefer?
>
> The distinction seems to buy you relatively little safety at a great cost (basically taking the system from "maybe it's good enough?" to "obviously operating at an incredible disadvantage"). You seem to think that it buys you much more safety than I do.

This statement confuses me. (Remember that you know more about my scenarios than I know about your scenarios, so it will help if you can be more specific and concrete than your first-order intuition suggests is necessary.)

Considering these two scenarios…

…it seems to me that the gap between X and Y very plausibly describes a case where it's much easier to safely build X, though I also reserve some probability mass for the case where almost all the difficulty of value alignment lies in things like reflective stability and "getting the AI to do anything you specify, at all," so that going from X to Y adds only 1% more real difficulty. I also don't think that X would be at a computational disadvantage compared to Y. X seems to need to solve far fewer of the sort of problems that I consider dangerous and philosophically fraught (though I think we have a core disagreement here, in that you regard 'philosophically fraught' as much less dangerous than I do).

I suspect you're parsing the AI space differently, such that X and Y are not natural clusters for you. Rather than my guessing, do you want to go ahead and state your own parsing?