"I think we're going to have..."

https://arbital.com/p/39t

by Eliezer Yudkowsky Apr 21 2016 updated Apr 21 2016


I think we're going to have to specialize the terminology so we have separate words for "learn any goal concept" and "learn human normativity" instead of calling both of these "value", which is something I'm currently trying to figure out how to revise. But if by "value learning" you mean outcome-preference-criterion learning rather than learning human normativity, then yes, we're looking for outcome-preference-criterion learning where the criterion seems simple to us, is hopefully local, and is philosophically unproblematic by our own standards. Like, say, having the outcome be one in which we just have a damn strawberry.

On this definition, what is the difference between "communicating a goal concept" and "communicating a goal"?

In the language being used here, it sounds to me like "communicating a goal" should parse to "communicating a goal concept to an agent that will then optimize for the outcome-preference criterion you're about to communicate to it."
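
To make that parse concrete, here is a minimal sketch, purely illustrative and not from the original discussion: a goal concept is a bare criterion over outcomes, while communicating a goal amounts to handing that criterion to an agent that will optimize for it. All names in the sketch (`GoalConcept`, `Agent`, `strawberry_criterion`) are hypothetical.

```python
# Illustrative sketch only; all names are hypothetical, not from the original text.
# A goal *concept* is a bare criterion over outcomes; a *goal* is that same
# criterion in the hands of an agent that will optimize for it.

from typing import Callable, Iterable, Optional

# A goal concept: an outcome-preference criterion, with no optimizer attached.
GoalConcept = Callable[[str], bool]

def strawberry_criterion(outcome: str) -> bool:
    """The criterion: the outcome is one in which we have a strawberry."""
    return "strawberry" in outcome

class Agent:
    """An agent that optimizes for whatever goal concept it is given."""

    def __init__(self) -> None:
        self.criterion: Optional[GoalConcept] = None

    def receive_goal_concept(self, criterion: GoalConcept) -> None:
        # Communicating the goal concept: the agent now holds the criterion.
        self.criterion = criterion

    def act(self, available_outcomes: Iterable[str]) -> str:
        # Because this agent optimizes whatever criterion it holds, giving it
        # the concept amounted to communicating a goal.
        assert self.criterion is not None, "no goal concept communicated yet"
        return next(o for o in available_outcomes if self.criterion(o))

agent = Agent()
agent.receive_goal_concept(strawberry_criterion)
print(agent.act(["empty plate", "plate with a strawberry"]))
# prints: plate with a strawberry
```

On this reading, the same `GoalConcept` object communicated to a non-optimizing system would be a goal concept but not a goal; it only becomes a goal relative to an agent wired to optimize for it.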