

by Paul Christiano Dec 28 2015

Six months and several discussions later, this still seems like a serious concern (Nick Bostrom seemed to have the same response independently, and to consider it a pretty serious objection).

It really seems like the problem is an artifact of the toy example of diamond maximization. This "easy" problem is so easy, in a certain sense, that it tempts us toward a particular class of simple strategies where we literally specify a model of the world and say what diamond is.

Those strategies seem like an obvious dead end in the real case, and I think everyone is in agreement about that. They also seem like an almost-as-obvious dead end even in the diamond maximization case.

That's fine, but it means that the real justification is quite different from the simple story offered here. Everyone at MIRI I have talked to has fallen back to some variant of this more subtle justification when pressed. I don't know of anywhere the real justification has been fleshed out in publicly available writing. I would be game for a public discussion about it.

It does seem like there is some real problem about getting agents to actually care about stuff in the real world. This just seems like a very strange way of describing or attacking the problem.