"Your characterization of ut..."


by Paul Christiano Nov 19 2015

Your characterization of utility indifference doesn't seem quite right. More accurate would be: the agent behaves as if it were certain that the shutdown button won't do anything (for example, because it is confident that a particular quantum coin will come up heads), and so it won't bother either to eliminate the button or to preserve it.
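To make the point concrete, here is a toy sketch of my own, not a formalization from the original post. The action names, the reward and cost numbers, and the parameter `p_shutdown_if_intact` are all assumptions chosen for illustration. The agent assigns some probability to being shut down if the button is left intact, and either disabling or actively preserving the button costs some effort.

```python
# Toy model (all names and numbers are illustrative assumptions).
ACTIONS = ["ignore", "disable", "preserve"]


def expected_utility(action, p_shutdown_if_intact, task_reward=10.0, effort_cost=1.0):
    """Expected utility of an action under the agent's own beliefs."""
    # Tampering with the button in either direction costs effort; ignoring it is free.
    cost = 0.0 if action == "ignore" else effort_cost
    # A disabled button can never shut the agent down; otherwise the agent
    # uses its believed probability of shutdown.
    p_shutdown = 0.0 if action == "disable" else p_shutdown_if_intact
    # Toy assumption: shutdown means the task reward is lost entirely.
    return (1.0 - p_shutdown) * task_reward - cost


def best_action(p_shutdown_if_intact):
    """Action with the highest expected utility under the given belief."""
    return max(ACTIONS, key=lambda a: expected_utility(a, p_shutdown_if_intact))


# An agent that is certain the button does nothing leaves it alone.
print(best_action(0.0))
# An agent that thinks the button might work has an incentive to disable it.
print(best_action(0.5))
```

Under the belief that the button does nothing, disabling and preserving are equally pointless, so any nonzero effort cost makes ignoring the button strictly better. The second call is only there for contrast: once the agent thinks the button might work, the incentive to disable it reappears.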

When presenting this problem, it seems best to lead with the underlying intuition about self-doubt, since I think it is more interesting than the narrower applications (e.g. the shutdown button). The narrower applications then nicely show that self-doubt has clear, meaningful consequences.