by Ryan Carey Apr 24 2017

Interesting question.

Here's how this problem is motivated in my head… The more obvious way to get an AI system to shut down is to give it a shutdown action. Utility maximization then occurs in an inner loop that is overridden by instructions to shut down or to change the value function. But then you need the utility-maximizer to be corrigible somehow, perhaps by using a shutdown utility function, which makes this a purported subproblem of corrigibility.
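To make the inner-loop/override picture concrete, here is a minimal sketch. Everything in it is a hypothetical illustration (the names `SHUTDOWN`, `choose_action`, and `run_agent`, and the convention that an instruction is either `None`, the shutdown signal, or a replacement utility function); it is not a proposal for how a real system would be built.

```python
from typing import Callable, Iterable, List, Optional

SHUTDOWN = "shutdown"  # hypothetical name for the dedicated shutdown action


def choose_action(actions: List[str], utility: Callable[[str], float]) -> str:
    """Inner loop: pick the action with the highest utility."""
    return max(actions, key=utility)


def run_agent(
    actions: List[str],
    utility: Callable[[str], float],
    instructions: Iterable[Optional[object]],
) -> List[str]:
    """Outer loop: an instruction overrides utility maximization.

    Assumed convention: each instruction is None (no override), the
    SHUTDOWN signal, or a new utility function replacing the current one.
    """
    history: List[str] = []
    for instruction in instructions:
        if instruction == SHUTDOWN:
            # The override wins: perform the shutdown action and stop.
            history.append(SHUTDOWN)
            break
        if callable(instruction):
            # The override replaces the value function.
            utility = instruction
        history.append(choose_action(actions, utility))
    return history
```

Note that the sketch simply assumes the inner maximizer has no way to influence the instructions it receives. That assumption is the corrigibility requirement itself, so the code illustrates the setup rather than solving it.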

As for obvious proposed solutions: if you have defined a shutdown action [e.g., run a routine that switches the power off], then you could use the objective "The chance of this action being performed is greater than 99.999%" as your utility function. Though an incorrigible AI might be able to copy itself to get around this…
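As a rough sketch of that objective, assuming the agent can somehow estimate the probability that the shutdown action gets performed (the function name, the `p_shutdown` parameter, and the 0/1 payoff are my own illustrative choices):

```python
def shutdown_utility(p_shutdown: float, threshold: float = 0.99999) -> float:
    """Return 1.0 if the estimated chance of shutdown exceeds the threshold.

    p_shutdown is a hypothetical estimate of the probability that the
    shutdown action is performed; how to obtain it is left unspecified.
    """
    if not 0.0 <= p_shutdown <= 1.0:
        raise ValueError("p_shutdown must be a probability in [0, 1]")
    # Strictly greater than, matching "greater than 99.999%".
    return 1.0 if p_shutdown > threshold else 0.0
```

The function scores only the probability of that one action. It is indifferent among all outcomes above the threshold, so nothing in it penalizes side effects such as the agent having copied itself first.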

One also wonders if this could be adapted into a reductio ad absurdum of the idea of making aligned AI by specifying a sovereign's utility function.