Shutdown problem

https://arbital.com/p/shutdown_problem

by Eliezer Yudkowsky Mar 28 2016 updated Feb 13 2017

How to build an AGI that lets you shut it down, despite the obvious fact that this will interfere with whatever the AGI's goals are.


[summary: The 'shutdown problem' is that of creating a sufficiently advanced Artificial Intelligence which will, on the press of a button, suspend itself safely to disk, and which neither objects to the existence of this button nor tries to prevent it from being pressed.

This is difficult because avoiding your own halt or suspension is a convergent instrumental strategy; even a goal like "bring the user coffee" implies avoiding shutdown.

This problem is sometimes decomposed into (1) the problem of finding a utility function that really actually means "Suspend yourself safely to disk", and (2) the problem of building an agent that wants to switch to optimizing a different utility function if a button is pressed, but that doesn't want to press that button or prevent its being pressed.

See also Utility indifference, Shutdown utility function, Corrigibility, and Interruptibility.]

The 'shutdown problem' is that of creating a sufficiently advanced Artificial Intelligence which will, on the press of a button, suspend itself safely to disk, and which neither objects to the existence of this button nor tries to prevent it from being pressed.

This is difficult because avoiding your own halt or suspension is a convergent instrumental strategy; even a goal like "bring the user coffee" implies avoiding shutdown.
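As a toy illustration (not part of the original article; the agent model and the probability below are made up), a naive expected-utility agent whose only terminal goal is coffee delivery already assigns higher expected utility to worlds in which the shutdown button has been disabled:

```python
# Toy model (assumed for illustration): a one-shot expected-utility agent
# whose only terminal value is "coffee delivered". The probability is
# invented; only the direction of the comparison matters.

P_BUTTON_PRESSED = 0.1  # hypothetical chance the operator presses the button

def expected_utility(button_disabled: bool) -> float:
    """Expected utility under the naive 'bring the user coffee' goal."""
    if button_disabled:
        return 1.0                       # coffee is always delivered
    return (1 - P_BUTTON_PRESSED) * 1.0  # shutdown forfeits the coffee

print(expected_utility(button_disabled=False))  # 0.9
print(expected_utility(button_disabled=True))   # 1.0 -> disabling the button scores higher
```

Nothing adversarial is needed; the incentive to resist shutdown falls straight out of maximizing the unmodified goal.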

One possible research avenue is to:

  1. Describe a 'shutdown utility function' whose attainable optimum is "Suspend safely to disk (and possibly safely abort plans in progress), without building a hidden environmental copy of yourself to carry out your previous goal and then assimilate all matter in the universe to make absolutely sure the original AI stays shut down".
  2. Find a sensible way to compose the shutdown utility function with the agent's regular utility function, such that:
    • (2a) Which utility function the agent optimizes depends on whether a switch was pressed.
    • (2b) The AI experiences no incentive to cause the switch to be pressed or prevent the switch from being pressed. (A toy sketch of one such composition follows this list.)
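A minimal sketch of the composition in step 2, assuming a toy world representation and invented utility numbers. The `correction` constant implements the utility-indifference idea (see Utility indifference below), which is one candidate approach to requirement 2b rather than a settled solution:

```python
# Sketch under toy assumptions: worlds are dicts of boolean facts, and the
# utility values are invented for illustration. Only the structure matters.

def u_normal(world: dict) -> float:
    """The agent's regular task utility, e.g. reward for delivered coffee."""
    return 1.0 if world["coffee_delivered"] else 0.0

def u_shutdown(world: dict) -> float:
    """Step 1's shutdown utility: reward only a clean suspend-to-disk,
    with no hidden copy left running in the environment."""
    return 1.0 if world["suspended_safely"] and not world["hidden_copy"] else 0.0

def u_switched(world: dict, correction: float = 0.0) -> float:
    """Step 2a: which utility function applies depends on the button.

    `correction` is the utility-indifference move aimed at 2b: choose it so
    the agent's expected utility is the same whether or not the button ends
    up pressed, leaving no incentive to press it or to prevent its pressing.
    """
    if world["button_pressed"]:
        return u_shutdown(world) + correction
    return u_normal(world)
```

For instance, if the agent expects 0.9 utility from its regular goal when left running and 1.0 from a clean shutdown when the button is pressed, setting `correction = 0.9 - 1.0` equalizes the two branches in expectation, so in this toy setting manipulating the button's state buys the agent nothing. Making a construction like this robust for a sufficiently advanced agent is the open part of the problem.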

Harder versions of this problem would add further specifications, such as requiring that plans already set in motion be aborted safely and with low impact (compare Abortable plans and Low impact below).

See also Utility indifference, Shutdown utility function, Corrigibility, Interruptibility, Low impact, and Abortable plans.