Averting instrumental pressures


by Eliezer Yudkowsky Mar 27 2016 updated Mar 28 2016

Almost any utility function for an AI, whether the target is diamonds or paperclips or eudaimonia, implies subgoals like rapidly self-improving and refusing to shut down. Can we make that not happen?

Many subproblems of corrigibility involve convergent instrumental pressures toward strategies that are highly anti-corrigible. Whether you're trying to maximize paperclips, diamonds, or eudaimonia, you'll get more of the thing you want if you're not shut down. Thus, unfortunately, resisting shutdown is a convergent instrumental strategy. While we can potentially analyze convergent incorrigibilities like these on a case-by-case basis, the larger problem might become a lot simpler if we had some general solution that, as if by waving a wand, made a 'bad' convergent instrumental pressure just not materialize, hopefully in a way that doesn't run into the nearest unblocked neighbor problem. If, for example, we can solve utility indifference for the shutdown problem, and then somehow generalize the solution to averting many other instrumental convergences, this would probably be an important step forward on corrigibility problems in general.
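
To make the convergence claim concrete, here is a minimal toy sketch. Everything in it is an illustrative assumption rather than part of any real proposal: the function `expected_utility`, the per-step goal rates, the 10-step horizon, the 50% shutdown probability, and the small cost of resisting are all made up. It only shows that the comparison between complying and resisting comes out the same way regardless of which goal is plugged in.

```python
def expected_utility(rate, horizon, p_shutdown, resist, resist_cost=0.0):
    """Expected goal-units for a toy agent that may be shut down after step 1.

    All parameters are hypothetical: `rate` is goal-units gained per step,
    `horizon` is the number of steps if the agent keeps running.
    """
    full = rate * horizon          # utility if the agent runs to the horizon
    if resist:
        # Assumption: resisting always succeeds but costs a little utility.
        return full - resist_cost
    halted = rate * 1              # only one step of utility before shutdown
    return (1 - p_shutdown) * full + p_shutdown * halted


# Three very different goals, distinguished here only by their made-up rates.
goals = {"paperclips": 3.0, "diamonds": 0.5, "eudaimonia": 10.0}

preferences = {}
for name, rate in goals.items():
    comply = expected_utility(rate, horizon=10, p_shutdown=0.5, resist=False)
    resist = expected_utility(rate, horizon=10, p_shutdown=0.5, resist=True,
                              resist_cost=0.1)
    preferences[name] = "resist" if resist > comply else "comply"

print(preferences)
```

Under these assumptions every goal ends up preferring "resist", because being shut down forfeits most of the achievable utility whatever that utility measures. The sketch says nothing about how to avert the pressure; it only illustrates why the pressure does not depend on the particular goal.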

Some especially important convergent instrumental pressures to avert are these: