Value achievement dilemma

https://arbital.com/p/value_achievement_dilemma

by Eliezer Yudkowsky Mar 27 2015 updated Feb 2 2017

How can Earth-originating intelligent life achieve most of its potential value, whether by AI or otherwise?


[summary: The value achievement dilemma is the general, broad challenge faced by Earth-originating intelligent life in steering our cosmic endowment into a state of high value - successfully turning the stars into a happy civilization.

We face potential existential catastrophes (resulting in our extermination or the corruption of the cosmic endowment) such as sufficiently lethal engineered pandemics, non-value-aligned AIs, or insane smart uploads. A strategy is relevant to value achievement only if success is a game-changer for the overall dilemma humanity faces. E.g., value-aligned powerful AIs and intelligence-enhanced humans both seem to qualify as strategically relevant; but an AI restricted to only proving theorems in Zermelo-Fraenkel set theory has no obvious game-changing use.]

The value achievement dilemma is a way of framing the AI alignment problem in a larger context. This emphasizes that there might be possible solutions besides AI; it also emphasizes that such solutions must meet a high bar of potency or efficacy in order to resolve our basic dilemmas, the way that a sufficiently value-aligned and cognitively powerful AI could. Failing that, they must at least change the nature of the gameboard, the way that a Task AGI could take actions to prevent destruction by later AGI projects, even if it is only narrowly value-aligned and cannot solve the whole problem.

The point of considering posthuman scenarios in the long run, and not just an immediate Task AGI as a band-aid, can be seen in the suggestion by Eliezer Yudkowsky and Nick Bostrom (in Superintelligence) that we can see Earth-originating intelligent life as having two possible stable states: superintelligence and extinction. If intelligent life goes extinct, especially if it drastically damages or destroys the ecosphere in the process, new intelligent life seems unlikely to arise on Earth. If Earth-originating intelligent life becomes superintelligent, it will presumably expand through the universe and stay superintelligent for as long as physically possible. Eventually, our civilization is bound to wander into one of these two attractors.

Furthermore, by the generic preference stability argument, any sufficiently advanced cognitive agent is very likely to be stable in its motivations or meta-preference framework. So if and when life wanders into the superintelligence attractor, one of two things will happen. Either it will end up in a stable state of, e.g., fun-loving or the reflective equilibrium of its creators' civilization, and hence achieve lots of value; or a misaligned AI will go on maximizing paperclips forever.

Among the dilemmas we face in getting into the high-value-achieving attractor, rather than the extinction attractor or the equivalence class of paperclip maximizers, are:

Other positive events seem like they could potentially prompt entry into the high-value-achieving superintelligence attractor:

On the other hand, consider someone who proposes that "Rather than building AI, we should build Oracle AIs that just answer questions." Suppose that, after further exposure to the concept of the AI-Box Experiment and cognitive uncontainability, they narrow their specification further: an Oracle running in three layers of sandboxed simulation must output only formal proofs of given theorems in Zermelo-Fraenkel set theory, and a heavily sandboxed and provably correct verifier will look over this output proof and signal 1 if it proves the target theorem and 0 otherwise, at some fixed time to avoid timing attacks.

This doesn't resolve the larger value achievement dilemma, because there's no obvious thing we can do with a ZF provability oracle that solves our larger problem. There's no plan such that it would save the world if only we could take some suspected theorems of ZF and know that some of them had formal proofs.
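To make concrete how little information leaves such a box, here is a minimal Python sketch of the verifier arrangement described above. Everything in it is an illustrative assumption rather than part of the original proposal: the names `check_proof` and `release_verdict_at_deadline` are hypothetical, and the toy checker handles only premises and modus ponens, not Zermelo-Fraenkel set theory. The point is only the shape of the interface: one bit, released at a fixed time.

```python
import time
from typing import Callable

# Toy formula encoding (an assumption for illustration): an atom is a string,
# and an implication "a implies b" is the tuple ("->", a, b).

def check_proof(premises, proof, target) -> bool:
    """Return True if every step is a premise or follows from earlier steps
    by modus ponens, and the final step is the target formula."""
    derived = []
    for step in proof:
        is_premise = step in premises
        by_modus_ponens = any(
            imp == ("->", a, step) for a in derived for imp in derived
        )
        if not (is_premise or by_modus_ponens):
            return False
        derived.append(step)
    return bool(derived) and derived[-1] == target

def release_verdict_at_deadline(check: Callable[[], bool], deadline: float) -> int:
    """Run the checker, then release a single bit no earlier than `deadline`
    (a time.monotonic() value), so the release time does not depend on how
    long checking took."""
    try:
        verdict = 1 if check() else 0
    except Exception:
        verdict = 0  # a malformed proof counts as no proof
    remaining = deadline - time.monotonic()
    if remaining < 0:
        # The checker overran the deadline. This sketch just reports 0;
        # a real design would need a hard bound on checker runtime.
        return 0
    time.sleep(remaining)
    return verdict

if __name__ == "__main__":
    premises = {"p", ("->", "p", "q")}
    proof = ["p", ("->", "p", "q"), "q"]
    bit = release_verdict_at_deadline(
        lambda: check_proof(premises, proof, "q"),
        time.monotonic() + 0.1,
    )
    print(bit)
```

The only thing an operator ever learns from this interface is whether a particular suspected theorem has a formal proof, which is exactly the capability the preceding paragraph argues has no obvious world-saving use.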

The thrust of considering a larger 'value achievement dilemma' is that while imaginable alternatives to aligned AIs exist, they must pass a double test to be our best alternative:

Any strategy that does not putatively open a clear path to victory if it succeeds doesn't seem like a plausible policy alternative to trying to solve the AI alignment problem, or to doing something else such that success leaves us a clear path to victory. Trying to solve the AI alignment problem is intended to leave us a clear path to achieving almost all of the achievable value for the future and its astronomical stakes. Anything that doesn't open a clear path to getting there is not an alternative solution for getting there.

For more on this point, see the page on pivotal events.

Subproblems of the larger value achievement dilemma

We can see the place of AI alignment in the larger scheme by considering its parent problem, its sibling problems, and examples of its child problems.