Aligning an AGI adds significant development time

by Eliezer Yudkowsky Feb 22 2017 updated Feb 22 2017

Aligning an advanced AI foreseeably involves extra code and extra testing and not being able to do everything the fastest way, so it takes longer.


The votable proposition is true if, comparing reasonably attainable development paths for…

…where otherwise both projects have access to the same ideas or discoveries in the field of AGI capabilities and similar computation resources; then, as the default / ordinary / modal case after conditioning on all of the said assumptions:

Project Path 1 will require at least 50% longer serial time to complete than Project Path 2, or two years longer, whichever is less.
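The threshold above can be read as simple arithmetic: the aligned path must take longer by at least the smaller of "50% of the other path's serial time" and "two years". As a minimal sketch of that reading, assuming serial time is measured in years, and with function names that are purely illustrative rather than anything from the original page:

```python
def extra_time_threshold(path2_years: float) -> float:
    """Smallest extra serial time (in years) that satisfies the proposition.

    Assumption: "50% longer, or two years longer, whichever is less" is read
    as min(0.5 * T2, 2.0), where T2 is Project Path 2's serial time.
    """
    if path2_years <= 0:
        raise ValueError("serial development time must be positive")
    return min(0.5 * path2_years, 2.0)


def proposition_holds(path1_years: float, path2_years: float) -> bool:
    """True if Path 1 takes longer than Path 2 by at least the threshold."""
    return path1_years - path2_years >= extra_time_threshold(path2_years)


if __name__ == "__main__":
    # Hypothetical durations, chosen only to show both branches of the min().
    print(proposition_holds(3.0, 2.0))    # 50% branch: threshold is 1 year
    print(proposition_holds(12.0, 10.0))  # two-year branch: threshold is 2 years
    print(proposition_holds(11.5, 10.0))  # only 1.5 years longer
```

Under this reading, the two-year cap becomes the binding condition once Project Path 2 would take more than four years, since 50% of four years is exactly two years.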



This page was written to address multiple questioners who seem to have accepted the Orthogonality thesis, but still mostly disbelieve it would take significantly longer to develop aligned AGI than unaligned AGI, if I've understood correctly.

At present this page is an overview of possible places of disagreement, and may later be selectively rather than fully expanded.


Related propositions

Propositions feeding into this one include:

If questioner believes the negation of either of these, it would imply easy specifiability of a decision function suitable for an unlimited superintelligence. That could greatly reduce the need for, e.g.:

It's worth checking whether any of these time-costly development principles seem to questioner to not follow as important from the basic idea of value alignment being necessary and not trivially solvable.

Outside view

To the best of my knowledge, it is normal / usual / unsurprising for at least 50% increased development time to be required by strong versus minimal demands on any one of:

%comment: It would indeed be unusual--some project managers might call it extra-ordinary good fortune--if a system demanding two or more of these properties did not require at least 50% more development time compared to a system that didn't.%

Obvious-seeming-to-me analogies include:

Some of the standard ways in which systems with strong versus minimal demands on (3*)-properties *usually* require additional development time:

Outside view on AI problems

Another reference class that feels relevant to me is that things having to do with AI are often more difficult than expected. For example, there is the story of computer vision being assigned to 2 undergrads over the summer. This seems like a relevant case in point of "uncorrected intuition has a directional bias in underestimating the amount of work required to implement things having to do with AI, and you should correct that directional bias by revising your estimate upward".

Given a sufficiently advanced Artificial General Intelligence, we might perhaps get narrow problems on the order of computer vision for free. But the whole point of Orthogonality is that you do not get AI alignment for free with general intelligence. Likewise, identifying value-laden concepts or executing value-laden behaviors doesn't come free with identifying natural empirical concepts. We have separate basic AI work to do for alignment. So the analogy to underestimating a narrow AI problem, in the early days before anyone had confronted that problem, still seems relevant.

%comment: I can't see how, after imagining oneself in the shoes of the early researchers tackling computer vision and 'commonsense reasoning' and 'natural-language processing', after the entirety of the history of AI, anyone could reasonably stagger back in shocked and horrified surprise upon encountering the completely unexpected fact of a weird new AI problem being… kinda hard.%

Inside view

While it is possible to build new systems that aren't 100% understood, and have them work, the successful designs were usually greatly overengineered. Some Roman bridges are still standing two millennia later, which probably wasn't in the design requirements, so in that sense they turned out to be hugely overengineered, but we can't blame their builders for that. "What takes good engineering is building bridges that just barely stay up."

If we're trying for an aligned Task AGI without a really deep understanding of how to build exactly the right AGI with no extra parts or extra problems--which must certainly be lacking on any scenario involving relatively short timescales--then we have to do lots of safety things in order to have any chance of surviving, because we don't know in advance which part of the system will nearly fail. We don't know in advance that the O-Rings are the part of the Space Shuttle that's going to suddenly behave unexpectedly, and we can't put in extra effort to armor only that part of the process. We have to overengineer everything to catch the small number of aspects that turn out not to be so "overengineered" after all.

This suggests that even if one doesn't believe my particular laundry list below, whoever walks through this problem, conditional on their eventual survival, will have shown up with some laundry list of precautions, including costly precautions; and they will (correctly) not imagine themselves able to survive based on "minimum necessary" precautions.

Some specific extra time costs that I imagine might be required:

Independently of the particular list above, this doesn't feel to me like a case where the conclusion is highly dependent on Eliezer-details. Anyone with a concrete plan for aligning an AI will walk in with a list of plans and methods for safety, some of which require close inspection of parts, and constrain allowable designs, and just plain take more work. One of the important ideas is going to turn out to take 500% more work than expected, or to require solving a deep AI problem, and this isn't going to shock them either.

Meta view

I genuinely have some trouble imagining what objection is standing in the way of accepting "ceteris paribus, alignment takes at least 50% more time", having granted Orthogonality and alignment not being completely trivial. I did not expect the argument to bog down at this particular step. I wonder if I'm missing some basic premise or misunderstanding questioner's entire thesis.

If I'm not misunderstanding, or if I consider the thesis as-my-ears-heard-it at face value, then I can only imagine the judgment "alignment probably doesn't take that much longer" being produced by ignoring what I consider to be basic principles of cognitive realism. Despite the dangers of psychologizing, for purposes of oversharing, I'm going to say what feels to me like it would need to be missing:

AI alignment could be easy in theory and still take 50% more development time in practice. That is a very ordinary thing to have happen when somebody asks the project manager to make sure a piece of highly novel software actually implements an "easy" property the first time the software is run under new conditions that can't be fully tested in advance.

"At least 50% more development time for the aligned AI project, versus the corner-cutting project, assuming both projects otherwise have access to the same stock of ideas and methods and computational resources" seems to me like an extremely probable and normal working premise to adopt. What am I missing?

%comment: I have a sense of "Why am I not up fifty points in the polls?" and "What experienced software manager on the face of the Earth (assuming they didn't go mentally haywire on hearing the words 'Artificial Intelligence', and considered this question as if it were engineering), even if they knew almost nothing else about AI alignment theory, would not be giving a rather skeptical look to the notion that carefully crafting a partially superhuman intelligence to be safe and robust would only take 1.5 times as long compared to cutting all the corners?" %