"> Here is my understanding ..."

https://arbital.com/p/1h6

by Eliezer Yudkowsky Dec 30 2015 updated Jan 4 2016


> Here is my understanding of Eliezer's picture (translated into my worldview): we might be able to build AI systems that are extremely good at helping us build capable AI systems, but not nearly as good at helping us solve AI alignment/control or at building alignable/controllable AI.

This is indeed the class of worrisome scenarios, and one should consider that (a) Eliezer thinks that aligning the rocket is harder than fueling it in general, and (b) this was certainly true of, e.g., Eurisko, which was able to get some amount of self-improvement, but with all control issues being kicked squarely back to Douglas Lenat. We can also see natural selection's creation of humans in the same light, etcetera. On my view it seems extremely probable that whatever we have in the way of AI algorithms creating other AI algorithms (short of full FAI) will be helping out not at all with alignment and control, or with things like reflective stability, and so on.

The case where KANSI (known-algorithm non-self-improving AI) becomes important is the one where we reach the level at which AGI becomes possible, at a point where there are no huge forgone advantages from the kinds of AI-created AI algorithms to which existing transparency or control work doesn't generalize. You can define a neural network undergoing gradient descent as "improving itself," but relative to current systems this doesn't change the algorithm to the point where we no longer understand what's going on.

KANSI is relevant in the scenario where we first reach possible-advanced-AGI levels at a point where an organization with lots of resources, and maybe a realistically sized algorithmic lead, that forgoes the class of AI-improving-AI benefits which would make important subprocesses very hard to understand, is not at a disadvantage relative to a medium-sized organization with fewer resources. This is the level where we can put a big thing together out of components vaguely analogous to deep belief networks or whatever, just run our current algorithms or minor variations on them, and have the AI's representation be reasonably transparent and known, so that we can monitor the AI's thoughts without some huge amount of work having gone into making transparency reflectively stable and corrigible through self-improvement, or into getting the AI to help us out with that, etcetera, because we're just taking known algorithms and running them on a vast amount of computing power.
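As a toy illustration of the gradient-descent point above (a minimal sketch, not drawn from the original text): in the loop below, all of the "self-improvement" is confined to the numerical value of the parameter `w`, while the update rule itself remains a fixed, human-written, fully inspectable piece of code throughout training.

```python
# Toy sketch: gradient descent on a one-parameter linear model y = w * x.
# The "self-improvement" is confined to the value of `w`; the update rule
# is fixed code that we can read and understand at every step.

def loss_gradient(w, data):
    """Gradient of mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def train(data, steps=1000, learning_rate=0.01):
    w = 0.0  # initial parameter
    for _ in range(steps):
        w -= learning_rate * loss_gradient(w, data)  # same rule every step
    return w

if __name__ == "__main__":
    # Data generated by y = 3x; the learned w approaches 3, but nothing
    # about the training algorithm itself has been rewritten or obscured.
    samples = [(x, 3.0 * x) for x in range(1, 6)]
    print(train(samples))
```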