"> on my view it seems extre..."

https://arbital.com/p/1jl

by Paul Christiano Jan 4 2016


> on my view it seems extremely probable that, whatever we have in the way of AI algorithms short of full FAI creating other AI algorithms, they'll be helping out not at all with alignment and control

You often say this, but I'm obviously not yet convinced.

As I see it, the biggest likely gap is that you can empirically validate work in AI, but perhaps cannot validate work on alignment/control except by consulting a human. This is problematic if either human feedback ends up being a major cost or obstacle (e.g. because AI systems are extremely cheap/fast, or because they are too far beyond humans for humans to provide meaningful oversight), or task definitions that involve human feedback end up being harder by virtue of being mushier goals that don't line up as well with the actual structure of reality.

These objections are more plausible as arguments that control work is a comparative advantage of humans. In that context I would accept them as plausible, though I think there is a pretty good chance of working around them.

But those considerations don't seem to imply that AI will help out "not at all." It seems pretty plausible that you are drawing on some other intuitions that I haven't considered.

Another possible gap is that control may just be harder than capabilities. But in that case the development of AI wouldn't really change the game; it would just make the game go faster, so this doesn't seem relevant to the present discussion. (If humans can solve the control problem anyway, humans+AI systems would have a comparable chance.)

Another possible gap is that there are many more iterations of AI design, and a failure at any time cascades into future iterations. I've pointed out that there can't be many big productivity improvements before any earlier thinking about AI is thoroughly obsolete, but I'm certainly willing to grant that forcing control to keep up for a while does make the problem materially harder (more so the more that our solutions to the control problem are closely tied to details of the AI systems we are building). I agree that sticking with the same AI designs for longer can in some respects make the control problem easier. But it seems like you are talking about a difference in kind for safety work, rather than another way to slightly improve safety at the expense of efficacy.

Note: I'm saying that if you can solve the AI control/alignment problem for the AI systems in year N, then the involvement of those AI systems in subsequent AI design doesn't exert significant additional pressure that makes it harder to solve the control/alignment problem in year N+1. It seems like this is the relevant question in the context of the OP.