"Thanks for this analysis and congratulations on..."

https://arbital.com/p/8ry

by Toby Ord Oct 26 2017


Thanks for this analysis and congratulations on its clarity.

One important point is that when considering:

$~$\pi_5$~$: Shut down and let the humans optimize whatever $~$V$~$ they have in the actual world

the text makes it look like the outcome involves unassisted humans optimising $~$V$~$. But the real option is that humans would almost certainly be trying to optimise $~$V$~$ with future AI assistance, quite possibly from an improved, restarted version of the AI itself. I think this makes it considerably more plausible that this is the best policy for the AI to choose (both intuitively and from the AI's perspective), though it introduces some familiar difficulties in analysing whether to trust new versions of yourself to do a better job.
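To make the comparison concrete, here is a minimal sketch of the expected-value calculation from the AI's perspective. All the numbers and variable names are hypothetical illustrations I am supplying, not figures from the original analysis; the point is only that modelling shutdown as "humans plus future AI assistance" rather than "unassisted humans" can change which policy looks best:

```python
# Minimal sketch of the AI's choice between continuing and deferring.
# All numbers and names are hypothetical illustrations, not from the post.

def best_policy(policies):
    """Return the (name, expected V) pair with the highest expected value."""
    return max(policies, key=lambda p: p[1])

# AI's credence that its learned proxy for V is actually correct.
p_proxy_correct = 0.6

# Stipulated payoffs in units of true V:
v_ai_right = 1.0         # AI continues and its proxy matches V
v_ai_wrong = -2.0        # AI continues and its proxy is badly wrong
v_humans_alone = 0.3     # unassisted humans optimise V after shutdown
v_humans_assisted = 0.8  # humans optimise V helped by a restarted, improved AI

ev_continue = p_proxy_correct * v_ai_right + (1 - p_proxy_correct) * v_ai_wrong

print(best_policy([
    ("continue", ev_continue),                          # 0.6*1.0 + 0.4*(-2.0) = -0.2
    ("shutdown, naive reading", v_humans_alone),        # pi_5 as the text reads
    ("shutdown, assisted reading", v_humans_assisted),  # pi_5 as argued here
]))
# -> ('shutdown, assisted reading', 0.8)
```

Under these invented numbers the naive reading of $~$\pi_5$~$ already beats continuing, but the assisted reading beats it by a much wider margin, which is the sense in which the correction makes $~$\pi_5$~$ more plausibly optimal.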

I think the paragraph asking 'why wouldn't the AI just update on the evidence that the human tried to shut it down and then carry on?' makes a key point and could usefully be developed further.
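To see why this matters, here is a toy Bayesian sketch of the failure mode (the hypotheses, priors, and likelihoods are all made up for illustration): an AI that treats the shutdown attempt as nothing more than evidence about $~$V$~$ will revise its beliefs and then simply optimise its new best guess, rather than actually stopping:

```python
# Toy sketch of the 'update and carry on' failure mode.
# Hypotheses, priors, and likelihoods are invented for illustration.

# Prior over two candidate utility functions.
prior = {"V_original": 0.9, "V_revised": 0.1}

# Likelihood of observing a human shutdown attempt under each hypothesis.
likelihood = {"V_original": 0.05, "V_revised": 0.95}

# Bayes update on the observed shutdown attempt.
evidence = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}

print(posterior)  # credence shifts heavily toward V_revised (~0.68)

# ...and the AI then carries on optimising its updated best guess,
# which is not at all the same thing as shutting down when asked.
best_guess = max(posterior, key=posterior.get)
print(best_guess)  # -> 'V_revised'
```

Developing that paragraph might explain why this behaviour, although it looks cooperative, still leaves the human with no reliable way to stop the system.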

Another key point is that we might legitimately want a button that simply turns the AI off (and which the AI does not prevent), rather than a button where the AI decides whether to let us turn it off. On this line, it would be better than nothing to have an AI that typically lets us turn it off, but better still to (also?) have a way of taking this decision out of its hands entirely.