by Paul Christiano Dec 29 2015 updated Mar 9 2016

I wouldn't call this "Christiano's hack." I appreciate the implicit praise that I can think up esoteric failure modes when I feel like it, but I think this issue was clear to many people before I wrote about it. (E.g., I think it was almost certainly clear to Carl, probably to Wei Dai and some of the other folks on the decision theory list, and presumably to Roko. I always assumed it was clear to you and that you just don't like talking about this kind of thing.)

I'd also probably suffer by having my name on it, if the naming were widely known. I endorse thinking about weird failure modes. But I don't think they're the place to focus for now, and I am very sympathetic to AI researchers who think this sort of thing is a distraction at the moment, until we resolve some of the most pressing non-weird failure modes.


Eliezer Yudkowsky

K, will modify going forward.

Wei Dai

I believe Rolf Nelson first came up with the idea of using simulations to manipulate an AI's beliefs about its most likely environment, in the context of an FAI possibly hacking a UFAI. He initially posted it on SL4, at http://www.sl4.org/archive/0708/16600.html, then in more detail at http://aibeliefs.blogspot.com/.