"Consider an AI system compo..."

https://arbital.com/p/7b

by Paul Christiano Jun 18 2015


Consider an AI system composed of many interacting subsystems, or a world containing many AI systems. Are you asking for safety even if one of these systems or subsystems becomes omniscient while the others do not? Clearly this would be a nice property to have if it were attainable, but it seems pretty ambitious. I'm also not convinced it's a big deal one way or the other, because I don't expect massive disparities in power during normal operation that go unnoticed by the AI systems designing new AI systems. So whether designing for such disparities is useful seems to depend on an empirical claim about how plausible big differentials are.

You could restate your original point in terms of differentials: "if it fails for a large enough differential, then why think the real differential is small enough?" But I don't find this very compelling when we can say relatively precisely what kind of differential is small enough.


Comments

Eliezer Yudkowsky

Are you asking for safety even if one of these systems or subsystems becomes omniscient while others did not?

Yes! If your system behaves unsafely when subsystem A becomes too much smarter than subsystem B, that's bad. You should have designed your AI to detect whether A has gotten too far ahead of B, and then limit A, suspend it to disk, or otherwise fail safely.
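
A minimal sketch of the kind of fail-safe described above, assuming a scalar capability score and a fixed ratio threshold; the Subsystem class, the capability metric, and max_ratio are illustrative assumptions, not anything proposed in the original discussion:

```python
from dataclasses import dataclass

@dataclass
class Subsystem:
    name: str
    capability: float  # assumed scalar proxy for how powerful the subsystem is
    suspended: bool = False

def check_differential(a: Subsystem, b: Subsystem, max_ratio: float = 10.0) -> None:
    """Suspend the stronger subsystem if its capability exceeds the weaker
    one's by more than max_ratio -- fail safely rather than keep running."""
    stronger, weaker = (a, b) if a.capability >= b.capability else (b, a)
    if weaker.capability <= 0 or stronger.capability / weaker.capability > max_ratio:
        stronger.suspended = True  # e.g. checkpoint to disk and halt

# Example: A has run far ahead of B, so A gets suspended instead of continuing.
planner = Subsystem("A", capability=1000.0)
verifier = Subsystem("B", capability=5.0)
check_differential(planner, verifier)
assert planner.suspended
```

The point of the sketch is only the shape of the mechanism: some monitor compares the subsystems' relative power and triggers a conservative shutdown rather than letting the system proceed in the unanticipated regime.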

I've noticed that in a lot of cases, you seem convinced that various classes of problems would be handled… I want to say 'automatically', but I think the more charitable interpretation would be 'as special cases of solving some larger general problem that I'm not worried about being solved'. Can you state explicitly what background assumption would lead you to think that an AI which behaves badly if subsystem A is very overpowered relative to subsystem B is still safe? Like, what is the mechanism that makes the AI safe in this case?

Eric Rogstad

Can you state explicitly what background assumption would lead you to think that an AI which behaves badly if subsystem A is very overpowered relative to subsystem B, is still safe?

It seemed to me that Paul was not saying he thought this scenario would be safe, but rather that it would be unlikely.