[summary: Arguendo, when some particular proposed AI safety measure is alleged to be inherently opposed to the useful work the AI is meant to do.

We could use the metaphor of a scissors and its dangerous blades. We can have a "safety scissors" that is only *just* sharp enough to cut paper, but this is still sharp enough to do some damage if you work at it. If you make the scissors *even safer* by encasing the dangerous blades in foam rubber, the scissors can't cut paper any more; and if it *can* cut paper, it's still unsafe. Maybe you can cut clay, but nobody knows how to do a sufficiently large amount of good by cutting clay.

Similarly, there's an obvious way to cut down the output of an Oracle AGI to the point where all it can do is tell us that a proposed theorem is provable from the axioms of Zermelo-Fraenkel set theory. Unfortunately, nobody knows how to use a ZF provability oracle to save the world.]

"This type of safety implies uselessness" (or equivalently, "any AI powerful enough to be useful will still be unsafe") is an accusation leveled against a proposed AI safety measure which, if enforced strictly enough to make the AI safe, would also make the AI useless.

For a non-AI metaphor, consider a scissors and its dangerous blades. We can have a "safety scissors" that is only *just* sharp enough to cut paper - but this is still sharp enough to do some damage if you work at it. If you try to make the scissors *even safer* by encasing the dangerous blades in foam rubber, the scissors can't cut paper any more. If the scissors *can* cut paper, it's still unsafe. Maybe you could in principle cut clay with a scissors like that, but this is no defense unless you can tell us something very useful that can be done by cutting clay.

Similarly, there's an obvious way to try cutting down the allowed output of an Oracle AGI to the point where all it can do is tell us that a given theorem is provable from the axioms of Zermelo-Fraenkel set theory. This might prevent the AGI from hacking the human operators into letting it out, since all that can leave the box is a single yes-or-no bit, sent at some particular time. An untrusted superintelligence inside this scheme would have the option of strategically not telling us when a theorem *is* provable in ZF; but if the bit from the proof-verifier said that the input theorem was ZF-provable, we could very likely trust that.
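The one-bit scheme above can be sketched in code. Since no short example can include a real ZF proof checker, the verifier below is a toy stand-in (an assumption for illustration): the "theorem" is a claim that a number is composite, and the "proof" is a claimed nontrivial factor. The point is the channel architecture, not the logic: only the trusted verifier's single yes/no bit ever leaves the box.

```python
def trusted_verifier(number: int, certificate: int) -> bool:
    """Toy stand-in for a trusted ZF proof checker.

    A real checker would mechanically verify a machine-readable ZF
    derivation; here the 'theorem' is 'number is composite' and the
    'proof' is a claimed nontrivial factor, which is cheap to check.
    """
    return 1 < certificate < number and number % certificate == 0


def oracle_gate(number: int, untrusted_certificate: int) -> bool:
    """The only channel out of the box: one yes/no bit.

    The untrusted oracle supplies the certificate, but only the trusted
    verifier's verdict is emitted. A dishonest oracle can withhold a
    valid certificate and make a true theorem come out False, but it
    cannot force a True verdict for a false theorem without producing
    a certificate that actually checks out.
    """
    return trusted_verifier(number, untrusted_certificate)


# 91 = 7 * 13, so a valid certificate yields True:
assert oracle_gate(91, 7) is True
# 97 is prime: no certificate can force a True verdict.
assert oracle_gate(97, 5) is False
# True theorem, bad certificate: the oracle can only withhold, not lie.
assert oracle_gate(91, 6) is False
```

This illustrates the asymmetry mentioned above: a "no" bit carries no guarantee (the oracle may be strategically withholding), but a "yes" bit is backed by the verifier rather than by trust in the oracle.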

But now we run up against the problem that nobody knows how to actually save the world by virtue of sometimes knowing for sure that a theorem is provable in ZF. The scissors has been blunted to where it's probably completely safe, but can only cut clay; and nobody knows how to do *enough* good by cutting clay.

# Ideal models of "safe but useless" agents

Should you have cause to do a mathematical study of this issue, an excellent ideal model of a safe but useless agent, embodying maximal safety and minimal usefulness, would be a rock.