"> In principle, a nonperson predicate needs onl..."

https://arbital.com/p/8j1

by Bogdan Butnaru Jul 8 2017 updated Jul 8 2017


> In principle, a nonperson predicate needs only two possible outputs, "Don't know" and "Definitely not a person". It's acceptable for many actually-nonperson programs to be labeled "don't know", so long as no people are labeled "definitely not a person". […] The implicit difficulty is that the nonperson predicate must also pass some programs of high complexity that do things like "acceptably model humans" or "acceptably model future versions of the AI".

There's another difficulty: the nonperson predicate must not itself commit mindcrime while evaluating the programs. This sounds obvious enough in retrospect that it doesn't feel worth mentioning, but it took me a while to notice it.

Obviously, if you're running the program to determine whether it's a person by analyzing its behavior (e.g. by asking it whether it feels conscious), you've already committed mindcrime by the time you return "Don't know".
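To make the failure mode concrete, here is a deliberately naive sketch of what *not* to do (the function name and the interrogation protocol are my own illustration, not anything from the original page):

```python
import subprocess

def naive_nonperson_predicate(program_path: str) -> str:
    """A deliberately bad nonperson predicate that decides by interrogation."""
    # Running the candidate program in order to question it is exactly the
    # step that might instantiate a person; the harm is done here, before
    # we ever get to return an answer.
    result = subprocess.run(
        ["python", program_path],
        input="Do you feel conscious?\n",
        capture_output=True,
        text=True,
        timeout=60,
    )
    if result.stdout.strip().lower() == "no":
        return "Definitely not a person"  # and even this answer proves nothing
    return "Don't know"
```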

But if the tested program and the predicate are complex enough, many forms of analysis short of actually running the program could accidentally instantiate persons as sub-processes, potentially ones distinct from any that the tested program itself would instantiate.

In other words: Assume Π is the set of all programs that potentially contain a person, i.e. for any program π, π ∈ Π iff running π could instantiate a person.

We want a computable safety predicate S such that, for any program π, S(π) implies π ∉ Π, i.e. S(π) means π is safe to run. (Though ¬S(π) does not necessarily imply π ∈ Π.)
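As a rough sketch of the intended contract for S (the names and the `Verdict` type are mine; no real implementation could be this trivial, and Π itself is not something we can decide exactly):

```python
from enum import Enum

class Verdict(Enum):
    DEFINITELY_NOT_A_PERSON = 1  # S(pi) holds: running pi is guaranteed not to instantiate a person
    DONT_KNOW = 2                # no guarantee either way; false alarms are acceptable

def safety_predicate(program_source: str) -> Verdict:
    """Contract for S: if this returns DEFINITELY_NOT_A_PERSON, the program
    is guaranteed to be outside Pi. Returning DONT_KNOW for many harmless
    programs is fine; mislabeling a person-instantiating program as safe is not."""
    # A real S would do some static analysis here; this placeholder is
    # maximally conservative, hence trivially sound (and useless).
    return Verdict.DONT_KNOW
```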

The problem is that evaluating S(π) is itself running a program (call it S_π, the program that computes S on input π), and we need to make sure that S_π ∉ Π before running it. We can't use S(S_π) to check, because computing S(S_π) is again running a program that would first need to be checked, and so on…
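Continuing the sketch above (again with invented names), the regress looks like this: before evaluating S on π we would have to vet the program S_π that performs that evaluation, which means vetting the program that vets it, and so on.

```python
def check_safely(pi_source: str) -> Verdict:
    """Try to establish that evaluating S(pi) is itself safe before doing it."""
    # The new program whose safety we need is S_pi, i.e. "S run on input pi":
    s_pi_source = f"safety_predicate({pi_source!r})"
    # But establishing *its* safety means repeating this whole procedure on
    # S_pi, then on S_{S_pi}, and so on: the regress never bottoms out.
    check_safely(s_pi_source)           # recurses forever (RecursionError in practice)
    return safety_predicate(pi_source)  # never reached
```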

(Note that a program that implements a sufficiently complex safety predicate S, when executed with another program π as input, might instantiate a person even if just running π directly would not!)