Nonperson predicate

by Eliezer Yudkowsky Dec 28 2015

If we knew which computations were definitely not people, we could tell AIs which programs they were definitely allowed to compute.

A "nonperson predicate" is a possible method for preventing an advanced AI from accidentally running sapient computations. (It would be a potentially huge moral catastrophe if an AI created, ran, and discarded a large number of sapient programs inside itself.) A nonperson predicate looks at a potential computation and returns one of two answers: "Don't know" or "Definitely not a person". A successful nonperson predicate may (very often) return "Don't know" for computations that aren't in fact people, but it never returns "Definitely not a person" for something that is a person. In other words, to solve this problem, we don't need to know what consciousness is so much as we need to know what it isn't; we don't need to be sure what is a person, only what isn't a person.

For a nonperson predicate to be useful, however, it must still pass enough useful computations that we can build a working, capable AI out of them. (Otherwise "Rocks are okay, everything else might be a person" would be an adequate nonperson predicate.)

The foreseeable difficulty with a nonperson predicate is that instrumental pressures to model humans accurately might drive the AI toward flaws and loopholes in any attempted predicate. See the page on Mindcrime for more detail.
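
The two-answer, one-sided shape of such a predicate can be illustrated with a toy sketch in Python. Everything here is hypothetical and chosen only for illustration: computations are stood in for by plain string labels, and the names `Verdict`, `make_whitelist_predicate`, `rocks_only`, and `may_run` are not part of any real proposal. The sketch shows only the interface and the direction of the allowed error. It does not show how to decide what belongs on the whitelist, which is the actual open problem.

```python
from enum import Enum
from typing import Callable, FrozenSet


class Verdict(Enum):
    """The only two answers a nonperson predicate may give."""
    DONT_KNOW = "Don't know"
    DEFINITELY_NOT_A_PERSON = "Definitely not a person"


def make_whitelist_predicate(
    known_nonpersons: FrozenSet[str],
) -> Callable[[str], Verdict]:
    """Build a toy predicate that clears only labels on an explicit whitelist.

    Assumption: every label on the whitelist really is a nonperson. The
    predicate is only as trustworthy as that list.
    """
    def predicate(computation: str) -> Verdict:
        if computation in known_nonpersons:
            return Verdict.DEFINITELY_NOT_A_PERSON
        # Anything unrecognized is never cleared, so the predicate can only
        # err toward "Don't know", never toward a false clearance.
        return Verdict.DONT_KNOW

    return predicate


def may_run(predicate: Callable[[str], Verdict], computation: str) -> bool:
    """An AI using the predicate runs only what has been positively cleared."""
    return predicate(computation) is Verdict.DEFINITELY_NOT_A_PERSON


# The trivial predicate from the text: "Rocks are okay, everything else
# might be a person."
rocks_only = make_whitelist_predicate(frozenset({"rock"}))

if __name__ == "__main__":
    for label in ("rock", "sort a list", "detailed human model"):
        print(f"{label}: {rocks_only(label).value}")
```

In this toy, `rocks_only` never wrongly clears anything, but it also answers "Don't know" for "sort a list", a computation any capable AI would need to run. That is why such a trivial predicate is not adequate in practice.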