Mindcrime: Introduction

https://arbital.com/p/mindcrime_introduction

by Eliezer Yudkowsky Nov 30 2015 updated Dec 21 2016


The more predictive accuracy we want from a model, the more detailed the model becomes. A very rough model of an airplane might only contain the approximate shape, the power of the engines, and the mass of the airplane. A model good enough for engineering needs to be detailed enough to simulate the flow of air over the wings, the centripetal force on the fan blades, and more. As a model predicts the airplane in finer and finer detail, and with better and better probability distributions, the computations carried out to make the model's predictions may start to look more and more like a detailed simulation of the airplane flying.

Consider a machine intelligence building, and testing, the best models it can manage of a human being's behavior. If the model that produces the best predictions involves simulations with moderate degrees of isomorphism to human cognition, then the model, as it runs, may itself be self-aware or conscious or sapient or whatever other property stands in for being an object of ethical concern. This doesn't mean that the running model of Fred is Fred, or even that the running model of Fred is human. The concern is that a sufficiently advanced model of a person will be a person, even if they might not be the same person.

We might then worry that, for example, if Fred is or might be unhappy, the agent will consider thousands or millions of hypotheses about versions of Fred. Hypotheses about suffering versions of Fred, when run, might themselves be suffering. As a similar concern, these hypotheses about Fred might then be discarded - cease to be run - if the agent sees new evidence and updates its model. If those hypotheses are indeed people, then stopping and erasing them would be the moral equivalent of murder.

This scenario, which we might call 'the problem of sapient models', is a subscenario of the general problem of what Bostrom terms 'mindcrime'. (Eliezer Yudkowsky has suggested 'mindgenocide' as a term with fewer Orwellian connotations.) More generally, we might worry that there are agent systems that do huge amounts of moral harm just in virtue of the way they compute, by containing embedded conscious suffering and death.

Another scenario might be called 'the problem of sapient subsystems'. It's possible, for example, that the most efficient possible system for allocating memory to subprocesses is a memory-allocating subagent that is reflective enough to be an independently conscious person. This is distinct from the problem of creating a single machine intelligence that is conscious and suffering, because the conscious agents might be hidden at a lower level of the design, and there might be many of them rather than just one suffering superagent.

Both of these scenarios constitute moral harm done inside the agent's computations, irrespective of its external behavior. We can't conclude that we've done no harm by building a superintelligence, just in virtue of the fact that the superintelligence doesn't outwardly kill anyone. There could be trillions of people suffering and dying inside the superintelligence. This sets mindcrime apart from almost all other concerns within the Value alignment problem, which usually revolve around external behavior.

To avoid mindgenocide, it would be very handy to know exactly which computations are or are not conscious, sapient, or otherwise objects of ethical concern. Or, indeed, to know of some particular class of computations that none of them are objects of ethical concern.

Yudkowsky calls a [ nonperson predicate] any computable test we could safely use to determine that a computation is definitely not a person. This test only needs two possible answers, "Not a person" and "Don't know". It's fine if the test says "Don't know" on some nonperson computations, so long as the test says "Don't know" on all people and never says "Not a person" when the computation is conscious after all. Since the test only definitely tells us about nonpersonhood, rather than detecting personhood in any positive sense, we can call it a nonperson predicate.

However, the goal is not just to have any nonperson predicate - a predicate that says "Not a person" only for the empty computation, and "Don't know" for everything else, meets this definition. The goal is to have a nonperson predicate that passes powerful, useful computations. We want to be able to build an AI that is not a person, let that AI build subprocesses that we know will not be people, and let that AI improve its models of environmental humans using hypotheses that we know are not people. This means the nonperson predicate does need to pass some AI designs, cognitive subprocess designs, and human models that are good enough for whatever it is we want the AI to do.
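As a very rough sketch of the shape such a test would take (a sketch only: the two possible answers and the conservative default come from the description above, while the `Verdict` names, the `nonperson_predicate` function, and the representation of a computation as a tuple of instructions are illustrative assumptions):

```python
from enum import Enum

class Verdict(Enum):
    NOT_A_PERSON = "Not a person"
    DONT_KNOW = "Don't know"

def nonperson_predicate(computation: tuple) -> Verdict:
    """Sound but nearly useless nonperson predicate.

    Contract: never answer NOT_A_PERSON for anything that could be a person;
    answering DONT_KNOW for harmless computations is allowed, merely costly.
    This trivial version certifies only the empty computation.
    """
    if computation == ():
        return Verdict.NOT_A_PERSON
    return Verdict.DONT_KNOW
```

A useful predicate would keep the same one-sided guarantee while also answering "Not a person" for the rich cognitive subprocesses and human-models an advanced agent actually needs, and that is the hard part.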

This seems like it might be very hard, for several reasons.

An alternative for preventing mindcrime without a trustworthy [ nonperson predicate] is to consider agent designs intended not to model humans, or other minds, in great detail, since there may be some pivotal achievements that can be accomplished without a value-aligned agent modeling human minds in detail.