"Eliezer, I find your positi..."


by Paul Christiano Dec 29 2015 updated Dec 29 2015

Eliezer, I find your position confusing.

Consider the first AI system that can reasonably predict your answers to questions of the form "Might X constitute mindcrime?", where X is a natural language description of some computational process. (That is, it predicts well enough that, say, most of a useful computation can be flagged as "definitely not mindcrime," and all mindcrime can be flagged as "maybe mindcrime.")

Do you believe that this system will have significant moral disvalue? If that system doesn't have moral disvalue, where is the chicken and egg problem?

So it seems like you must believe that this system will have significant moral disvalue. That sounds implausible on its face to me. What are you imagining this system will look like? Do you think that this kind of question is radically harder than other superficially comparable question-answering tasks? Do you think that any AI researchers will find your position plausible? If not, what do you think they are getting wrong?

ETA: Maybe the most useful thing to clarify would be what kind of computation you would find really hard to classify yet plausibly unavoidable for effective computation, and how that computation relates to the rest of what the AI is doing.

This whole disagreement may be related to broader disagreements about how aligned AI systems will look. But you seem to think that mindcrime is also a problem for act-based agents, so that can't explain all of it. We might want to restrict attention to the act-based case in order to isolate disagreement specific to mindcrime, and it's possible that discussion should wait until we get on the same page about act-based agents.