"Paul, I don't disagree that..."

Paul, I don't disagree that we want the AI to think whatever thought it ought to think. I'm proposing a chicken-and-egg problem where the AI can't figure out which thoughts constitute mindcrime without already committing mindcrime. I think you could record a lot of English pontification from me and still have a non-person-simulating AI feeling pretty confused about what the heck I meant or how to apply it to computer programs. Can you give a less abstract view of how you think this problem should be solved? What human-understanding and mindcrime-detection abilities do you think the AI can develop, and in what order, without committing lots of mindcrime along the way? Sure, given infinite human understanding, the AI can detect mindcrime very efficiently, but the essence of the problem is that it seems hard to get infinite human understanding without lots of mindcrime being committed along the way. So what do you think can be done instead that postulates only a level of human understanding that can knowably be reached without simulating people?