"On the act-based model, the..."

On the act-based model, the user would say something like "paint all the cars pink," and the AI would take this as evidence about what individual steps the user would approve of. Effectiveness at painting all cars pink is one consideration that the user would weigh when judging those steps. Most of the problems on your list are other considerations that would affect the user's judgment.

The difference between us seems to be something like: I feel it is best to address almost all of these problems by using learning, and so I am trying to reduce them to a traditional learning problem. For example, I would like a human to reject plans that have huge side effects, and for the agent to learn that big side effects should be avoided. You don't expect that it will be easy to learn to address these problems, and so think that we should solve them ourselves to make sure they really get solved. (I think you called my position optimism about "special case sense.")
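As a toy illustration of what "reducing this to a traditional learning problem" could look like, here is a minimal sketch, not a proposal from this discussion. Everything in it is an assumption made for illustration: plans are summarized by two hand-picked numbers (effectiveness and size of side effects), the human is replaced by a hypothetical `simulated_approval` rule, and the learner is a plain logistic regression trained by gradient descent.

```python
import math
import random


def simulated_approval(effectiveness, side_effects):
    # Hypothetical stand-in for the human overseer (an assumption of this
    # sketch): approve effective plans, reject plans with large side effects.
    return 1 if effectiveness - 2.0 * side_effects + 0.5 > 0 else 0


def train_approval_model(n_examples=500, epochs=300, lr=0.5, seed=0):
    """Fit a logistic regression that predicts approval from plan features."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_examples):
        e, s = rng.random(), rng.random()  # both features lie in [0, 1)
        data.append((e, s, simulated_approval(e, s)))

    w_eff = w_side = bias = 0.0
    for _ in range(epochs):
        g_eff = g_side = g_bias = 0.0
        for e, s, y in data:
            p = 1.0 / (1.0 + math.exp(-(w_eff * e + w_side * s + bias)))
            g_eff += (p - y) * e
            g_side += (p - y) * s
            g_bias += p - y
        # Full-batch gradient step on the average log loss.
        w_eff -= lr * g_eff / n_examples
        w_side -= lr * g_side / n_examples
        bias -= lr * g_bias / n_examples
    return w_eff, w_side, bias


def choose_plan(plans, weights):
    """Pick the plan with the highest predicted approval score."""
    w_eff, w_side, bias = weights
    return max(plans, key=lambda p: w_eff * p[1] + w_side * p[2] + bias)


if __name__ == "__main__":
    weights = train_approval_model()
    # Hypothetical candidate plans: (name, effectiveness, side_effects).
    plans = [
        ("repaint each car by hand", 0.7, 0.1),
        ("flood the city with pink paint", 0.9, 0.9),
    ]
    print(choose_plan(plans, weights)[0])
```

Nothing in the agent hard-codes "avoid side effects"; any aversion it ends up with comes from the approval labels. Whether the same thing works when the features are not handed to the learner, and when the plans are far from the training distribution, is exactly the empirical question at issue.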

I might endorse something like your approach at some stage: once we have set everything up as a learning problem, we can ask which parts of the learning problem are likely to be especially difficult and important, and focus our efforts on making sure that systems can solve those problems (which may involve solving them ourselves, or may just involve differential ML progress). But it seems weird to me to start this way.

Some considerations that seem relevant to me:

To the extent we can set up all of these problems as parts of a learning problem, it just seems like an empirical question which ones will be hard, and how hard they will be. I think that you are wrong about this empirical question, and you think I am wrong, but perhaps we can agree that it is an empirical question?
Setting things up as a learning problem is not only helpful for AI systems. It also automatically turns nebulous philosophical issues into precise technical problems, since they now correspond to e.g. receiving higher reward in some reinforcement learning environment.
In terms of comparative advantage across time, it seems better for us to identify anything that can't be addressed by learning and will require, e.g., philosophical labor, and to postpone problems that might be addressed by learning or clever algorithms (since in the future people will have access to more powerful learning systems and cleverer algorithms).
The historical track record for hand-coding vs. learning is not good. For example, even probabilistic reasoning seems at this point like it's something that our agents should learn on their own (to the extent that probability is relevant to ML, it is increasingly as a technique relevant to analyzing ML systems rather than as a hard-coded feature of their reasoning). So it seems natural to first make sure that everything can be attacked as a learning problem, before trying to solve a bunch of particular learning problems by hand.

It's possible that the difference between us is that I think it is feasible to reduce almost all of these problems to traditional learning problems, and you disagree. But when we've actually talked about it, you seem to have consistently opted for positions like "in some sense this is 'just' a prediction problem, but I suspect that solving it will require us to understand X." And concretely, it seems to me that we have an extremely promising approach for reducing most of these problems to learning problems.

Comments

Eliezer Yudkowsky

"To the extent we can set up all of these problems as parts of a learning problem, it just seems like an empirical question which ones will be hard, and how hard they will be. I think that you are wrong about this empirical question, and you think I am wrong, but perhaps we can agree that it is an empirical question?"

The main thing I'd be nervous about is having the difference in our opinions be testable before the mission-critical stage. Like, maybe simple learning systems exhibit pathologies and you're like "Oh that'll be fixed with sufficient predictive power" and I say "Even if you're right, I'm not sure the world doesn't end before then." Or conversely, maybe toy models seem to learn the concept perfectly and I'm like "That's because you're using a test set that's an identical set of problems to the training set" and you're like "That's a pretty good model for how I think superhuman intelligence would also go, because it would be able to generalize better over the greater differences" and I'm like "But you're not testing the mission-critical part of the assumption."

"The historical track record for hand-coding vs. learning is not good. For example, even probabilistic reasoning seems at this point like it's something that our agents should learn on their own (to the extent that probability is relevant to ML, it is increasingly as a technique relevant to analyzing ML systems rather than as a hard-coded feature of their reasoning)."

We might have an empirical disagreement about the extent to which theory plays a role in practice in ML, but I suspect we also have a policy disagreement about how important transparency is in practice to success, i.e., how likely we are to die like squirrels if we try to use a system whose desired/required dynamics we don't understand on an abstract level.

"So it seems natural to first make sure that everything can be attacked as a learning problem, before trying to solve a bunch of particular learning problems by hand."

I'm not against trying both approaches in parallel.