"It's not obvious to me that..."


by Eliezer Yudkowsky Mar 16 2016

It's not obvious to me that these two approaches mean the same thing. Let's say that an AI sees some stale burritos and some fresh burritos, with the former labeled as negative examples and the latter as positive examples. If you use the simplest concept that classifies the training data, without requiring it to be conservative, maybe you max out the probability that something will be classified as a burrito by eliminating every trace of staleness… or by moving even further along some dimension that distinguishes stale from fresh burritos.
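To make the worry concrete, here is a minimal sketch, assuming a single made-up "freshness" feature and a plain logistic classifier. Everything in it (the scores, `fit_logistic`, the candidate grid, and the range-based rule standing in for conservatism) is a hypothetical illustration, not a proposal for how conservatism should actually be implemented.

```python
import math

# Hypothetical 1-D "freshness" scores; the numbers are invented for illustration.
stale = [0.1, 0.2, 0.3]   # negative examples
fresh = [0.7, 0.8, 0.9]   # positive examples


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(neg, pos, steps=5000, lr=0.5):
    """Fit p(burrito | x) = sigmoid(w * x + b) by plain gradient descent."""
    data = [(x, 0.0) for x in neg] + [(x, 1.0) for x in pos]
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in data) / len(data)
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = fit_logistic(stale, fresh)

# Candidate objects the AI could produce, including freshness values far
# beyond anything that appeared in the labeled data (0.0 up to 5.0).
candidates = [i / 10 for i in range(0, 51)]


def log_odds(x):
    # Monotone in the predicted probability, so it has the same argmax,
    # but it does not saturate to exactly 1.0 in floating point.
    return w * x + b


# "Learn a probabilistic rule and max it out": take the top-scoring candidate.
maximizer_choice = max(candidates, key=log_odds)

# A crude stand-in for conservatism: only consider candidates inside the range
# spanned by the positive examples, and prefer the one nearest their mean.
low, high = min(fresh), max(fresh)
center = sum(fresh) / len(fresh)
conservative_choice = min(
    (x for x in candidates if low <= x <= high),
    key=lambda x: abs(x - center),
)

print("maximizer picks freshness:", maximizer_choice)
print("conservative rule picks freshness:", conservative_choice)
```

The maximizer runs to the far end of the candidate grid, well outside anything that was ever labeled, while the range-restricted rule stays among the kinds of burritos it was actually shown. Real conservatism would presumably need something much subtler than a per-feature range check; the point is only that the two selection rules can come apart on the same learned classifier.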

Now, it's possible that this would be fixed automatically by having a mixture of hypotheses about what might underlie the good-burrito classification, with one of the hypotheses being "maybe a burrito can't be too fresh", but again, this is not obvious to me.

It seems to me that, in general, when we learn a mixture of the simplest concepts that might assign probabilities well over previously labeled examples, we might still end up with something whose maximum is nonconservative. Maybe the AI learns to model the human system for classifying burritos and then presents us with a weird object whose appearance hacks us into suddenly being absolutely certain that it is a burrito. This is just me trying to wave my hands in the direction of what seems like it might be an underlying difference between "learn a probabilistic classification rule and max it out" and "try to draw a simple concept that is conservatively narrow".

It might be the case that, given sufficient imagination to consider many possible hypotheses, trying to fit all of those hypotheses well (which might not be the same as maxing out the mixture) is an implementation of conservatism, or even that just trying to max out the mixture turns out to implement conservatism in practice. But it might also be the case that, in the not-far-superhuman regime, directly making merely powerful learning systems 'conservative', rather than having them 'max out the probability considering many hypotheses', would be more tractable or straightforward as an engineering problem.
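As a toy illustration of why "fit all of those hypotheses well" and "max out the mixture" can differ, here is a minimal sketch with two hypothetical hypotheses and two hypothetical candidate objects. The weights and probabilities are invented, and reading "fit all hypotheses well" as "maximize the worst-case probability across hypotheses" is only one possible interpretation, assumed here for the sake of the example.

```python
# Two hypotheses about what makes something a good burrito (both hypothetical):
#   index 0: "fresher is always better"        (simple, so it gets most of the weight)
#   index 1: "a burrito can't be too fresh"    (less simple, so it gets little weight)
weights = [0.95, 0.05]

# Probability each hypothesis assigns to "this is a burrito", per candidate.
# All numbers are made up for illustration.
candidates = {
    "ordinary fresh burrito": [0.90, 0.90],
    "ultra-fresh weird object": [0.999, 0.20],
}


def mixture_score(probs):
    # Weighted average over hypotheses: the quantity "max out the mixture" optimizes.
    return sum(wt * p for wt, p in zip(weights, probs))


def worst_case_score(probs):
    # One reading of "fit all hypotheses well": no hypothesis may strongly object.
    return min(probs)


mixture_choice = max(candidates, key=lambda name: mixture_score(candidates[name]))
worst_case_choice = max(candidates, key=lambda name: worst_case_score(candidates[name]))

print("maxing out the mixture picks:", mixture_choice)
print("fitting every hypothesis picks:", worst_case_choice)
```

With these numbers, the low-weight "can't be too fresh" hypothesis is outvoted in the mixture, so the weird object wins there, while the worst-case rule sticks with the ordinary burrito. Whether either rule amounts to conservatism in a real learning system is exactly the open question.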