

by Paul Christiano Dec 29 2015

It seems unlikely we'll ever build systems that "maximize X, but rule out some bad solutions with the ad hoc penalty term Y," because that approach looks totally doomed. If you want to maximize something that can't be explicitly defined, it looks like you have to build a system that doesn't maximize anything explicitly defined. (The point is broader than this: "do X but not Y" is just one kind of ad hoc proxy for our values, and ad hoc proxies for what we really care about just don't seem very promising.)

In some sense this is merely strong agreement with the basic view behind this post. I'm not sure if there is any real disagreement.