Reward engineering

by Olivia Schaefer Jan 24 2016

This post gestures at a handful of research questions with a loose thematic connection.

The idea

Consider the following frameworks:

These algorithms replace a hard-to-optimize objective with a nicer proxy. These proxies are themselves defined by machine learning systems rather than being specified explicitly. I think this is a really nice paradigm, and my guess is that it will become more important if large-scale supervised and reinforcement learning continues to be a dominant methodology.
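The pattern can be made concrete with a toy sketch. Everything below is illustrative and not from the post: the "true" objective stands in for something expensive to evaluate (say, a human judgment), the proxy is a trivial learned model fit to a few expensive samples, and optimization then runs against the cheap proxy instead.

```python
import random

def true_objective(x):
    # Stand-in for an expensive-to-evaluate objective; peaks at x = 3.
    return -(x - 3.0) ** 2

def fit_proxy(xs, ys):
    # Stand-in for a learned reward model: nearest-neighbor regression
    # over the (input, reward) samples we could afford to collect.
    def proxy(x):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[i]
    return proxy

def optimize(f, grid):
    # Cheap optimization loop: queries only the proxy, never the
    # expensive true objective.
    return max(grid, key=f)

random.seed(0)
train_xs = [random.uniform(0.0, 6.0) for _ in range(50)]
train_ys = [true_objective(x) for x in train_xs]  # the few expensive queries
proxy = fit_proxy(train_xs, train_ys)

grid = [i / 100 for i in range(601)]  # many cheap proxy queries
best = optimize(proxy, grid)
```

The point of the sketch is only the division of labor: a small number of expensive evaluations define the proxy, and all subsequent optimization pressure is applied to the proxy. Whether the optimum of the proxy resembles the optimum of the true objective is exactly the question reward engineering has to answer.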

Following Daniel Dewey, I’ll call this flavor of research “reward engineering.” In terms of tools and techniques I don’t know if this is really a distinct category of research; but I do think that it might be a useful heuristic about where to look for problems relevant to AI control.

Relevance to AI control

Though reward engineering seems very broadly useful in AI, I expect it to be especially important for AI control:

I see a few especially interesting opportunities for reward engineering for AI control:

In each case I’ve made a preliminary simple proposal, but I think it is quite possible that a clever trick could make the problem look radically more tractable. A search for clever tricks is likely to come up empty, but hits could be very valuable (and would be good candidates for things to experiment with).

Beyond these semi-specific applications, I have a more general intuition that thinking about this aspect of the AI control problem may turn up interesting further directions.