

by Paul Christiano Dec 30 2015

I was comparing act-based agents to what you are calling a genie. Both get objectives from humans, along with human preferences about how to carry out short-term projects (including, e.g., conservatism). The genie gets short-term objectives by literally asking humans. The act-based agent gets objectives by predicting what a human would say if asked. It seems that the only advantage of the genie is that it doesn't make prediction errors about humans.

If you want to make the comparison as clear as possible, we can turn a proposed genie into the most similar possible act-based agent. This agent calls up a human with small probability and gets an instruction, which it executes. If it doesn't call a human, it guesses what instruction a human would give if called, and then executes that. (Note that executing the given instruction may require asking the user questions, and that the user needs to behave slightly differently when giving instructions to this kind of modified genie.)
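A minimal Python sketch of the modified genie just described, under loud assumptions: `ask_human`, `predict_instruction`, `execute`, `query_prob`, and `run` are hypothetical names, the instructions are placeholder strings, and the sketch says nothing about how the predictor is built or trained. It only shows the control flow: consult a real human with small probability, otherwise act on a predicted instruction.

```python
import random
from typing import Callable


def act_based_step(
    ask_human: Callable[[], str],            # hypothetical: call up the real human
    predict_instruction: Callable[[], str],  # hypothetical: model of what the human would say
    execute: Callable[[str], None],          # hypothetical: carry out an instruction
    query_prob: float,
    rng: random.Random,
) -> bool:
    """Run one step of the modified genie; return True if a human was consulted."""
    if rng.random() < query_prob:
        # Rare case: actually call the human and get a real instruction.
        instruction = ask_human()
        consulted = True
    else:
        # Common case: guess what instruction the human would give if called.
        instruction = predict_instruction()
        consulted = False
    # Either way, the agent executes the instruction it ends up with.
    execute(instruction)
    return consulted


def run(steps: int, query_prob: float, seed: int = 0) -> tuple[int, list[str]]:
    """Toy driver: count human consultations over many steps (placeholder instructions)."""
    rng = random.Random(seed)
    executed: list[str] = []
    consults = 0
    for _ in range(steps):
        if act_based_step(
            ask_human=lambda: "tidy the office (from human)",
            predict_instruction=lambda: "tidy the office (predicted)",
            execute=executed.append,
            query_prob=query_prob,
            rng=rng,
        ):
            consults += 1
    return consults, executed


consults, executed = run(steps=1000, query_prob=0.01)
```

With `query_prob=0.01`, a run of 1,000 steps consults a human only about ten times in expectation; every other step runs on a prediction. Setting `query_prob=1.0` recovers the original genie, which asks on every step.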

The genie seems to be at a big disadvantage: it requires human involvement in every medium- or long-term decision, which will rapidly become impractical. This is especially bad when making medium- or long-term decisions itself requires consulting AI systems, which themselves require humans to make medium- or long-term decisions… Rather than, say, a 100x increase in human effort, actually providing this feedback can require an exponentially large increase in effort.
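To make the exponential claim concrete, here is a toy model (an assumption for illustration, not something the argument above specifies): every decision made by a human through a genie requires consulting `branching` AI systems, each of which needs its own human-made decision one level down, until some fixed `depth`. The function names and numbers are hypothetical.

```python
def genie_human_decisions(depth: int, branching: int) -> int:
    """Human decisions needed for one top-level decision in a hypothetical toy model."""
    if depth == 0:
        # A leaf decision: the human makes it directly, with no sub-consultations.
        return 1
    # The human makes this decision, and it consults `branching` AI systems,
    # each of which needs its own human-made decision one level down.
    return 1 + branching * genie_human_decisions(depth - 1, branching)


def closed_form(depth: int, branching: int) -> int:
    # The same count as a geometric series: 1 + b + b^2 + ... + b^depth.
    return sum(branching ** i for i in range(depth + 1))


for d in range(4):
    print(d, genie_human_decisions(d, branching=10))
```

With `branching=10`, depths 0 through 3 give 1, 11, 111, and 1111 human decisions: each extra level of nesting multiplies the required effort by roughly 10, rather than adding a fixed overhead.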

One reason that the act-based approach seems clearly preferable to me is that I don't imagine you can really carry out instructions without being able to make similarly good predictions about the user. You seem to be imagining a direct way to formulate an imperative like "do no harm" that doesn't involve predicting what the user would describe as a harm or what harm-avoidance strategy the user would advocate; I don't see much hope for that.