"Act-based is a more general..."


by Paul Christiano Dec 30 2015

Act-based is a more general designation that includes, e.g., imitation learning (and value learning where the agent learns the user's short-term instrumental preferences rather than their long-term preferences).

So you see the difference as whether the programmers have to actually supply the short-term objective, or whether the AI learns the short-term objective they would have defined (or the one they would accept or prefer)?

The distinction seems to buy you relatively little safety at a great cost (basically taking the system from "maybe it's good enough?" to "obviously operating at an incredible disadvantage"). You seem to think that it buys you much more safety than I do.

It seems like the main extra risk is from the AI making bad predictions about what the humans would do. Mostly this seems like it will lead to harmless failures if the humans behave responsibly, and it requires only very weak models of human behavior to avoid most of the really bad failures. The main new catastrophic risk I see is the agent thinking it is in a simulation. Are there other similar problems for the act-based approach?

(If we use approval-direction instead of imitation then we may introduce additional concerns depending on how we set it up. But those seem orthogonal to the actual involvement of the human.)