{ localUrl: '../page/act_based_agents.html', arbitalUrl: 'https://arbital.com/p/act_based_agents', rawJsonUrl: '../raw/1w4.json', likeableId: 'DebabrataBhattacharjee', likeableType: 'page', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], pageId: 'act_based_agents', edit: '5', editSummary: '', prevEdit: '4', currentEdit: '5', wasPublished: 'true', type: 'wiki', title: 'Act based agents ', clickbait: '', textLength: '5487', alias: 'act_based_agents', externalUrl: '', sortChildrenBy: 'likes', hasVote: 'false', voteType: '', votesAnonymous: 'false', editCreatorId: 'PaulChristiano', editCreatedAt: '2016-03-03 03:02:54', pageCreatorId: 'PaulChristiano', pageCreatedAt: '2016-02-04 00:46:03', seeDomainId: '0', editDomainId: '705', submitToDomainId: '0', isAutosave: 'false', isSnapshot: 'false', isLiveEdit: 'true', isMinorEdit: 'false', indirectTeacher: 'false', todoCount: '0', isEditorComment: 'false', isApprovedComment: 'true', isResolved: 'false', snapshotText: '', anchorContext: '', anchorText: '', anchorOffset: '0', mergedInto: '', isDeleted: 'false', viewCount: '188', text: 'I’ve recently discussed three kinds of learning systems:\n\n- [Approval-directed agents](https://arbital.com/p/1t7) which take the action the user would most approve of.\n- [Imitation learners](https://arbital.com/p/1vp) which take the action that the user would tell them to take.\n- [Narrow value learners](https://arbital.com/p/1vt) which take the actions that the user would prefer they take.\n\nThese proposals all focus on the short-term instrumental preferences of their users. From the perspective of AI control I think this is the interesting aspect that deserves more attention.\n\nGoing forward I’ll call this kind of approach “act-based” unless I hear something better (credit to Eliezer), and I’ll call agents of this type “act-based agents.”\n\n### Robustness\n\nAct-based agents seem to be robust to certain kinds of errors. 
You need only the vaguest understanding of humans to guess that killing the user is: (1) not something they would approve of, (2) not something they would do, (3) not in line with their instrumental preferences.\n\nSo in order to get bad outcomes here you have to really mess up your model of what humans want (or more likely mess up the underlying framework in an important way). If we imagine a landscape of possible interpretations of human preferences, there is a “right” interpretation that we are shooting for. But if you start with a wrong answer that is anywhere in the neighborhood, you will do things like “ask the user what to do, and don’t manipulate them.” And these behaviors will eventually get you where you want to go.\n\nThat is to say, the “right” behavior is surrounded by a massive crater of “good enough” behaviors, and in the long term they all converge to the same place. We just need to land in the crater.\n\n### Human enhancement\n\nAll of these approaches have a common fundamental drawback: they only have as much foresight as the user. In some sense this is why they are robust.\n\nIn order for these systems to behave wisely, the user has to actually _be_ wise. Roughly, the users need to be intellectual peers of the AI systems they are using.\n\nThis may sound quite demanding. But after making a few observations, I think it may be a realistic goal:\n\n- The user can draw upon every technology at their disposal — including other act-based agents. (This is discussed more precisely [here](https://arbital.com/p/1vw) under the heading of “efficacy.”)\n- The user doesn’t need to be quite as smart as the AI systems they are using; they merely need to be within striking distance. For example, it seems fine if it takes a human a few days to make a decision, or to understand and evaluate a decision, that an AI can make in a few seconds.\n- The user can delegate this responsibility to other humans whom they are willing to trust (e.g. 
Google engineers), just like they do today.\n\nIn this story the capabilities of humans grow in parallel with the capabilities of AI systems, driven by close interaction between the two. AI systems do not pursue explicitly defined goals, but instead help the humans do whatever the humans want to do at any given time. The entire process remains necessarily comprehensible to humans — if humans can’t understand how an action helps them achieve their goals, then that action doesn’t get taken.\n\nIn speculations about the long-term future of AI, I think this may be the most common positive vision. But I don’t think there has been much serious thinking about what this situation actually looks like, and certainly not much thinking about how to actually realize such a vision.\n\nNote that the involvement of actual humans is not intended as a _very_ long-term solution. It’s a solution built to last (at most) until all contemporary thinking about AI has been thoroughly obsoleted — until the capability of society is perhaps ten or a hundred times greater than it is today. I don’t think there is a strong case for thinking much further ahead than that.\n\n### What is “narrow” anyway?\n\nThere is clearly a difference between act-based agents and traditional rational agents. But it’s not entirely clear what the key difference is.\n\nConsider a machine choosing a move in a game of chess. 
I could articulate preferences over that move (castling looks best to me), over its consequences (I don’t want to lose the bishop), over the outcome of the game (I want to win), over immediate consequences of that outcome (I want people to respect my research team), over distant consequences (I want to live a fulfilling life).\n\nWe could also go the other direction and get even narrower: rather than thinking about preferences over moves we can think about preferences over particular steps of the cognitive strategy that produces moves.\n\nAs I advance from “narrow” to “broad” preferences, many things are changing. It’s not really clear what the important differences are, what exactly we mean by “narrow” preferences, at what scales outcomes are robust to errors, at what scales learning is feasible, and so on. I would like to understand the picture better.\n\nThe upshot\n==========\n\nThinking about act-based agents suggests a different (and in my view more optimistic) picture of AI control. 
There are a number of research problems that are common across act-based approaches, especially related to keeping humans up to speed, and I think that for the moment these are the most promising directions for work on AI control.', metaText: '', isTextLoaded: 'true', isSubscribedToDiscussion: 'false', isSubscribedToUser: 'false', isSubscribedAsMaintainer: 'false', discussionSubscriberCount: '1', maintainerCount: '1', userSubscriberCount: '0', lastVisit: '2016-02-05 07:32:57', hasDraft: 'false', votes: [], voteSummary: 'null', muVoteSummary: '0', voteScaling: '0', currentUserVote: '-2', voteCount: '0', lockedVoteType: '', maxEditEver: '0', redLinkCount: '0', lockedBy: '', lockedUntil: '', nextPageId: '', prevPageId: '', usedAsMastery: 'false', proposalEditNum: '0', permissions: { edit: { has: 'false', reason: 'You don't have domain permission to edit this page' }, proposeEdit: { has: 'true', reason: '' }, delete: { has: 'false', reason: 'You don't have domain permission to delete this page' }, comment: { has: 'false', reason: 'You can't comment in this domain because you are not a member' }, proposeComment: { has: 'true', reason: '' } }, summaries: {}, creatorIds: [ 'PaulChristiano' ], childIds: [ 'imitation_agent' ], parentIds: [ 'paul_ai_control' ], commentIds: [ '2hr', '89r' ], questionIds: [], tagIds: [], relatedIds: [], markIds: [], explanations: [], learnMore: [], requirements: [], subjects: [], lenses: [], lensParentId: '', pathPages: [], learnMoreTaughtMap: {}, learnMoreCoveredMap: {}, learnMoreRequiredMap: {}, editHistory: {}, domainSubmissions: {}, answers: [], answerCount: '0', commentCount: '0', newCommentCount: '0', linkedMarkCount: '0', changeLogs: [ { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '12237', pageId: 'act_based_agents', userId: 'EliezerYudkowsky', edit: '5', type: 'newChild', createdAt: '2016-06-09 23:23:07', auxPageId: 'imitation_agent', 
oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '8101', pageId: 'act_based_agents', userId: 'JessicaChuan', edit: '5', type: 'newEdit', createdAt: '2016-03-03 03:02:54', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '7750', pageId: 'act_based_agents', userId: 'JessicaChuan', edit: '4', type: 'newEdit', createdAt: '2016-02-24 22:46:54', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '6883', pageId: 'act_based_agents', userId: 'JessicaChuan', edit: '3', type: 'newEdit', createdAt: '2016-02-11 21:58:53', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '6879', pageId: 'act_based_agents', userId: 'JessicaChuan', edit: '2', type: 'newEdit', createdAt: '2016-02-11 21:18:23', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '6397', pageId: 'act_based_agents', userId: 'JessicaChuan', edit: '1', type: 'newParent', createdAt: '2016-02-04 00:46:41', auxPageId: 'paul_ai_control', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '6395', pageId: 'act_based_agents', userId: 'JessicaChuan', edit: '0', type: 'deleteParent', createdAt: '2016-02-04 00:46:33', auxPageId: 'unsupervised_learning_ai_control', 
oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '6393', pageId: 'act_based_agents', userId: 'JessicaChuan', edit: '1', type: 'newEdit', createdAt: '2016-02-04 00:46:03', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '6392', pageId: 'act_based_agents', userId: 'JessicaChuan', edit: '0', type: 'newParent', createdAt: '2016-02-04 00:44:37', auxPageId: 'unsupervised_learning_ai_control', oldSettingsValue: '', newSettingsValue: '' } ], feedSubmissions: [], searchStrings: {}, hasChildren: 'true', hasParents: 'true', redAliases: {}, improvementTagIds: [], nonMetaTagIds: [], todos: [], slowDownMap: 'null', speedUpMap: 'null', arcPageIds: 'null', contentRequests: {} }