Optimizing with comparisons
===========================

I could [elicit a user’s approval](https://arbital.com/p/1w5) of an action _a_ by having them supply a rating V(_a_) ∈ [0, 1]. But putting myself in the shoes of the rater, assigning such a rating _feels really hard_. My intuitive evaluation depends on what our system is capable of, and on what the alternative actions were. I often lack an absolute sense of how good an option is in isolation.

It feels much easier to specify comparisons, e.g. via a function C(_a_, _a_′) ∈ [-1, 1] that compares two alternatives. When the function is positive the first argument is better (the more positive, the stronger the preference); when it is negative the second argument is better.

I’ve found myself feeling this way about many aspects of AI control proposals. So it seems worth exploring the idea more generally.

I’ll assume that C(_a_, _a_′) = -C(_a_′, _a_), but this can always be ensured by anti-symmetrizing. It might be easiest for the reviewer if C is ±1-valued, or perhaps {-1, 0, +1}-valued.

### How to optimize?

Comparisons can be used to evaluate a proposed perturbation of a model, or to select amongst several candidate models. For most optimization approaches, this is all we need.

For gradient ascent, we can train by taking the partial derivative of $\mathbb{E}$[C(_a_, _a_′)] with respect to only the first argument of C, where _a_ and _a_′ are independent samples from the current model.

For evolutionary search, you could evaluate the fitness of each individual _x_ by computing $\mathbb{E}$[C(_x_, _y_)] for _y_ sampled from the current population.

### What is being optimized?

When optimizing a real-valued function V, we know where the optimization is going — towards inputs that are scored well under V. But what happens when we optimize for preferences defined by a comparison C? If C is transitive, then it’s clear what the optimum is. But if C isn’t transitive then the situation is a bit more subtle.

In fact there is an easy answer. If we consider C as the payoff function of a zero-sum game, then the systems described above will converge to the minimax equilibrium of the game (if they converge). This seems to be a very natural generalization of maximizing a scalar function V, via the correspondence C(_a_, _a_′) = V(_a_) - V(_a_′).

This suggests a general recipe for building agents to maximize preferences specified as (potentially intransitive) comparisons — we can apply the same techniques we use to play two-player games with large state spaces.
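As a toy illustration of this claim (my own sketch, not part of the original proposal; the function names and example comparisons are made up), the snippet below runs fictitious play, one of the standard techniques for zero-sum games, on an intransitive comparison (rock-paper-scissors) and on a transitive comparison of the form C(_a_, _a_′) = V(_a_) - V(_a_′). In the first case the empirical play approaches the uniform minimax equilibrium; in the second it concentrates on the argmax of V.

```python
import numpy as np

def fictitious_play(C, n_rounds=20000):
    """Approximate the minimax equilibrium of the symmetric zero-sum game
    whose payoff matrix is C (C[i, j] = payoff to the row player)."""
    n = C.shape[0]
    counts = np.ones(n)                    # one fictitious play of each pure action
    for _ in range(n_rounds):
        opponent = counts / counts.sum()   # empirical mixture of past play
        best_response = int(np.argmax(C @ opponent))
        counts[best_response] += 1
    return counts / counts.sum()

# Intransitive comparison: rock, paper, scissors.
rps = np.array([[ 0, -1,  1],
                [ 1,  0, -1],
                [-1,  1,  0]], dtype=float)
print(fictitious_play(rps))                       # roughly [1/3, 1/3, 1/3]

# Transitive comparison derived from a scalar V via C(a, a') = V(a) - V(a').
V = np.array([0.1, 0.5, 0.9])
print(fictitious_play(V[:, None] - V[None, :]))   # mass piles onto argmax V
```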
### Is this OK?

If preferences are given as comparisons, then I think the minimax equilibrium is really the only possibly-sensible solution. But that doesn’t necessarily mean it’s sensible.

For most applications I have in mind, I think this equilibrium is perfectly fine even if C is ±1-valued and throws out all information about the strength of comparisons.

The minimax equilibrium is supported entirely on solutions which can’t be robustly improved upon. That is, for each action _x_ in the support of the minimax equilibrium, there is no option _y_ which “reliably outperforms” _x_: for every _y_ ≠ _x_ there is some plausible option _z_ such that C(_x_, _z_) > C(_y_, _z_).

I think this is basically enough to make me happy with an AI’s behavior, without even exploiting the further guarantee that $\mathbb{E}$[C(_x_, _z_)] > $\mathbb{E}$[C(_y_, _z_)] for a random action _z_ selected by the system.

Alternatives
============

If we don’t use comparisons, what would we do? I’ll talk about the case of [assigning approval values to actions](https://arbital.com/p/1w5), but the discussion seems more general.

### Option 1: spread out over the scale

I could try to give actions scores ranging over the whole scale between 0 and 1, such that every pair of importantly different options has significantly different scores.

This seems quite difficult. It requires my judgments to be quite sensitive to slight variations in the quality of outcomes, since otherwise our system will not have any incentive to make small improvements. But it also requires different judgments to be extremely consistent, which is in tension with sensitivity.

Even if we have estimates which are sensitive and consistent in expectation, if the estimates are at all noisy then [they will introduce unnecessary noise](http://rationalaltruist.com/2013/04/28/estimates-vs-head-to-head-comparisons/) that we could avoid by using direct comparisons.

Overall, I don’t think this is a good option.

### Option 2: reason about internal state

In my [unsupervised approval-directed proposal](https://arbital.com/p/1w5), the evaluating process has all of the time in the world. So it can literally enumerate different possible actions, and determine the capabilities of the agent by intensive inspection. I think this is a fine solution as far as it goes, but it’s not going to fly for evaluation processes that actually exist.

### Option 3: compare proposals

In my [supervised approval-directed proposals](https://arbital.com/p/1vw), each action is reviewed by a critic who can make a counterproposal. We can adjust our rating of the action based on the quality of the counterproposal.

The scheme in this post is a simpler and cleaner version of that proposal. It is cleanly separated from the other kinds of “argument” that might occur between AI systems, and from the other kinds of help that the human might get from AI assistants.

In particular, it seems important to notice that the same learning system can play both sides of the game — there is no need to have two different learners, or for one of the players to “move second.”

It also seems worth noticing that this is a very simple modification to existing training procedures, and we could already implement it in practice if it seems necessary.
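To make the “very simple modification” concrete, here is a minimal sketch (my own construction, assuming a small discrete action space; nothing here is prescribed by the post) of the gradient-ascent variant from the “How to optimize?” section: a single softmax policy samples both _a_ and _a_′, and a score-function estimate of the gradient of $\mathbb{E}$[C(_a_, _a_′)] is taken through the first argument only.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 4
theta = np.zeros(n_actions)            # logits of a softmax policy

def policy(theta):
    p = np.exp(theta - theta.max())
    return p / p.sum()

def C(a, b):
    # Hypothetical stand-in for the elicited comparison: it secretly prefers
    # higher-indexed actions, just so there is something to optimize.
    return float(np.sign(a - b))

for step in range(5000):
    p = policy(theta)
    # The same model plays both sides of the game.
    a = rng.choice(n_actions, p=p)
    a_prime = rng.choice(n_actions, p=p)
    # Score-function estimate of d/dtheta E[C(a, a')], differentiating through
    # the first argument only (a' is treated as a fixed sample).
    grad_log_p = -p
    grad_log_p[a] += 1.0               # gradient of log p(a) w.r.t. the logits
    theta += 0.05 * C(a, a_prime) * grad_log_p

print(policy(theta))   # most of the probability mass should have shifted to the last action
```

In a real system C would of course be a learned model of the rater’s comparisons rather than a hard-coded function.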
Applications
============

### Human feedback

As discussed [here](https://arbital.com/p/1w1), we could learn a function to predict which of two outputs a human would prefer, and then use this predictor to train an actor to produce good outputs. This sounds significantly more promising than having the human score individual outputs and training a predictor to guess those scores, especially during the early phases of training when all of the outputs would be pretty bad.

(I think this is the most important application.)

### Imitation

Suppose that we are following [this approach](https://arbital.com/p/1vx) to imitation learning, in which a classifier learns to distinguish human and machine behavior, while a reinforcement learner tries to fool the classifier.

Rather than training the classifier to predict labels, we can train it to pick which of two trajectories comes from the human and which from the machine. We can then use this comparison function to directly train the reinforcement learner to look more human-like.

I don’t know whether this would be a more robust approach. For the SVM in [Abbeel and Ng 2004](http://ai.stanford.edu/~ang/papers/icml04-apprentice.pdf) it wouldn’t make any difference. For more complex classifiers it seems plausible that the comparison would work better than using the log probability, especially during the early phases of training when the imitator is not doing a very good job.
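In both applications the pairwise predictor plays the role of C in the training loop sketched earlier. As one hypothetical way to fit such a predictor (the post does not commit to any particular form), the sketch below trains a Bradley-Terry-style comparator: a logistic model on a feature difference, rescaled to a comparison in [-1, 1].

```python
import numpy as np

rng = np.random.default_rng(0)

def train_comparator(xs, ys, labels, lr=0.1, epochs=200):
    """Fit w so that sigmoid(w . (x - y)) estimates the probability that the
    rater prefers x to y; labels are 1 if x was preferred and 0 otherwise."""
    w = np.zeros(xs.shape[1])
    for _ in range(epochs):
        for x, y, label in zip(xs, ys, labels):
            diff = x - y
            p = 1.0 / (1.0 + np.exp(-w @ diff))
            w += lr * (label - p) * diff       # gradient ascent on the log-likelihood
    return w

def comparison(w, x, y):
    """Rescale the predicted preference probability to a comparison in [-1, 1]."""
    p = 1.0 / (1.0 + np.exp(-w @ (x - y)))
    return 2.0 * p - 1.0

# Synthetic pairwise data: the "rater" secretly prefers larger first coordinates.
xs = rng.normal(size=(200, 3))
ys = rng.normal(size=(200, 3))
labels = (xs[:, 0] > ys[:, 0]).astype(float)

w = train_comparator(xs, ys, labels)
x = np.array([1.0, 0.0, 0.0])
y = np.array([-1.0, 0.0, 0.0])
print(comparison(w, x, y))    # close to +1: x is predicted to be preferred to y
```

Because the model scores only the difference of its two inputs, the resulting comparison is automatically anti-symmetric, matching the assumption C(_a_, _a_′) = -C(_a_′, _a_) above.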