# The AI must tolerate your safety measures

A corollary of the [-7g0]: for every kind of safety measure proposed for a [-7g1], we should immediately ask how to avoid this safety measure inducing an [7g0 adversarial context] between the human programmers and the agent being constructed.

A further corollary of the [cognitive_alignment generalized principle of cognitive alignment] would suggest that, if we know how to do it without inducing further problems, the AI should positively *want* the safety measure to be there.

E.g., if the safety measure we want is a [2xd suspend button] (off switch), our first thought should be, "How do we build an agent such that it doesn't mind the off switch being pressed?"

At a higher level of alignment, if something damages the off switch, the AI might be so configured that it naturally and spontaneously thinks, "Oh no! The off switch is damaged!" and reports this to the programmers, or, failing any response there, tries to repair the off switch itself. But this would only be a good idea if we were pretty sure this wouldn't lead to the AI substituting its own helpful ideas of what an off switch should do, or shutting off extra hard.

Similarly, if you start thinking how nice it would be to have the AI operating inside a [6z box] rather than running around in the outside world, your first thought should not be "How do I enclose this box in 12 layers of Faraday cages, a virtual machine running a Java sandbox, and 15 meters of concrete?" but rather "How would I go about constructing an agent that only cared about things inside a box and experienced no motive to affect anything outside the box?"

At a higher level of alignment we might imagine constructing a sort of agent that, if something went wrong, would think "Oh no, I am outside the box, that seems very unsafe, how do I go back in?" But only if we were very sure that we were not thereby constructing a kind of agent that would, e.g., build a superintelligence outside the box just to make extra sure the original agent stayed inside it.
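As a purely illustrative sketch of the "only cares about things inside the box" idea (not a proposal from this page, and glossing over the hard problem of actually delimiting "inside the box" in an agent's world-model), one can picture a utility function that reads in-box state variables only, so that two outcomes agreeing on the in-box variables are always ranked identically. All names below are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class WorldState:
    inside_box: dict   # state variables the agent's utility may read
    outside_box: dict  # everything else; deliberately never read below

def boxed_utility(state: WorldState) -> float:
    # Utility depends on inside_box only: two states that agree on
    # inside_box receive identical utility, whatever happens outside.
    inside = state.inside_box
    return inside.get("task_progress", 0.0) - 0.001 * inside.get("power_draw", 0.0)

# Sanity check: changing only the outside world leaves the ranking untouched.
s1 = WorldState(inside_box={"task_progress": 0.5, "power_draw": 100.0},
                outside_box={"internet_reachable": False})
s2 = WorldState(inside_box={"task_progress": 0.5, "power_draw": 100.0},
                outside_box={"internet_reachable": True})
assert boxed_utility(s1) == boxed_utility(s2)
```

Even in this toy setting, instrumental incentives to reach outside can reappear if outside events feed back into the in-box variables the agent cares about, which is part of why "construct an agent with no motive to affect anything outside the box" is posed here as a research question rather than a solved design.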
Many classes of safety measures are only meant to come into play after something else has already gone wrong, implying that other things may have gone wrong earlier and without notice. This suggests that pragmatically we should focus on the principle of "The AI should leave the safety measures alone and not experience an incentive to change their straightforward operation" rather than tackling the more complicated problems of exact alignment inherent in "The AI should be enthusiastic about the safety measures and want them to work even better."

However, if the AI is [1mq changing its own code or constructing subagents], it is necessary for the AI to have at least *some* positive motivation relating to any safety measures embodied in the operation of an internal algorithm. An AI indifferent to that code-based safety measure would tend to [1fx just leave the uninteresting code out of the next self-modification].
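The last point can be made concrete with a toy model (again illustrative only; the rule and module names are hypothetical): a self-modifying agent that keeps a piece of code in its successor only when that code raises expected utility will silently drop a safety measure it is merely indifferent to, and preserves it only if the measure's presence carries positive value.

```python
def choose_successor(current_modules, expected_utility):
    """Toy self-modification rule: keep a module only if deleting it
    would lower expected utility."""
    kept = []
    for module in current_modules:
        without = [m for m in current_modules if m is not module]
        if expected_utility(current_modules) > expected_utility(without):
            kept.append(module)
    return kept

# A utility function that merely *tolerates* the safety code (it contributes
# nothing either way) lets it vanish at the first rewrite:
def task_only_utility(modules):
    return 1.0 if "planner" in modules else 0.0

print(choose_successor(["planner", "shutdown_checker"], task_only_utility))
# -> ['planner']  (the safety measure did not survive self-modification)

# A utility function that assigns positive value to the safety code's
# presence keeps it in the successor:
def safety_valuing_utility(modules):
    bonus = 0.5 if "shutdown_checker" in modules else 0.0
    return task_only_utility(modules) + bonus

print(choose_successor(["planner", "shutdown_checker"], safety_valuing_utility))
# -> ['planner', 'shutdown_checker']
```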