{ localUrl: '../page/alignment_principle.html', arbitalUrl: 'https://arbital.com/p/alignment_principle', rawJsonUrl: '../raw/7v8.json', likeableId: '0', likeableType: 'page', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], pageId: 'alignment_principle', edit: '2', editSummary: '', prevEdit: '1', currentEdit: '2', wasPublished: 'true', type: 'wiki', title: 'Principles in AI alignment', clickbait: 'A 'principle' of AI alignment is a very general design goal like 'understand what the heck is going on inside the AI' that has informed a wide set of specific design proposals.', textLength: '3461', alias: 'alignment_principle', externalUrl: '', sortChildrenBy: 'likes', hasVote: 'false', voteType: '', votesAnonymous: 'false', editCreatorId: 'EliezerYudkowsky', editCreatedAt: '2017-02-16 18:54:18', pageCreatorId: 'EliezerYudkowsky', pageCreatedAt: '2017-02-16 18:44:16', seeDomainId: '0', editDomainId: 'EliezerYudkowsky', submitToDomainId: '0', isAutosave: 'false', isSnapshot: 'false', isLiveEdit: 'true', isMinorEdit: 'false', indirectTeacher: 'false', todoCount: '0', isEditorComment: 'false', isApprovedComment: 'false', isResolved: 'false', snapshotText: '', anchorContext: '', anchorText: '', anchorOffset: '0', mergedInto: '', isDeleted: 'false', viewCount: '103', text: '[summary: A 'principle' of [2v AI alignment] is something we want in a broad sense for the whole AI, which has informed narrower design proposals for particular parts or aspects of the AI.\n\nExamples:\n\n- The **[7g0]** says that the AI should never be searching for a way to defeat our safety measures or do something else we don't want, even if we *think* this search will come up empty; it's just the wrong thing for us to program computing power to do.\n - This informs the proposal of [45], subproposal [1b7]: if we build a [2xd suspend button] into the AI, we need to make sure the AI experiences no [10k instrumental pressure] to [7g2 disable the suspend button].\n- 
The **[7tf]** says that when we are building the first AGI, we should try to do as little as possible, using the least dangerous cognitive computations possible, in order to prevent the default outcome in which the world is otherwise destroyed by the second AGI.\n - This informs the proposal of [2r8] and [4mn taskishness]: We are safer if all goals and subgoals of the AI are formulated in such a way that they can be achieved to whatever degree is preferable using a bounded amount of effort, and the AI only exerts enough effort to do that.]\n\nA 'principle' of [2v AI alignment] is something we want in a broad sense for the whole AI, which has informed narrower design proposals for particular parts or aspects of the AI.\n\nFor example:\n\n- The **[7g0]** says that the AI should never be searching for a way to defeat our safety measures or do something else we don't want, even if we *think* this search will come up empty; it's just the wrong thing for us to program computing power to do.\n - This informs the proposal of [5s]: we ought to build an AI that wants to attain the class of outcomes we want to see.\n - This informs the proposal of [45], subproposal [1b7]: if we build a [2xd suspend button] into the AI, we need to make sure the AI experiences no [10k instrumental pressure] to [7g2 disable the suspend button].\n- The **[7tf]** says that when we are building the first aligned AGI, we should try to do the least that is necessary, using the least dangerous cognitive computations possible, to prevent the default outcome of the world being destroyed by the first unaligned AGI.\n - This informs the proposal of [2r8] and [4mn taskishness]: We are safer if all goals and subgoals of the AI are formulated in such a way that they can be achieved to whatever degree is preferable using a bounded amount of effort, and the AI only exerts enough effort to do that.\n - This informs the proposal of [102 Behaviorism]: It seems like there are some [6y pivotal-act] proposals that 
don't require the AI to understand and predict humans in great detail, only to master engineering; and it seems like we can head off multiple thorny problems by not having the AI try to model humans or other minds in any great detail.\n\nPlease be [10l guarded] about declaring things to be 'principles' unless they have already informed more than one specific design proposal and more than one person thinks they are a good idea. You could call them 'proposed principles' and post them under your own domain if you personally think they are a good idea. There are a *lot* of possible 'broad design wishes', or things that people think are 'broad design wishes', and the principles that have actually already informed specific design proposals would otherwise get lost in the crowd.', metaText: '', isTextLoaded: 'true', isSubscribedToDiscussion: 'false', isSubscribedToUser: 'false', isSubscribedAsMaintainer: 'false', discussionSubscriberCount: '1', maintainerCount: '1', userSubscriberCount: '0', lastVisit: '', hasDraft: 'false', votes: [], voteSummary: 'null', muVoteSummary: '0', voteScaling: '0', currentUserVote: '-2', voteCount: '0', lockedVoteType: '', maxEditEver: '0', redLinkCount: '0', lockedBy: '', lockedUntil: '', nextPageId: '', prevPageId: '', usedAsMastery: 'false', proposalEditNum: '0', permissions: { edit: { has: 'false', reason: 'You don't have domain permission to edit this page' }, proposeEdit: { has: 'true', reason: '' }, delete: { has: 'false', reason: 'You don't have domain permission to delete this page' }, comment: { has: 'false', reason: 'You can't comment in this domain because you are not a member' }, proposeComment: { has: 'true', reason: '' } }, summaries: {}, creatorIds: [ 'EliezerYudkowsky' ], childIds: [ 'nonadversarial', 'minimality_principle', 'understandability_principle', 'hyperexistential_separation' ], parentIds: [ 'ai_alignment' ], commentIds: [], questionIds: [], tagIds: [], relatedIds: [], markIds: [], explanations: [], 
learnMore: [], requirements: [], subjects: [], lenses: [], lensParentId: '', pathPages: [], learnMoreTaughtMap: {}, learnMoreCoveredMap: {}, learnMoreRequiredMap: {}, editHistory: {}, domainSubmissions: {}, answers: [], answerCount: '0', commentCount: '0', newCommentCount: '0', linkedMarkCount: '0', changeLogs: [ { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '22903', pageId: 'alignment_principle', userId: 'EliezerYudkowsky', edit: '0', type: 'newChild', createdAt: '2017-12-04 20:49:24', auxPageId: 'hyperexistential_separation', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '22055', pageId: 'alignment_principle', userId: 'EliezerYudkowsky', edit: '0', type: 'newChild', createdAt: '2017-02-16 19:11:05', auxPageId: 'understandability_principle', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '22052', pageId: 'alignment_principle', userId: 'EliezerYudkowsky', edit: '0', type: 'newChild', createdAt: '2017-02-16 18:55:03', auxPageId: 'minimality_principle', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '22048', pageId: 'alignment_principle', userId: 'EliezerYudkowsky', edit: '0', type: 'newChild', createdAt: '2017-02-16 18:54:54', auxPageId: 'nonadversarial', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '22045', pageId: 'alignment_principle', userId: 'EliezerYudkowsky', edit: '2', type: 'newEdit', createdAt: '2017-02-16 
18:54:18', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '22044', pageId: 'alignment_principle', userId: 'EliezerYudkowsky', edit: '0', type: 'newEditGroup', createdAt: '2017-02-16 18:54:17', auxPageId: 'EliezerYudkowsky', oldSettingsValue: '123', newSettingsValue: '2' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '22038', pageId: 'alignment_principle', userId: 'EliezerYudkowsky', edit: '0', type: 'newParent', createdAt: '2017-02-16 18:44:17', auxPageId: 'ai_alignment', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '22036', pageId: 'alignment_principle', userId: 'EliezerYudkowsky', edit: '1', type: 'newEdit', createdAt: '2017-02-16 18:44:16', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' } ], feedSubmissions: [], searchStrings: {}, hasChildren: 'true', hasParents: 'true', redAliases: {}, improvementTagIds: [], nonMetaTagIds: [], todos: [], slowDownMap: 'null', speedUpMap: 'null', arcPageIds: 'null', contentRequests: {} }