{
  localUrl: '../page/effability.html',
  arbitalUrl: 'https://arbital.com/p/effability',
  rawJsonUrl: '../raw/7vb.json',
  likeableId: '0',
  likeableType: 'page',
  myLikeValue: '0',
  likeCount: '0',
  dislikeCount: '0',
  likeScore: '0',
  individualLikes: [],
  pageId: 'effability',
  edit: '2',
  editSummary: '',
  prevEdit: '1',
  currentEdit: '2',
  wasPublished: 'true',
  type: 'wiki',
  title: 'Effability principle',
  clickbait: 'You are safer the more you understand the inner structure of how your AI thinks, and the better you can describe the relations between the smaller pieces of the AI's thought process.',
  textLength: '6217',
  alias: 'effability',
  externalUrl: '',
  sortChildrenBy: 'likes',
  hasVote: 'false',
  voteType: '',
  votesAnonymous: 'false',
  editCreatorId: 'EliezerYudkowsky',
  editCreatedAt: '2017-02-16 20:07:39',
  pageCreatorId: 'EliezerYudkowsky',
  pageCreatedAt: '2017-02-16 20:06:50',
  seeDomainId: '0',
  editDomainId: 'EliezerYudkowsky',
  submitToDomainId: '0',
  isAutosave: 'false',
  isSnapshot: 'false',
  isLiveEdit: 'true',
  isMinorEdit: 'false',
  indirectTeacher: 'false',
  todoCount: '0',
  isEditorComment: 'false',
  isApprovedComment: 'false',
  isResolved: 'false',
  snapshotText: '',
  anchorContext: '',
  anchorText: '',
  anchorOffset: '0',
  mergedInto: '',
  isDeleted: 'false',
  viewCount: '138',
  text: 'A proposed [7v8 principle] of [2v] stating, "The more insight you have into the deep structure of an AI's cognitive operations, the more likely you are to succeed in aligning that AI."\n\nAs an example of increased effability, consider the difference between having the idea of [18t expected utility] while building your AI, versus having never heard of expected utility.  The idea of expected utility is so well-known that it may not seem salient as an insight anymore, but consider the difference between having this idea and not having it.\n\nStaring at the expected utility principle, and at how it decomposes into a utility function and a probability distribution, leads to a potentially obvious-sounding but still rather important insight:\n\nRather than *all* behaviors and policies and goals needing to be up-for-grabs in order for an agent to adapt itself to a changing and unknown world, the agent can have a *stable utility function* and *changing probability distribution.*\n\nE.g., when the agent tries to grab the cheese and discovers that the cheese is too high, we can view this as an [1ly update] to the agent's *beliefs about how to get cheese,* without changing the fact that *the agent wants cheese.*\n\nSimilarly, if we want superhuman performance at playing chess, we can ask for an AI that has a known, stable, understandable preference to win chess games; but a probability distribution that has been refined to greater-than-human accuracy about *which* policies yield a greater probabilistic expectation of winning chess positions.\n\nThen contrast this to the state of mind where you haven't decomposed your understanding of cognition into preference-ish parts and belief-ish parts.  In this state of mind, for all you know, every aspect of the AI's behavior, every goal it has, must potentially need to change in order for the AI to deal with a changing world; otherwise the AI will just be stuck executing the same behaviors over and over... right?  Obviously, this notion of an AI with unchangeable preferences is just a fool's errand.  Any AI like that would be too stupid to make a [6y major difference] for good or bad. %note: The idea of [10g] is also important here; e.g. that scientific curiosity is already an instrumental strategy for 'make as many paperclips as possible', rather than an AI needing a separate [1bh terminal] preference about scientific curiosity in order to ever engage in it.%\n\n(This argument has indeed been encountered in the wild many times.)\n\nProbability distributions and utility functions have now been known for a relatively long time and are understood relatively well; people have made many, many attempts to poke at their structure and imagine potential variations and show what goes wrong with those variations.  We now know of an [7hh enormous family of coherence theorems] stating that "Strategies which are not qualitatively dominated can be viewed as coherent with some consistent probability distribution and utility function."  This suggests that we can in a broad sense expect that, [21 as a sufficiently advanced AI's behavior is more heavily optimized for not qualitatively shooting itself in the foot, that AI will end up exhibiting some aspects of expected-utility reasoning].  We have some idea of why a sufficiently advanced AI would have expected-utility-ish things going on *somewhere* inside it, or at least behave that way so far as we could tell by looking at the AI's external actions.\n\nSo we can say, "Look, if you don't *explicitly* write in a utility function, the AI is probably going to end up with something like a utility function *anyway,* you just won't know where it is.  It seems considerably wiser to know what that utility function says and write it in on purpose.  Heck, even if you say you explicitly *don't* want your AI to have a stable utility function, you'd need to know all the coherence theorems you're trying to defy by saying that!"\n\nThe Effability Principle states (or rather hopes) that as we get marginally more of this general kind of insight into an AI's operations, we become marginally more likely to be able to align the AI.\n\nThe example of expected utility arguably suggests that if there are any *more* ideas like that lying around, which we *don't* yet have, our lack of those ideas may entirely doom the AI alignment project or at least make it far more difficult.  We can in principle imagine someone who is just using a big reinforcement learner to try to execute some large [6y pivotal act], who has no idea where the AI is keeping its consequentialist preferences or what those preferences are; and yet this person was *so* paranoid and had the resources to put in *so* much monitoring and had *so* many tripwires and safeguards and was *so* conservative in how little they tried to do, that they succeeded anyway.  But it doesn't sound like a good idea to try in real life.\n\nThe search for increased effability has generally motivated the "Agent Foundations" agenda of research within MIRI.  While effability is not the *only* aspect of AI alignment, a concern is that gaining this kind of deep insight may be a heavily serially-loaded task in which researchers need to develop one idea after another, compared to relatively [shallow_alignment_ideas shallow ideas in AI alignment] that require less serial time to create.  That is, this kind of research is among the most important kinds of research to start *early.*\n\nThe chief rival to effability is the [supervisability_principle Supervisability Principle], which, while not directly opposed to effability, tends to focus our understanding of the AI at a much larger grain size.  For example, the Supervisability Principle says, "Since the AI's behaviors are the only thing we can train by direct comparison with something we know to be already aligned, namely human behaviors, we should focus on ensuring the greatest possible fidelity at that point, rather than any smaller pieces whose alignment cannot be directly determined and tested in the same way."  Note that both principles agree that it's important to [7v7 understand] certain facts about the AI as well as possible, but they disagree about which parts of the AI we should prioritize rendering maximally understandable.',
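To make the stable-utility-function / changing-probability-distribution decomposition in the text above concrete, here is a minimal Python sketch of the cheese example; the action names, probability numbers, and the `best_action` helper are hypothetical illustrations, not anything the page itself specifies. The agent's utility function stays fixed throughout, and an observation updates only its beliefs about which policy gets cheese, which is enough to change the chosen action.

```python
def expected_utility(action, beliefs, utility):
    """Expected utility of an action: sum of P(outcome | action) * utility(outcome)."""
    return sum(p * utility[outcome] for outcome, p in beliefs[action].items())

def best_action(beliefs, utility):
    """Choose the action maximizing expected utility under the current beliefs."""
    return max(beliefs, key=lambda a: expected_utility(a, beliefs, utility))

# Stable utility function: the agent wants cheese, and this never changes below.
utility = {'cheese': 1.0, 'no_cheese': 0.0}

# Prior beliefs: P(outcome | action) for each candidate policy (hypothetical numbers).
beliefs = {
    'jump_for_cheese': {'cheese': 0.9, 'no_cheese': 0.1},
    'fetch_ladder':    {'cheese': 0.6, 'no_cheese': 0.4},
}

print(best_action(beliefs, utility))  # -> jump_for_cheese

# Observation: the cheese turns out to be too high to reach by jumping.
# Only the belief about 'jump_for_cheese' changes; the utility function is untouched.
beliefs['jump_for_cheese'] = {'cheese': 0.05, 'no_cheese': 0.95}

print(best_action(beliefs, utility))  # -> fetch_ladder
```

After the update the agent switches from jumping to fetching a ladder, but only because its estimate of P(cheese | jump) fell; the preference assigning cheese a utility of 1.0 never moved, which is the sense in which the goal stays stable while the beliefs do the adapting.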
  metaText: '',
  isTextLoaded: 'true',
  isSubscribedToDiscussion: 'false',
  isSubscribedToUser: 'false',
  isSubscribedAsMaintainer: 'false',
  discussionSubscriberCount: '1',
  maintainerCount: '1',
  userSubscriberCount: '0',
  lastVisit: '',
  hasDraft: 'false',
  votes: [],
  voteSummary: 'null',
  muVoteSummary: '0',
  voteScaling: '0',
  currentUserVote: '-2',
  voteCount: '0',
  lockedVoteType: '',
  maxEditEver: '0',
  redLinkCount: '0',
  lockedBy: '',
  lockedUntil: '',
  nextPageId: '',
  prevPageId: '',
  usedAsMastery: 'false',
  proposalEditNum: '0',
  permissions: {
    edit: {
      has: 'false',
      reason: 'You don't have domain permission to edit this page'
    },
    proposeEdit: {
      has: 'true',
      reason: ''
    },
    delete: {
      has: 'false',
      reason: 'You don't have domain permission to delete this page'
    },
    comment: {
      has: 'false',
      reason: 'You can't comment in this domain because you are not a member'
    },
    proposeComment: {
      has: 'true',
      reason: ''
    }
  },
  summaries: {},
  creatorIds: [
    'EliezerYudkowsky'
  ],
  childIds: [],
  parentIds: [
    'understandability_principle'
  ],
  commentIds: [],
  questionIds: [],
  tagIds: [],
  relatedIds: [],
  markIds: [],
  explanations: [],
  learnMore: [],
  requirements: [],
  subjects: [],
  lenses: [],
  lensParentId: '',
  pathPages: [],
  learnMoreTaughtMap: {},
  learnMoreCoveredMap: {},
  learnMoreRequiredMap: {},
  editHistory: {},
  domainSubmissions: {},
  answers: [],
  answerCount: '0',
  commentCount: '0',
  newCommentCount: '0',
  linkedMarkCount: '0',
  changeLogs: [
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '22063',
      pageId: 'effability',
      userId: 'EliezerYudkowsky',
      edit: '2',
      type: 'newEdit',
      createdAt: '2017-02-16 20:07:39',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '22062',
      pageId: 'effability',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newEditGroup',
      createdAt: '2017-02-16 20:07:04',
      auxPageId: 'EliezerYudkowsky',
      oldSettingsValue: '123',
      newSettingsValue: '2'
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '22061',
      pageId: 'effability',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newParent',
      createdAt: '2017-02-16 20:06:51',
      auxPageId: 'understandability_principle',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '22059',
      pageId: 'effability',
      userId: 'EliezerYudkowsky',
      edit: '1',
      type: 'newEdit',
      createdAt: '2017-02-16 20:06:50',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    }
  ],
  feedSubmissions: [],
  searchStrings: {},
  hasChildren: 'false',
  hasParents: 'true',
  redAliases: {},
  improvementTagIds: [],
  nonMetaTagIds: [],
  todos: [],
  slowDownMap: 'null',
  speedUpMap: 'null',
  arcPageIds: 'null',
  contentRequests: {}
}