{
  localUrl: '../page/goodness_estimate_bias.html',
  arbitalUrl: 'https://arbital.com/p/goodness_estimate_bias',
  rawJsonUrl: '../raw/57b.json',
  likeableId: '2999',
  likeableType: 'page',
  myLikeValue: '0',
  likeCount: '1',
  dislikeCount: '0',
  likeScore: '1',
  individualLikes: [
    'EricBruylant'
  ],
  pageId: 'goodness_estimate_bias',
  edit: '4',
  editSummary: '',
  prevEdit: '3',
  currentEdit: '4',
  wasPublished: 'true',
  type: 'wiki',
  title: 'Goodness estimate biaser',
  clickbait: 'Some of the main problems in AI alignment can be seen as scenarios where actual goodness is likely to be systematically lower than what a broken way of estimating goodness would predict.',
  textLength: '4639',
  alias: 'goodness_estimate_bias',
  externalUrl: '',
  sortChildrenBy: 'likes',
  hasVote: 'false',
  voteType: '',
  votesAnonymous: 'false',
  editCreatorId: 'EliezerYudkowsky',
  editCreatedAt: '2016-07-08 17:53:19',
  pageCreatorId: 'EliezerYudkowsky',
  pageCreatedAt: '2016-07-07 21:59:08',
  seeDomainId: '0',
  editDomainId: 'EliezerYudkowsky',
  submitToDomainId: '0',
  isAutosave: 'false',
  isSnapshot: 'false',
  isLiveEdit: 'true',
  isMinorEdit: 'false',
  indirectTeacher: 'false',
  todoCount: '0',
  isEditorComment: 'false',
  isApprovedComment: 'true',
  isResolved: 'false',
  snapshotText: '',
  anchorContext: '',
  anchorText: '',
  anchorOffset: '0',
  mergedInto: '',
  isDeleted: 'false',
  viewCount: '138',
  text: 'A "[55 goodness] estimate [statistical_bias biaser]" is a system setup or phenomenon that seems [6r foreseeably] likely to cause the actual goodness of some AI plan to be systematically lower than the AI's estimate of that plan's goodness.  We want the AI's estimate to be [statistically_unbiased unbiased].\n\n## Ordinary examples\n\nSubtle and unsubtle [statistical_bias estimate-biasing] issues in machine learning are well-known and appear far short of [2c advanced agency]:\n\n●  A machine learning algorithm's performance on the training data is not an unbiased estimate of its performance on the test data.  Some of what the algorithm seems to learn may be particular to noise in the training data.  This fitted noise will not be fitted within the test data.  So test performance is not just unequal to, but *systematically lower than,* training performance; if we were treating the training performance as an estimate of test performance, it would not be an [statistically_unbiased unbiased] estimate.\n\n●  The [Winner's Curse](https://en.wikipedia.org/wiki/Winner%27s_curse) from auction theory observes that if bidders have noise in their unbiased estimates of the auctioned item's value, then the *highest* bidder, who receives the item, is more likely to have upward noise in their individually unbiased estimate, [1ly conditional] on their having won.  (E.g., three bidders with Gaussian noise in their value estimates submit bids on an item whose true value to them is 1.0; the winning bidder is likely to have valued the item at more than 1.0.)\n\nThe analogous [Optimizer's Curse](https://faculty.fuqua.duke.edu/~jes9/bio/The_Optimizers_Curse.pdf) observes that if we make locally unbiased but noisy estimates of the [subjective_expected_utility subjective expected utility] of several plans, then selecting the plan with 'highest expected utility' is likely to select an estimate with upward noise.  Barring compensatory adjustments, this means that actual utility will be systematically lower than expected utility, even if all expected utility estimates are individually unbiased.  
## In AI alignment\n\nWe can see many of the alleged [6r foreseeable difficulties] in [2v AI alignment] as involving similar processes that produce systematic downward biases in what we see as actual [55 goodness], compared to an AI's estimate of goodness:\n\n●  [2w] suggests that if we take an imperfectly or incompletely learned value function, then looking for the *maximum* or *extreme* of that value function is much more likely than usual to magnify what we see as the gaps or imperfections (because of [fragile_value fragility of value], plus the Optimizer's Curse); or to destroy whatever aspects of value the AI didn't learn about (because optimizing a subset of properties is liable to set all other properties to extreme values).\n\nWe can see this as implying both "The AI's apparent goodness in non-extreme cases is an upward-biased estimate of its goodness in extreme cases" and "If the AI learns its goodness estimator less than [41k perfectly], the AI's estimates of the goodness of its best plans will systematically overestimate what we see as the actual goodness."\n\n●  [42] generally, and especially over [instrumental_incorrigibility instrumentally convergent incorrigibility], suggests that if there are naturally arising AI behaviors we see as bad (e.g. routing around [2xd shutdown]), there may emerge a pseudo-adversarial selection of strategies that route around our attempted [48 patches] to those problems.  E.g., the AI constructs an environmental subagent to carry on its goals, while cheerfully obeying 'the letter of the law' by allowing its current hardware to be shut down.  This pseudo-adversarial selection (though the AI does not have an explicit goal of thwarting us or selecting low-goodness strategies per se) again implies that actual [55 goodness] is likely to be systematically lower than the AI's estimate of what it has learned as 'goodness'; again to an [6q increasing degree] as the AI becomes [9f smarter] and [47 searches a wider policy space].\n\n[2r8 Mild optimization] and [2qp conservative strategies] can be seen as proposals to 'regularize' powerful optimization in a way that *decreases* the degree to which goodness in training is a biased (over)estimate of goodness in execution.\n
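As a similarly rough sketch of that last point, reusing the illustrative twenty-plan setup from above: replacing the argmax with a random pick from the top half by estimated utility (a crude stand-in for milder optimization, not any specific proposal) shrinks the overestimate.\n
```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 100_000

true_utility = np.array([1.0] * 10 + [0.9] * 10)
noise_sd = np.array([0.1] * 10 + [1.0] * 10)
estimates = true_utility + rng.normal(0.0, noise_sd, size=(n_trials, 20))
rows = np.arange(n_trials)

# Strong optimization: always take the apparent maximum.
hard_choice = estimates.argmax(axis=1)

# Milder optimization: pick uniformly at random among the ten plans with the best estimates.
top_half = np.argsort(estimates, axis=1)[:, 10:]
mild_choice = top_half[rows, rng.integers(0, 10, size=n_trials)]

# Overestimate = estimated utility of the chosen plan minus its actual utility.
print("overestimate under argmax:",
      (estimates[rows, hard_choice] - true_utility[hard_choice]).mean())
print("overestimate under milder selection:",
      (estimates[rows, mild_choice] - true_utility[mild_choice]).mean())
```\n
The overestimate does not vanish, but softer selection over the same noisy estimates leaves a much smaller gap between the estimated and actual utility of the chosen plan.',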
  metaText: '',
  isTextLoaded: 'true',
  isSubscribedToDiscussion: 'false',
  isSubscribedToUser: 'false',
  isSubscribedAsMaintainer: 'false',
  discussionSubscriberCount: '2',
  maintainerCount: '1',
  userSubscriberCount: '0',
  lastVisit: '',
  hasDraft: 'false',
  votes: [],
  voteSummary: 'null',
  muVoteSummary: '0',
  voteScaling: '0',
  currentUserVote: '-2',
  voteCount: '0',
  lockedVoteType: '',
  maxEditEver: '0',
  redLinkCount: '0',
  lockedBy: '',
  lockedUntil: '',
  nextPageId: '',
  prevPageId: '',
  usedAsMastery: 'false',
  proposalEditNum: '0',
  permissions: {
    edit: {
      has: 'false',
      reason: 'You don't have domain permission to edit this page'
    },
    proposeEdit: {
      has: 'true',
      reason: ''
    },
    delete: {
      has: 'false',
      reason: 'You don't have domain permission to delete this page'
    },
    comment: {
      has: 'false',
      reason: 'You can't comment in this domain because you are not a member'
    },
    proposeComment: {
      has: 'true',
      reason: ''
    }
  },
  summaries: {},
  creatorIds: [
    'EliezerYudkowsky'
  ],
  childIds: [],
  parentIds: [
    'advanced_safety'
  ],
  commentIds: [
    '57t'
  ],
  questionIds: [],
  tagIds: [],
  relatedIds: [
    'edge_instantiation',
    'goodharts_curse'
  ],
  markIds: [],
  explanations: [],
  learnMore: [],
  requirements: [],
  subjects: [],
  lenses: [],
  lensParentId: '',
  pathPages: [],
  learnMoreTaughtMap: {},
  learnMoreCoveredMap: {},
  learnMoreRequiredMap: {},
  editHistory: {},
  domainSubmissions: {},
  answers: [],
  answerCount: '0',
  commentCount: '0',
  newCommentCount: '0',
  linkedMarkCount: '0',
  changeLogs: [
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '16242',
      pageId: 'goodness_estimate_bias',
      userId: 'EliezerYudkowsky',
      edit: '4',
      type: 'newEdit',
      createdAt: '2016-07-08 17:53:19',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '16239',
      pageId: 'goodness_estimate_bias',
      userId: 'EliezerYudkowsky',
      edit: '3',
      type: 'newEdit',
      createdAt: '2016-07-08 17:44:56',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '16236',
      pageId: 'goodness_estimate_bias',
      userId: 'EliezerYudkowsky',
      edit: '2',
      type: 'newEdit',
      createdAt: '2016-07-08 17:43:27',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '16055',
      pageId: 'goodness_estimate_bias',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newParent',
      createdAt: '2016-07-07 21:59:09',
      auxPageId: 'advanced_safety',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '16053',
      pageId: 'goodness_estimate_bias',
      userId: 'EliezerYudkowsky',
      edit: '1',
      type: 'newEdit',
      createdAt: '2016-07-07 21:59:08',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    }
  ],
  feedSubmissions: [],
  searchStrings: {},
  hasChildren: 'false',
  hasParents: 'true',
  redAliases: {},
  improvementTagIds: [],
  nonMetaTagIds: [],
  todos: [],
  slowDownMap: 'null',
  speedUpMap: 'null',
  arcPageIds: 'null',
  contentRequests: {}
}