{
  localUrl: '../page/2qh.html',
  arbitalUrl: 'https://arbital.com/p/2qh',
  rawJsonUrl: '../raw/2qh.json',
  likeableId: '1648',
  likeableType: 'page',
  myLikeValue: '0',
  likeCount: '0',
  dislikeCount: '0',
  likeScore: '0',
  individualLikes: [],
  pageId: '2qh',
  edit: '5',
  editSummary: '',
  prevEdit: '4',
  currentEdit: '5',
  wasPublished: 'true',
  type: 'comment',
  title: '"This seems like a straw alt..."',
  clickbait: '',
  textLength: '3849',
  alias: '2qh',
  externalUrl: '',
  sortChildrenBy: 'recentFirst',
  hasVote: 'false',
  voteType: '',
  votesAnonymous: 'false',
  editCreatorId: 'PaulChristiano',
  editCreatedAt: '2016-03-19 18:02:23',
  pageCreatorId: 'PaulChristiano',
  pageCreatedAt: '2016-03-19 17:44:38',
  seeDomainId: '0',
  editDomainId: 'EliezerYudkowsky',
  submitToDomainId: '0',
  isAutosave: 'false',
  isSnapshot: 'false',
  isLiveEdit: 'true',
  isMinorEdit: 'false',
  indirectTeacher: 'false',
  todoCount: '0',
  isEditorComment: 'false',
  isApprovedComment: 'true',
  isResolved: 'false',
  snapshotText: '',
  anchorContext: 'On a higher level of abstraction, we can imagine that the universe is parsed by us into a set of variables $V_i$ with values $v_i.$  We want to avoid the agent taking actions that cause large amounts of disutility by perturbing variables from $v_i$ to $v_i^*$ in a way that decreases utility\\.  However, the question of exactly which variables $V_i$ are important and shouldn't be entropically perturbed is value\\-laden \\- complicated, fragile, high in algorithmic complexity, with Humean degrees of freedom in the concept boundaries\\.  Rather than relying solely on teaching an agent exactly which parts of the environment shouldn't be perturbed and risking catastrophe if we miss an injunction, the low impact route would try to build an agent that tried to perturb fewer variables regardless\\.  The hope is that "have fewer side effects" will have a central core and be learnable by a manageable amount of training\\.  Conversely, trying to train "here is the list of bad effects not to have and important variables not to perturb" would be complicated and lack a simple core, because 'bad' and 'important' are value\\-laden\\.',
  anchorText: 'Rather than relying solely on teaching an agent exactly which parts of the environment shouldn't be perturbed and risking catastrophe if we miss an injunction',
  anchorOffset: '547',
  mergedInto: '',
  isDeleted: 'false',
  viewCount: '480',
  text: 'This seems like a straw alternative. More realistically, we could imagine an agent which avoids perturbing a variable if it predicts the human would say "changing that variable is problematic" when asked. Then:\n\n1. We don't have to explicitly cover injunctions, just to provide information that allows the agent to predict human judgments.\n2. If the AI is bad at making predictions, then it may just end up with lots of variables for which it thinks the human *might* say "changing that variable is problematic." Behaving appropriately with respect to this uncertainty could recover the desired behavior.\n\nConsider an agent who is learning to predict when a human considers a change problematic. Now suppose that the agent is not able to learn the complex value-laden concept of "important change," but is able to learn the simpler concept of "big change."\n\nThis agent can use the concept of "big change" in order to make predictions about "important change," namely: "if a change is big, it might be important."\n\nSo any agent who is able to learn the concept of "big change" should be able to make predictions *at least as well* as if it simply guessed that every big change had an appropriate probability of being important. For example, if 1% of big changes are important, then a reasonable learner, who is smart enough to learn the concept of "big change," will predict at least as well as if it simply predicted that each big change was important with 1% probability.\n\nIf we use such a learner appropriately, this seems like it can obtain behavior *at least as good* as if the agent had first been taught a measure of impact and then used that measure to avoid (or flag) high-impact consequences.\n\nTo me it feels *much* more promising to learn an impact measure implicitly as an input into what changes are "important." The alternative feels like a non-starter:\n\n* The track record for learning looks a lot better than figuring things out "by hand."\n* The learned approach is easy to integrate with existing and foreseeable systems, while the by-hand approach seems to require big changes in AI architectures.\n* On the object level, the notion of low impact really doesn't look like it is going to have a clean theoretical specification (you point out many of the concerns).\n\nI would like to better understand our disagreement, though I'm not sure if it's a priority and so you should feel free to ignore this. But if you want to clarify: does one of these two concerns capture your position regarding learning an impact measure?\n\n1. We might be able to specify an impact measure much more effectively than the agent can learn it (perhaps because we can directly specify a measure that will generalize well to radically different contexts, whereas a learned measure would not be robust to big context changes).\n2. Even if the agent could learn an impact measure, and even if it could predict objectionable changes effectively by using that impact measure conservatively, we shouldn't expect an objectionable-change-predictor to actually use this particular strategy or an equally effective alternative (perhaps it uses some other strategy which achieves a higher payoff in simple environments but then generalizes worse).\n\n(For reference, the main context change I have in mind is moving from "weak agent proposing dumb plans" to "smarter agent proposing cleverer plans," where "cleverer" may involve some optimization for being apparently low impact.)\n\nAlternatively, I may be misunderstanding your position. I agree that even if you want an agent to learn an impact measure, it is worth thinking about what kind of impact measure it might learn and how that measure will generalize. So it's possible that we don't actually disagree about how the ultimate agent might look, but are just emphasizing different parts of how to get there.',
  metaText: '',
  isTextLoaded: 'true',
  isSubscribedToDiscussion: 'false',
  isSubscribedToUser: 'false',
  isSubscribedAsMaintainer: 'false',
  discussionSubscriberCount: '2',
  maintainerCount: '1',
  userSubscriberCount: '0',
  lastVisit: '',
  hasDraft: 'false',
  votes: [],
  voteSummary: 'null',
  muVoteSummary: '0',
  voteScaling: '0',
  currentUserVote: '-2',
  voteCount: '0',
  lockedVoteType: '',
  maxEditEver: '0',
  redLinkCount: '0',
  lockedBy: '',
  lockedUntil: '',
  nextPageId: '',
  prevPageId: '',
  usedAsMastery: 'false',
  proposalEditNum: '0',
  permissions: {
    edit: {
      has: 'false',
      reason: 'You don't have domain permission to edit this page'
    },
    proposeEdit: {
      has: 'true',
      reason: ''
    },
    delete: {
      has: 'false',
      reason: 'You don't have domain permission to delete this page'
    },
    comment: {
      has: 'false',
      reason: 'You can't comment in this domain because you are not a member'
    },
    proposeComment: {
      has: 'true',
      reason: ''
    }
  },
  summaries: {},
  creatorIds: [
    'PaulChristiano'
  ],
  childIds: [],
  parentIds: [
    'low_impact'
  ],
  commentIds: [
    '2qm'
  ],
  questionIds: [],
  tagIds: [],
  relatedIds: [],
  markIds: [],
  explanations: [],
  learnMore: [],
  requirements: [],
  subjects: [],
  lenses: [],
  lensParentId: '',
  pathPages: [],
  learnMoreTaughtMap: {},
  learnMoreCoveredMap: {},
  learnMoreRequiredMap: {},
  editHistory: {},
  domainSubmissions: {},
  answers: [],
  answerCount: '0',
  commentCount: '0',
  newCommentCount: '0',
  linkedMarkCount: '0',
  changeLogs: [
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '8787',
      pageId: '2qh',
      userId: 'PaulChristiano',
      edit: '5',
      type: 'newEdit',
      createdAt: '2016-03-19 18:02:23',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '8785',
      pageId: '2qh',
      userId: 'PaulChristiano',
      edit: '4',
      type: 'newEdit',
      createdAt: '2016-03-19 17:51:35',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '8784',
      pageId: '2qh',
      userId: 'PaulChristiano',
      edit: '3',
      type: 'newEdit',
      createdAt: '2016-03-19 17:49:15',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '8783',
      pageId: '2qh',
      userId: 'PaulChristiano',
      edit: '2',
      type: 'newEdit',
      createdAt: '2016-03-19 17:45:29',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '8782',
      pageId: '2qh',
      userId: 'PaulChristiano',
      edit: '1',
      type: 'newEdit',
      createdAt: '2016-03-19 17:44:38',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '8776',
      pageId: '2qh',
      userId: 'PaulChristiano',
      edit: '0',
      type: 'newParent',
      createdAt: '2016-03-19 17:15:18',
      auxPageId: 'low_impact',
      oldSettingsValue: '',
      newSettingsValue: ''
    }
  ],
  feedSubmissions: [],
  searchStrings: {},
  hasChildren: 'false',
  hasParents: 'true',
  redAliases: {},
  improvementTagIds: [],
  nonMetaTagIds: [],
  todos: [],
  slowDownMap: 'null',
  speedUpMap: 'null',
  arcPageIds: 'null',
  contentRequests: {}
}