{
  localUrl: '../page/selective_similarity_metric.html',
  arbitalUrl: 'https://arbital.com/p/selective_similarity_metric',
  rawJsonUrl: '../raw/2sp.json',
  likeableId: '1718',
  likeableType: 'page',
  myLikeValue: '0',
  likeCount: '1',
  dislikeCount: '0',
  likeScore: '1',
  individualLikes: [
    'PatrickLaVictoir'
  ],
  pageId: 'selective_similarity_metric',
  edit: '4',
  editSummary: '',
  prevEdit: '3',
  currentEdit: '4',
  wasPublished: 'true',
  type: 'wiki',
  title: 'Selective similarity metrics for imitation',
  clickbait: 'Can we make human-imitators more efficient by scoring them more heavily on imitating the aspects of human behavior we care about more?',
  textLength: '3451',
  alias: 'selective_similarity_metric',
  externalUrl: '',
  sortChildrenBy: 'likes',
  hasVote: 'false',
  voteType: '',
  votesAnonymous: 'false',
  editCreatorId: 'JessicaTaylor',
  editCreatedAt: '2016-03-24 04:21:45',
  pageCreatorId: 'JessicaTaylor',
  pageCreatedAt: '2016-03-24 04:11:29',
  seeDomainId: '0',
  editDomainId: 'EliezerYudkowsky',
  submitToDomainId: '0',
  isAutosave: 'false',
  isSnapshot: 'false',
  isLiveEdit: 'true',
  isMinorEdit: 'false',
  indirectTeacher: 'false',
  todoCount: '0',
  isEditorComment: 'false',
  isApprovedComment: 'true',
  isResolved: 'false',
  snapshotText: '',
  anchorContext: '',
  anchorText: '',
  anchorOffset: '0',
  mergedInto: '',
  isDeleted: 'false',
  viewCount: '53',
  text: 'A human-imitator (trained using [2sj]) will try to imitate _all_ aspects of human behavior.  Sometimes we care more about how good the imitation is along some axes than along others, and it would be inefficient to imitate the human along all axes.  Therefore, we might want to design [scoring rules](https://en.wikipedia.org/wiki/Scoring_rule#Proper_scoring_rules) for human-imitators that emphasize matching performance along some axes more than others.\n\nCompare with [1vp], another proposed way of making human-imitation more efficient.\n\nHere are some ideas for constructing scoring rules:\n\n# Moment matching\n\nSuppose that, given a question, the human will write down a number.  We ask a predictor to output the parameters of a Gaussian distribution, and we train the predictor to output Gaussian distributions that assign high probability to the training data.  Then, we sample from this Gaussian distribution to imitate the human.  Clearly, this is a way of imitating some aspects of human behavior (mean and variance) but not others.\n\nThe general form of this approach is to estimate _moments_ (expectations of some features) of the predictor's distribution on human behavior, and then sample from some distribution with these moments (such as an [exponential family distribution](https://en.wikipedia.org/wiki/Exponential_family)).\n\nA less trivial example is a variant of [inverse reinforcement learning](http://ai.stanford.edu/~ang/papers/icml04-apprentice.pdf).  In this variant, to predict a sequence of actions the human takes, the predictor outputs some representation of a reward function on states (such as the parameters of some affine function of features of the state).  The human is modeled as a noisy reinforcement learner with this reward function, and the predictor is encouraged to have this model assign high probability to the human's actual trajectory.  To imitate the human, run a noisy reinforcement learner with the predicted reward function.  The predictor can be seen as estimating moments of the human's trajectory (specifically, moments related to frequencies of transitions between states with different features), and the system samples from a distribution with these same moments in order to imitate the human.\n\n# Combining proper scoring rules\n\nIt is easy to see that the sum of two [proper scoring rules](https://en.wikipedia.org/wiki/Scoring_rule#Proper_scoring_rules) is a proper scoring rule.  Therefore, it is possible to combine proper scoring rules to train a human-imitator to do well according to both scoring rules.  For example, we may score a distribution both on how much probability it assigns to human actions and on how well its moments match the moments of human actions, according to some weighting.\n\nNote that proper scoring rules can be characterized by [convex functions](https://www.stat.washington.edu/raftery/Research/PDF/Gneiting2007jasa.pdf).\n\n# Safety\n\nIt is unclear how safe it is to train a human-imitator using a selective similarity metric.  To the extent that the AI is _not_ doing some task the way a human would, it is possible that it is acting dangerously.  One hopes that, to the extent that the human-imitator is using a bad model to imitate the human (such as a noisy reinforcement learning model), it is not bad in a way that causes problems such as [2w].  It would be good to investigate whether something like IRL-based imitation could behave dangerously in some realistic case.',
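  A minimal sketch of the Gaussian moment-matching idea and of the combined scoring rule described in the text above. This Python snippet is an illustration rather than part of the original page: the training data, the `combined_score` name, and the 0.5 weighting are assumptions. It fits a Gaussian to a human's numeric answers, samples from it to imitate the human, and weights a log score against a squared moment-matching penalty, following the suggestion in the "Combining proper scoring rules" section.

  ```python
  import numpy as np

  rng = np.random.default_rng(0)

  # Hypothetical training data: the human's numeric answers to one question.
  human_answers = np.array([3.1, 2.7, 3.4, 2.9, 3.0])

  # Estimate the first two moments of the human's behavior.
  mu_hat = human_answers.mean()          # first moment (mean)
  sigma_hat = human_answers.std(ddof=1)  # square root of the second central moment

  def imitate(n_samples):
      """Imitate the human by sampling from the fitted Gaussian.

      Only the mean and variance of the human's behavior are reproduced;
      other aspects of the distribution are ignored.
      """
      return rng.normal(mu_hat, sigma_hat, size=n_samples)

  def combined_score(mu, sigma, data, weight=0.5):
      """Combine a log score with a squared-error penalty on the first two
      moments (lower is better).  The weight controls how strongly moment
      matching is emphasized relative to overall probability assigned to
      the human's actions; the value 0.5 here is an arbitrary assumption.
      """
      neg_log_lik = np.mean(
          0.5 * np.log(2 * np.pi * sigma**2) + (data - mu) ** 2 / (2 * sigma**2)
      )
      moment_penalty = (data.mean() - mu) ** 2 + (data.var(ddof=1) - sigma**2) ** 2
      return neg_log_lik + weight * moment_penalty

  print(imitate(3))
  print(combined_score(mu_hat, sigma_hat, human_answers))
  ```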
  metaText: '',
  isTextLoaded: 'true',
  isSubscribedToDiscussion: 'false',
  isSubscribedToUser: 'false',
  isSubscribedAsMaintainer: 'false',
  discussionSubscriberCount: '1',
  maintainerCount: '1',
  userSubscriberCount: '0',
  lastVisit: '',
  hasDraft: 'false',
  votes: [],
  voteSummary: 'null',
  muVoteSummary: '0',
  voteScaling: '0',
  currentUserVote: '-2',
  voteCount: '0',
  lockedVoteType: '',
  maxEditEver: '0',
  redLinkCount: '0',
  lockedBy: '',
  lockedUntil: '',
  nextPageId: '',
  prevPageId: '',
  usedAsMastery: 'false',
  proposalEditNum: '0',
  permissions: {
    edit: {
      has: 'false',
      reason: 'You don't have domain permission to edit this page'
    },
    proposeEdit: {
      has: 'true',
      reason: ''
    },
    delete: {
      has: 'false',
      reason: 'You don't have domain permission to delete this page'
    },
    comment: {
      has: 'false',
      reason: 'You can't comment in this domain because you are not a member'
    },
    proposeComment: {
      has: 'true',
      reason: ''
    }
  },
  summaries: {},
  creatorIds: [
    'JessicaTaylor'
  ],
  childIds: [],
  parentIds: [
    'ai_alignment'
  ],
  commentIds: [],
  questionIds: [],
  tagIds: [],
  relatedIds: [],
  markIds: [],
  explanations: [],
  learnMore: [],
  requirements: [],
  subjects: [],
  lenses: [],
  lensParentId: '',
  pathPages: [],
  learnMoreTaughtMap: {},
  learnMoreCoveredMap: {},
  learnMoreRequiredMap: {},
  editHistory: {},
  domainSubmissions: {},
  answers: [],
  answerCount: '0',
  commentCount: '0',
  newCommentCount: '0',
  linkedMarkCount: '0',
  changeLogs: [
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '9192',
      pageId: 'selective_similarity_metric',
      userId: 'JessicaTaylor',
      edit: '0',
      type: 'deleteTag',
      createdAt: '2016-04-01 05:58:26',
      auxPageId: 'taskagi_open_problems',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '9110',
      pageId: 'selective_similarity_metric',
      userId: 'JessicaTaylor',
      edit: '0',
      type: 'deleteParent',
      createdAt: '2016-03-27 05:59:19',
      auxPageId: 'taskagi_open_problems',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '9029',
      pageId: 'selective_similarity_metric',
      userId: 'JessicaTaylor',
      edit: '4',
      type: 'newEdit',
      createdAt: '2016-03-24 04:21:45',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '9028',
      pageId: 'selective_similarity_metric',
      userId: 'JessicaTaylor',
      edit: '3',
      type: 'newEdit',
      createdAt: '2016-03-24 04:18:06',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '9026',
      pageId: 'selective_similarity_metric',
      userId: 'JessicaTaylor',
      edit: '0',
      type: 'newAlias',
      createdAt: '2016-03-24 04:14:04',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '9027',
      pageId: 'selective_similarity_metric',
      userId: 'JessicaTaylor',
      edit: '2',
      type: 'newEdit',
      createdAt: '2016-03-24 04:14:04',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '9025',
      pageId: 'selective_similarity_metric',
      userId: 'JessicaTaylor',
      edit: '1',
      type: 'newParent',
      createdAt: '2016-03-24 04:12:53',
      auxPageId: 'taskagi_open_problems',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '9023',
      pageId: 'selective_similarity_metric',
      userId: 'JessicaTaylor',
      edit: '1',
      type: 'newEdit',
      createdAt: '2016-03-24 04:11:29',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    }
  ],
  feedSubmissions: [],
  searchStrings: {},
  hasChildren: 'false',
  hasParents: 'true',
  redAliases: {},
  improvementTagIds: [],
  nonMetaTagIds: [],
  todos: [],
  slowDownMap: 'null',
  speedUpMap: 'null',
  arcPageIds: 'null',
  contentRequests: {}
}