Querying the AGI user

https://arbital.com/p/user_querying

by Eliezer Yudkowsky Mar 20 2016 updated Mar 20 2016

Postulating that an advanced agent will check something with its user probably comes with some standard issues and gotchas (e.g., prioritizing what to query, not manipulating the user, etc.).


[summary: There's a laundry list of things that might go wrong when we suppose that an advanced AI is checking something Potentially Bad with the user/operator/programmer to see if the user labels the thing as Considered Bad, and relying on this step of the workflow to exclude things that are Actually Bad. E.g., the user might not be able to detect Actually Bad things reliably; the space of Potentially Bad things might be so broad that the Actually Bad things are 1,000 items down the list of Potentially Bad things; the AI might just learn to do things that won't be Considered Bad, and thereby seek out special cases of bad things that the user can't detect as bad; etcetera.]

If we're supposing that an advanced agent is checking something Potentially Bad with its user to find out if the thing is Considered Bad by that user, we need to worry about the following generic issues: