A difficult, far-reaching open problem in AI alignment is to specify an [unbounded] formula for an agent that would, if run on an [unphysically large finite computer], create as much diamond material as possible. The goal of 'diamonds' was chosen so that it is physically crisp what does and does not constitute a 'diamond'. A crisp goal plus hypercomputation avoids some problems in value alignment while still invoking many others, making this an interesting intermediate problem.
Importance
The diamond maximizer problem is to give an [unbounded] description of a computer program such that, if it were instantiated on a sufficiently powerful but [physical computer], the result of running the program would be the creation of an immense amount of diamond - around as much diamond as is physically possible for an agent to create.
The fact that this problem is still extremely hard shows that the difficulty of value alignment does not stem solely from the Complexity of value. As a thought experiment, it helps distinguish difficulties laden with value complexity from those that arise even for simple goals.
It also helps to [illustrate the difficulty of value alignment] by making visible the point that we can't even figure out how to create lots of diamond using unlimited computing power, never mind how to create value using [bounded computing power].
Problems avoided
If we can crisply define exactly what a 'diamond' is, it seems that in theory we should be able to avoid issues of Edge Instantiation, Unforeseen Maximums, and the difficulty of conveying complex values to the agent.
The amount of diamond is defined as the number of carbon atoms covalently bonded, via electrons, to exactly four other carbon atoms. A carbon atom is any nucleus containing six protons and any number of neutrons, bound by the strong force. The utility of a universe-history is the total Minkowskian interval spent by all carbon atoms being bound to exactly four other carbon atoms. More precise definitions of 'bound', or of the amount of measure in a quantum system that is being bound, are left to the reader - any crisp definition will do, so long as we are confident it has no unforeseen maximum at things we don't intuitively see as diamonds.
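One possible formalization of this utility (the notation below is ours, not part of the original problem statement): write $C(h)$ for the set of carbon atoms in a universe-history $h$, and $n_C(a, \tau)$ for the number of carbon atoms covalently bonded to atom $a$ at proper time $\tau$ along its worldline. Then

$$U(h) \;=\; \sum_{a \,\in\, C(h)} \int \mathbf{1}\big[\, n_C(a, \tau) = 4 \,\big] \, d\tau,$$

i.e., each carbon atom contributes the total proper time (Minkowskian interval along its worldline) that it spends bonded to exactly four other carbon atoms. Sharpening the indicator - what counts as 'bonded', and how to weight quantum measure - is exactly the part left to the reader.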
Problems challenged
Since this diamond maximizer would hypothetically be implemented on a very large but physical computer, it would confront [reflective stability], the [anvil problem], and the problems of making [subagents].
To the extent that the diamond maximizer might need to worry about other agents in the environment that are good at modeling it, or may need to cooperate with other diamond maximizers, it must resolve [Newcomblike problems] using some [logical decision theory]. This would also require it to confront [logical uncertainty], despite possessing immense amounts of computing power.
To the extent that the diamond maximizer must work well in a rich real universe that might operate according to any number of possible physical laws, it faces a problem of [naturalized induction] and ontology identification. See the article on ontology identification for the argument that even for the goal of 'make diamonds', the problem of [goal identification] remains difficult.
Unreflective diamond maximizer
As a further-simplified but still unsolved problem, an unreflective diamond maximizer is a diamond maximizer implemented on a [Cartesian hypercomputer] in a [causal universe] that does not face any [Newcomblike problems]. This setup further avoids the problems of reflectivity and logical uncertainty. In this case, it seems plausible that the primary remaining difficulty is just the ontology identification problem; thus the open problem of describing an unreflective diamond maximizer serves as a central illustration of the difficulty of ontology identification.
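To make the shape of this reduced problem concrete, here is a minimal toy sketch in Python (the finite hypothesis class, and names like `WorldModel`, `count_diamond`, and `best_action`, are our own illustrative assumptions, not part of the problem statement). An unreflective maximizer is just expectimax over world-models with the crisp diamond utility. Note that the sketch only goes through because each toy hypothesis arrives pre-labeled with atoms and bonds; in the real problem a hypothesis is an arbitrary world-program, and locating 'carbon atoms bonded to four carbon atoms' inside its state is exactly the ontology identification problem.

```python
from dataclasses import dataclass

@dataclass
class WorldModel:
    """A toy causal hypothesis, pre-labeled with an atom-bond ontology.

    In the real problem a hypothesis would be an arbitrary world-program;
    recovering 'atoms' and 'bonds' from its internal state is the unsolved
    ontology identification problem that this sketch deliberately dodges.
    """
    prior: float     # prior probability of this hypothesis
    atoms: tuple     # element symbol per atom index, e.g. ('C', 'C', ...)
    outcomes: dict   # action -> frozenset of bonds (index pairs) after acting

def count_diamond(atoms, bonds):
    """The crisp utility: carbon atoms bonded to exactly four other carbons."""
    def carbon_degree(i):
        return sum(1 for b in bonds
                   if i in b and all(atoms[j] == 'C' for j in b))
    return sum(1 for i, e in enumerate(atoms)
               if e == 'C' and carbon_degree(i) == 4)

def best_action(hypotheses, actions):
    """Unreflective expectimax: maximize prior-weighted expected diamond."""
    def expected_diamond(a):
        return sum(h.prior * count_diamond(h.atoms, h.outcomes[a])
                   for h in hypotheses)
    return max(actions, key=expected_diamond)

# Two rival hypotheses about what pressing a lever does to five carbon atoms.
atoms = ('C',) * 5
star = frozenset(frozenset({0, j}) for j in (1, 2, 3, 4))   # atom 0 gains 4 C neighbors
chain = frozenset(frozenset({i, i + 1}) for i in range(4))  # no atom gains 4 neighbors

h1 = WorldModel(prior=0.7, atoms=atoms, outcomes={'press': star, 'wait': chain})
h2 = WorldModel(prior=0.3, atoms=atoms, outcomes={'press': chain, 'wait': chain})

print(best_action([h1, h2], ['press', 'wait']))  # -> 'press'
```

Everything hard about the open problem lives in the gap between `WorldModel.atoms` being handed to us and having to induce it from raw observations of an unknown physics.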
Comments
Malcolm Ocean
Shouldn't this be four?
Paul Christiano
This seems like a good example to have at hand. I'm skeptical that it's much easier than what we really care about, but I guess we'll see (eventually).
Rather than a very big physical computer, it might be a bit easier to imagine a world full of (stochastic) hypercomputers that can solve their own (stochastic) evaluation problems, i.e. reflective oracles. This involves reflection but not computational limitations, so seems to capture a lot of what you care about without including the whole AI problem.