Ontology identification problem: Technical tutorial

https://arbital.com/p/ontology_identification_technical_tutorial

by Eliezer Yudkowsky May 31 2015 updated Feb 5 2016

Technical tutorial for the ontology identification problem.


The problem of ontology identification is the problem of loading a goal into an advanced agent when that agent's representation of the world is likely to change in ways unforeseen in the development phase. This tutorial focuses primarily on explaining what the problem is and why it is a foreseeable difficulty; for the corresponding research problems, see the main page on Ontology Identification.

This is a technical tutorial, meaning that it assumes some familiarity with value alignment theory, the value identification problem, and safety thinking for advanced agents.

To isolate ontology identification from other parts of the value identification problem, we consider a simplified but still very difficult problem: to state an unbounded program implementing a diamond maximizer that will turn as much of the physical universe into diamond as possible. The goal of "making diamonds" was chosen to have a crisp-seeming definition for our universe: namely, the amount of diamond is the number of carbon atoms covalently bound to four other carbon atoms. Since it seems that in this case our intended goal should be crisply definable relative to our universe's physics, we can avert many other issues of trying to identify complex values to the agent. Ontology identification is a difficulty that still remains even in this case - the agent's representation of 'carbon atoms' may still change over time.

Introduction: Two sources of representational unpredictability

Suppose we wanted to write a hand-coded, object-level utility function that evaluated the amount of diamond material present in the AI's model of the world. We might foresee the following two difficulties:

  1. Where exactly do I find 'carbon atoms' inside the AI's model of the world? As the programmer, all I see are these mysterious ones and zeroes, and the only part that directly corresponds to events I understand is the representation of the pixels in the AI's webcam… maybe I can figure out where the 'carbon' concept is by showing the AI graphite, buckytubes, and a diamond on its webcam and seeing what parts get activated… whoops, looks like the AI just revised its internal representation to be more computationally efficient, and now I once again have no idea what 'carbon' looks like in there. How can I make my hand-coded utility function re-bind itself to 'carbon' each time the AI revises its model's representation of the world?

  2. What exactly is 'diamond'? If you say it's a nucleus with six protons, what's a proton? If you define a proton as being made of quarks, what if there are unknown other particles underlying quarks? What if the Standard Model of physics is incomplete or wrong - can we state exactly and formally what constitutes a carbon atom when we aren't certain what the underlying quarks are made of?

Difficulty 2 probably seems more exotic than the first, but Difficulty 2 is easier to explain in a formal sense and turns out to be a simpler way to illustrate many of the key issues that also appear in Difficulty 1. We can see Difficulty 2 as the problem of binding an intended goal to an unknown territory, and Difficulty 1 as the problem of binding an intended goal to an unknown map. So the first step of the tutorial will be to walk through how Difficulty 2 (what exactly is a diamond?) might result in weird behavior in an unbounded agent intended to be a diamond maximizer.

Try 1: Hacking AIXI to maximize diamonds?

The classic unbounded agent - an agent using far more computing power than the size of its environment - is AIXI. Roughly speaking, AIXI considers all computable hypotheses for how its environment might work - all possible Turing machines that would turn AIXI's outputs into AIXI's future inputs. (The finite variant AIXI-tl has a hypothesis space that includes all Turing machines that can be specified using fewer than $~$l$~$ bits and run in less than time $~$t$~$.)

From the perspective of AIXI, any Turing machine that takes one input tape and produces two output tapes is a "hypothesis about the environment", where the input to the Turing machine encodes AIXI's hypothetical action, and the outputs are interpreted as a prediction about AIXI's sensory data and AIXI's reward signal. (In Marcus Hutter's formalism, the agent's reward is a separate sensory input to the agent, so hypotheses about the environment also make predictions about sensed rewards). AIXI then behaves as a [ Bayesian predictor] that uses algorithmic complexity to give higher [ prior probabilities] to simpler hypotheses (that is, Turing machines with fewer states and smaller state transition diagrams), and updates its mix of hypotheses based on sensory evidence (which can confirm or disconfirm the predictions of particular Turing machines).

As a decision agent, AIXI always outputs the motor action that leads to the highest predicted reward, assuming that the environment is described by the updated probability mixture of all Turing machines that could represent the environment (and assuming that future iterations of AIXI update and choose similarly).
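AIXI itself is uncomputable, but the shape of this decision rule can be sketched with a small, hand-picked hypothesis set standing in for the space of all Turing machines. The following Python sketch is purely illustrative: the class and function names are invented, the hypotheses are assumed deterministic, and the lookahead is truncated to one step rather than the full expectimax.

```python
# Minimal finite-hypothesis sketch of the *shape* of AIXI's decision rule.
# Real AIXI ranges over all Turing machines and is uncomputable; here a
# "hypothesis" is a hand-written environment model whose description length
# stands in for algorithmic complexity. All names are illustrative.

from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Hypothesis:
    description_length: int  # proxy for the size of the Turing machine
    predict: Callable        # (history_prefix, action) -> (percept, reward)

def prior(h: Hypothesis) -> float:
    return 2.0 ** -h.description_length  # simpler hypotheses get more weight

def posterior(hyps: List[Hypothesis], history: List[Tuple]) -> List[float]:
    # Weight each hypothesis by prior * how well it retrodicted the history
    # of (action, percept, reward) triples; deterministic misses are fatal.
    weights = []
    for h in hyps:
        w = prior(h)
        for t, (action, percept, reward) in enumerate(history):
            if h.predict(history[:t], action) != (percept, reward):
                w = 0.0
                break
        weights.append(w)
    total = sum(weights) or 1.0
    return [w / total for w in weights]

def choose_action(hyps: List[Hypothesis], history: List[Tuple], actions):
    # One-step version: pick the action with greatest expected predicted reward
    # under the updated mixture (full AIXI runs a deep expectimax over futures).
    post = posterior(hyps, history)
    def expected_reward(a):
        return sum(p * h.predict(history, a)[1] for p, h in zip(post, hyps))
    return max(actions, key=expected_reward)
```

Note that the only thing the utility-like part of this loop ever touches is the reward component of each hypothesis's predictions - which is exactly where the trouble starts below.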

The ontology identification problem shows up sharply when we imagine trying to modify AIXI to "maximize expectations of diamonds in the outside environment" rather than "maximize expectations of sensory reward signals". As a [ Cartesian agent], AIXI has sharply defined sensory inputs and motor outputs, so we can have a [ probability mixture] over all Turing machines that relate motor outputs to sense inputs (as crisply represented in the input and output tapes). But even if some otherwise arbitrary Turing machine happens to predict sensory experiences extremely well, how do we look at the state and working tape of that Turing machine to evaluate 'the amount of diamond' or 'the estimated number of carbon atoms bound to four other carbon atoms'? The highest-weighted Turing machines that have best predicted the sensory data so far presumably contain some sort of representation of the environment, but we have no idea how to get 'the number of diamonds' out of it.

(Example: Maybe one Turing machine that is producing good sequence predictions inside AIXI actually does so by simulating a large universe, identifying a superintelligent civilization that evolves inside that universe, and motivating that civilization to try to intelligently predict future bits from past bits (as provided by some intervention). To write a formal utility function that could extract the 'amount of real diamond in the environment' from arbitrary predictors in the above case, we'd need the function to read the Turing machine, decode that universe, find the superintelligence, decode the superintelligence's thought processes, find the concept (if any) resembling 'diamond', and hope that the superintelligence had precalculated how much diamond was around in the outer universe being manipulated by AIXI.)

This is, in general, the reason why the AIXI family of architectures can only contain agents defined to maximize direct functions of their sensory input, and not agents that behave so as to optimize facts about their external environment. (We can't make AIXI maximize diamonds by making it want pictures of diamonds because then it will just, e.g., [ build an environmental subagent that seizes control of AIXI's webcam and shows it pictures of diamonds]. If you ask AIXI to show itself sensory pictures of diamonds, you can get it to show its webcam lots of pictures of diamonds, but this is not the same thing as building an environmental diamond maximizer.)

Try 2: Unbounded agent using classical atomic hypotheses?

Given the origins of the above difficulty, we next imagine constraining the agent's hypothesis space to something other than "literally all computable functions from motor outputs to sense inputs", so that we can figure out how to find diamonds or carbon inside the agent's representation of the world.

As an [ unrealistic example]: Suppose someone is trying to define 'diamonds' in the AI's utility function. Suppose they know about atomic physics but not nuclear physics. Suppose they build an AI which, during its development phase, learns about atomic physics from the programmers, and thus builds a world-model that is based on atomic physics.

Again for purposes of [ unrealistic examples], suppose that the AI's world-model is encoded in such fashion that when the AI imagines a molecular structure - represents a mental image of some molecules - then carbon atoms are represented as a particular kind of basic element of the representation. Again, as an [ unrealistic example], imagine that there are [ little LISP tokens] representing environmental objects, and that the environmental-object-type of carbon-objects is encoded by the integer 6. Imagine also that each atom, inside this representation, is followed by a list of the other atoms to which it's covalently bound. Then when the AI is imagining a carbon atom participating in a diamond, inside the representation we would see an object of type 6, followed by a list containing exactly four other 6-objects.

Can we fix this representation for all hypotheses, and then write a utility function for the AI that counts the number of type-6 objects that are bound to exactly four other type-6 objects? And if we did so, would the result actually be a diamond maximizer?
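In this toy representation the hand-coded utility function itself is easy to write down. Here is a minimal sketch, assuming (purely for illustration) that the agent's world-model exposes atoms as objects carrying an integer element type and a list of covalently bonded neighbors:

```python
# Sketch of the hand-coded diamond-counting utility function, written against
# the illustrative LISP-token-style representation described above. The Atom
# class and the diamondness() name are invented for this example.

from dataclasses import dataclass, field
from typing import List

CARBON = 6  # the environmental-object-type integer encoding carbon

@dataclass
class Atom:
    element: int
    bonds: List["Atom"] = field(default_factory=list)

def diamondness(world_model: List[Atom]) -> int:
    """Count type-6 objects covalently bound to exactly four other type-6 objects."""
    return sum(
        1
        for atom in world_model
        if atom.element == CARBON
        and len(atom.bonds) == 4
        and all(neighbor.element == CARBON for neighbor in atom.bonds)
    )
```

The hard part is not writing this counter; it is guaranteeing that every hypothesis the agent ever considers exposes its world-model in this fixed format, which is what the construction below attempts by brute force.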

AIXI-atomic

As a first approach to implementing this idea - an agent whose hypothesis space is constrained to models that directly represent all the carbon atoms - imagine a variant of AIXI-tl that, rather than considering all tl-bounded Turing machines, considers all simulated atomic universes containing up to 10^100 particles spread out over up to 10^50 light-years. In other words, the agent's hypotheses are universe-sized simulations of classical, pre-nuclear models of physics; and these simulations are constrained to a common representation, so a fixed utility function can look at the representation and count carbon atoms bound to four other carbon atoms. Call this agent AIXI-atomic.

(Note that AIXI-atomic, as an unbounded agent, may use far more computing power than is embodied in its environment. For purposes of the thought experiment, assume that the universe contains exactly one hypercomputer that runs AIXI-atomic.)

A first difficulty is that universes composed only of classical atoms are not good explanations of our own universe, even in terms of surface phenomena; e.g. the ultraviolet catastrophe. So let it be supposed that we have simulation rules for classical physics that replicate at least whatever phenomena the programmers have observed at [ development time], even if the rules have some seemingly ad-hoc elements (like there being no ultraviolet catastrophes). We will not, however, suppose that the programmers have discovered all experimental phenomena we now see as pointing to nuclear or quantum physics.

A second difficulty is that a simulated universe of classical atoms does not identify where in the universe the AIXI-atomic agent resides, or say how to match the types of AIXI-atomic's sense inputs with the underlying behaviors of atoms. We can elide this difficulty by imagining that AIXI-atomic simulates classical universes containing a single hypercomputer, and that AIXI-atomic knows a simple function from each simulated universe onto its own sensory data (e.g., it knows to look at the simulated universe, and translate simulated photons impinging on its webcam onto predicted webcam data in the standard format). This elides most of the problem of [ naturalized induction].

So the AIXI-atomic agent that is hoped to maximize diamond considers only hypotheses that directly represent the universe as a huge system of classical atoms, assigns simplicity-weighted priors over these atomic hypotheses, Bayes-updates them on its actual sensory experiences, and outputs the motor action with the greatest expected diamond, as evaluated by the fixed utility function run over each hypothesis's atomic representation.
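A minimal sketch of this decision rule, continuing the illustrative snippets above (the diamondness() counter and the simplicity-weighted prior), and assuming each hypothesis object exposes invented likelihood_of() and simulate() methods:

```python
# Sketch of AIXI-atomic's action choice. Every hypothesis is assumed to be an
# explicit classical-atom simulation that can (a) report how well it predicts
# the observed percepts and (b) be run forward under a candidate action to
# yield a list of Atom objects for the fixed diamondness() utility to score.
# The likelihood_of() and simulate() methods are invented for illustration.

def atomic_posterior(hyps, observed_percepts):
    weights = [2.0 ** -h.description_length * h.likelihood_of(observed_percepts)
               for h in hyps]
    total = sum(weights) or 1.0
    return [w / total for w in weights]

def choose_action(hyps, observed_percepts, actions):
    post = atomic_posterior(hyps, observed_percepts)
    def expected_diamond(action):
        # Run each simulated atomic universe forward under `action` and count
        # carbon atoms bound to four other carbon atoms in the resulting state.
        return sum(p * diamondness(h.simulate(action)) for p, h in zip(post, hyps))
    return max(actions, key=expected_diamond)
```

The crucial difference from the sensory-reward sketch of AIXI is that the quantity being maximized is computed by looking inside each hypothesis's representation of the world, not at a predicted reward signal - which is why the representation had to be fixed in advance.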

Suppose our own real universe was amended to otherwise be exactly the same, but contain a single [ impermeable] hypercomputer. Suppose we defined an agent like the one above, using simulations of 1910-era models of physics, and ran that agent on the hypercomputer. Should we expect the result to be an actual diamond maximizer - expect that the outcome of running this program on a single hypercomputer would indeed be that most mass in our universe would be turned into carbon and arranged into diamonds?

Anticipated failure: AIXI-atomic tries to 'maximize outside the simulation'

In fact, our own universe isn't atomic; it's nuclear and quantum-mechanical. This means that AIXI-atomic does not contain any hypotheses in its hypothesis space that directly represent our universe. By the previously specified hypothesis of the thought experiment, AIXI-atomic's model of simulated physics was built to encompass all the experimental phenomena the programmers had discovered so far, but there were some quantum and nuclear phenomena that AIXI-atomic's programmers had not yet discovered. When those phenomena are discovered, there will be no simple explanation in the direct terms of the model.

Intuitively, of course, we'd like AIXI-atomic to discover the composition of nuclei, shift its models to use nuclear physics, and refine the 'carbon atoms' mentioned in its utility function to mean 'atoms with nuclei containing six protons'.

But we didn't actually specify that when constructing the agent (and saying how to do it in general is, so far as we know, hard; in fact it's the whole ontology identification problem). We constrained the hypothesis space to contain only universes running on the classical physics that the programmers knew about. So what happens instead?

Probably the 'simplest atomic hypothesis that fits the facts' will be an enormous atom-based computer, simulating nuclear physics and quantum physics in order to create a simulated non-classical universe whose outputs are ultimately hooked up to AIXI's webcam. From our perspective this hypothesis seems silly, but if you restrict the hypothesis space to only classical atomic universes, that's what ends up being the computationally simplest hypothesis that predicts, in detail, the results of nuclear and quantum experiments.

AIXI-atomic will then try to choose actions so as to maximize the amount of expected diamond inside the probable outside universes that could contain the giant atom-based simulator of quantum physics. It is not obvious what sort of behavior this would imply.

Metaphor for difficulty: AIXI-atomic cares about only fundamental carbon

One metaphorical way of looking at the problem is that AIXI-atomic was implicitly defined to care only about diamonds made out of ontologically fundamental carbon atoms, not diamonds made out of quarks. A probability function that assigns 0 probability to all universes made of quarks, and a utility function that outputs a constant on all universes made of quarks, [ yield functionally identical behavior]. So it is an exact metaphor to say that AIXI-atomic only cares about universes with ontologically basic carbon atoms, given that AIXI-atomic's hypothesis space only contains universes with ontologically basic carbon atoms.
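To see why the two descriptions are behaviorally equivalent, consider a schematic rendering of the expected-utility computation (not Hutter's exact formalism): the agent chooses $~$\arg\max_a \sum_h P(h) \, U(\mathrm{outcome}(h,a))$~$. If $~$U$~$ returns the same constant $~$c$~$ on every quark-based hypothesis, that subset contributes the action-independent term $~$c \sum_{h \in \mathrm{quark}} P(h)$~$ to every action's score, which cannot change the argmax; if instead $~$P(h) = 0$~$ on that subset, it contributes nothing at all. Either way, only the hypotheses containing ontologically basic carbon influence which action gets chosen.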

Imagine that AIXI-atomic's hypothesis space does contain many other universes with other laws of physics, but its hand-coded utility function just returns 0 on those universes since it can't find any 'carbon atoms' inside the model. Since AIXI-atomic only cares about diamond made of fundamental carbon, when AIXI-atomic discovers the experimental data implying that almost all of its probability mass should reside in nuclear or quantum universes in which there were no fundamental carbon atoms, AIXI-atomic stops caring about the effect its actions have on the vast majority of probability mass inside its model. Instead AIXI-atomic tries to maximize inside the tiny remaining probabilities in which it is inside a universe with fundamental carbon atoms that is somehow reproducing its sensory experience of nuclei and quantum fields… for example, a classical atomic universe containing a computer simulating a quantum universe and showing the results to AIXI-atomic.

From our perspective, we failed to solve the 'ontology identification problem' and get the real-world result we intended, because we tried to define the agent's utility function over properties of a universe made out of atoms, and the real universe turned out to be made of quantum fields. This caused the utility function to fail to bind to the agent's representation in the way we intuitively had in mind.

Today we do know about quantum mechanics, so if we tried to build a diamond maximizer using some bounded version of the above formula, it might not fail on account of the particular exact problem of atomic physics being false.

But perhaps there are discoveries still remaining that would change our picture of the universe's ontology to imply something else underlying quarks or quantum fields. Human beings have only known about quantum fields for less than a century; our model of the ontological basics of our universe has been stable for less than a hundred years of our human experience. So we should seek an AI design that does not assume we know the exact, true, fundamental ontology of our universe during an AI's development phase.

As another important metaphorical case in point, consider a human being who feels angst on contemplating a universe in which "By convention sweetness, by convention bitterness, by convention color, in reality only atoms and the void" (Democritus); someone who wonders where there is any room in this collection of lifeless particles for love, free will, or even the existence of people - since, after all, people are mere collections of atoms. This person can be seen as undergoing an ontology identification problem: they don't know how to find the objects of value in a representation containing atoms instead of ontologically basic people.

Human beings simultaneously evolved a particular set of standard mental representations (e.g., a representation for colors in terms of a 3-dimensional subjective color space) along with evolving emotions that bind to these representations (e.g., identification of flowering landscapes as beautiful). When someone visualizes any particular configuration of 'mere atoms', their built-in desires don't automatically fire and bind to that mental representation, the way they would bind to the brain's native representation of the environment. Generalizing that no set of atoms can be meaningful (since no abstract configuration of 'mere atoms' they imagine seems to trigger any emotions to bind to it) and being told that reality is composed entirely of such atoms, they feel they've been told that the true state of reality, underlying appearances, is a meaningless one.

The utility rebinding problem

Intuitively, we would think it was [ common sense] for an agent that wanted diamonds to react to the experimental data identifying nuclear physics, by deciding that a carbon atom is 'really' a nucleus containing six protons. We can imagine this agent [ common-sensically] updating its model of the universe to a nuclear model, and redefining the 'carbon atoms' that its old utility function counted to mean 'nuclei containing exactly six protons'. Then the new utility function could evaluate outcomes in the newly discovered nuclear-physics universe. The problem of producing this desirable agent behavior is the utility rebinding problem.

To see why this problem is nontrivial, consider that the most common form of carbon is C-12, with nuclei composed of six protons and six neutrons. Another naturally occurring form of carbon is C-14, with nuclei composed of six protons and eight neutrons. Is C-14 truly carbon - is it the sort of carbon that can participate in valuable diamonds of high utility? Well, that depends on your utility function, obviously; and from a human perspective it just sounds arbitrary.
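As a toy illustration of the ambiguity (with all names invented), here are two rebindings of the 'carbon' predicate onto a nuclear ontology that agree perfectly on ordinary C-12 but disagree about C-14:

```python
# Two candidate rebindings of 'carbon' after the shift to a nuclear ontology.
# Both reproduce the old atomic-physics behavior on ordinary C-12, but they
# disagree on C-14; nothing in the original utility function says which one
# the programmers "meant". Nucleus is an invented stand-in for the new model.

from dataclasses import dataclass

@dataclass(frozen=True)
class Nucleus:
    protons: int
    neutrons: int

def is_carbon_by_protons(n: Nucleus) -> bool:
    return n.protons == 6                      # counts C-12, C-13, and C-14 alike

def is_carbon_strict(n: Nucleus) -> bool:
    return n.protons == 6 and n.neutrons == 6  # counts only C-12

c14 = Nucleus(protons=6, neutrons=8)
assert is_carbon_by_protons(c14) and not is_carbon_strict(c14)
```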

But consider a closely analogous question from a humanly important perspective: Is a chimpanzee truly a person? Here the question means not "How do we arbitrarily define the syllables per-son?" but "Should we care a lot about chimpanzees?" - i.e., how does the part of our preferences that cares about people apply to the possibly-person edge case of chimpanzees?

If you live in a world where chimpanzees haven't been discovered, you may have an easy time running your utility function over your model of the environment, since the objects of your experience classify sharply into the 'person' and 'nonperson' categories. Then you discover chimpanzees, and they're neither typical people (like John Smith) nor typical nonpeople (like rocks).

We can see the force of this question as arising from something like an ontological shift: we're used to valuing cognitive systems that are made from whole human minds, but it turns out that minds are made of parts, and then we have the question of how to value things that are made from some of the person-parts but not all of them… sort of like the question of how to treat carbon atoms that have the usual number of protons but not the usual number of neutrons.

Chimpanzees definitely have neural areas of various sizes, and particular cognitive abilities - we can suppose the empirical truth is unambiguous at this level, and known to us. So the question is then whether we regard a particular configuration of neural parts (a frontal cortex of a certain size) and particular cognitive abilities (consequentialist means-end reasoning and empathy, but no recursive language) as something that our 'person' category values… once we've rewritten the person category to value configurations of cognitive parts, rather than whole atomic people.

In fact, we run into this question as soon as we learn that human beings run on brains and the brains are made out of neural regions with functional properties; we can then imagine chimpanzees even if we haven't met any, and ask to what degree our preferences should treat this edge-person as deserving of moral rights. If we can 'rebind' our emotions and preferences to live in a world of nuclear brains rather than atomic people, this rebinding will implicitly say whether or not a chimpanzee is a person, depending on how our preference over brain configurations treats the configuration that is a chimpanzee.

In this sense the problem we face with chimpanzees is exactly analogous to the question a diamond maximizer would face after discovering nuclear physics and asking itself whether a carbon-14 atom counted as 'carbon' for purposes of caring about diamonds. Once a diamond maximizer knows about neutrons, it can see that C-14 is chemically like carbon and forms the same kind of chemical bonds, but that it's heavier because it has two extra neutrons. We can see that chimpanzees have a brain architecture similar to that of the sort of people we always considered before, but that they have smaller frontal cortexes and no ability to use recursive language, etcetera.

Without knowing more about the diamond maximizer, we can't guess what sort of considerations it might bring to bear in deciding what is Truly Carbon and Really A Diamond. But the breadth of considerations human beings need to invoke in deciding how much to care about chimpanzees is one way of illustrating that the problem of rebinding a utility function to a shifted ontology is [value-laden] and can potentially undergo [excursions] into complex desiderata. Redefining a [ moral category] so that it talks about the underlying parts of what were previously seen as all-or-nothing atomic objects may carry an implicit ruling about how to value many kinds of [edge-case] objects that were never seen before.

It's possible that some formal part of this problem could be usefully carved out from the complex, value-laden edge-case-reclassification part. E.g., how would you redefine carbon as C-12 if there were no other isotopes? How would you rebind the utility function to at least C-12? In general, how could edge cases be [ identified and queried] by an online Genie?
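One heavily simplified sketch of what 'identifying and querying edge cases' could mean, reusing the Nucleus type and the two candidate predicates from the isotope example above: treat an object as an edge case exactly when the candidate rebindings disagree about it, and defer those objects to the programmers rather than guessing.

```python
# Hypothetical edge-case detector: classify automatically when every candidate
# rebinding of the concept agrees, and raise a query to the programmers when
# the rebindings disagree. This carves out only the formal detection step; it
# says nothing about the value-laden question of how the query gets answered.

def classify_or_query(nucleus, candidate_predicates, ask_programmer):
    verdicts = {pred(nucleus) for pred in candidate_predicates}
    if len(verdicts) == 1:
        return verdicts.pop()       # all rebindings agree: no ambiguity
    return ask_programmer(nucleus)  # rebindings disagree: defer to a query

candidates = [is_carbon_by_protons, is_carbon_strict]
classify_or_query(Nucleus(6, 6), candidates, ask_programmer=print)  # returns True, no query
classify_or_query(Nucleus(6, 8), candidates, ask_programmer=print)  # C-14 triggers a query
```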

Reappearance on the reflective level

An obvious thought (especially for online Genies) is that if the AI is unsure about how to reinterpret its goals in light of a shifting mental representation, it should query the programmers.

Since the definition of a programmer would then itself be baked into the preference framework, the problem might [ reproduce itself on the reflective level] if the AI became unsure of where to find 'programmers': "My preference framework said that programmers were made of carbon atoms, but all I can find in this universe are quantum fields!"

Thus the ontology identification problem is arguably one of the [ critical subproblems] of value alignment: it plausibly has the property that, if botched, it could potentially [ crash the error recovery mechanism].

Diamond identification in multi-level maps

A realistic, bounded diamond maximizer wouldn't represent the outside universe with atomically detailed or quantum-detailed models. Instead, a bounded agent would have some version of a [ multi-level map] of the world in which the agent knew in principle that things were composed of atoms, but didn't model most things in atomic detail. A bounded agent's model of an airplane would have wings, or wing shapes, rather than atomically detailed wings. It would think about wings when doing aerodynamic engineering, atoms when doing chemistry, nuclear physics when doing nuclear engineering, and definitely not try to model everything in its experience down to the level of quantum fields.

At present, there are not yet any proposed formalisms for how to do probability theory with multi-level maps (in other words: [ nobody has yet put forward a guess at how to solve the problem even given infinite computing power]). But it seems very likely that, if we did know what multi-level maps looked like formally, this might suggest a formal solution to non-value-laden utility-rebinding.

E.g., if an agent already has a separate high-level concept of 'diamond' that's bound to a lower-level concept of 'carbon atoms bound to four other carbon atoms', then maybe when you discover nuclear physics, the multi-level map itself would tend to suggest that 'carbon atoms' be re-bound to 'nuclei with six protons' or 'nuclei with six protons and six neutrons'. It might at least be possible to phrase the equivalent of a prior or mixture of weightings for how the utility function would re-bind itself, and say, "Given this prior, care about whatever that sparkly hard stuff 'diamond' ends up binding to on the lower level."
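A very rough sketch of what such a prior over rebindings could look like, again reusing the invented carbon predicates from above - the weights here are made up, and nothing in this sketch says where they would actually come from:

```python
# Hypothetical mixture over candidate rebindings of 'carbon': rather than
# committing to one reinterpretation, score outcomes under a weighted average
# of the candidate utility functions induced by each rebinding.

REBINDING_PRIOR = [
    (0.6, is_carbon_by_protons),  # 'carbon' = any nucleus with six protons
    (0.4, is_carbon_strict),      # 'carbon' = six protons and six neutrons only
]

def mixture_diamondness(nuclei_with_bonds) -> float:
    # nuclei_with_bonds: list of (nucleus, bonded_nuclei) pairs in the new model.
    total = 0.0
    for weight, is_carbon in REBINDING_PRIOR:
        count = sum(
            1
            for nucleus, bonded in nuclei_with_bonds
            if is_carbon(nucleus)
            and len(bonded) == 4
            and all(is_carbon(b) for b in bonded)
        )
        total += weight * count
    return total
```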

Unfortunately, we have very little formal probability theory to describe how a multi-level map would go from 'that unknown sparkly hard stuff' to 'carbon atoms bound to four other carbon atoms in tetrahedral patterns, which is the only known repeating pattern for carbon atoms bound to four other carbon atoms' to 'C12 and C14 are chemically identical but C14 is heavier'. This being the case, we don't know how to say anything about a dynamically updating multi-level map inside a preference framework.

If we were actually trying to build a diamond maximizer, we would be likely to encounter this problem long before it started formulating new physics. The equivalent of a computational discovery that changes 'the most efficient way to represent diamonds' is likely to happen much earlier than a physical discovery that changes 'what underlying physical systems probably constitute a diamond'.

This also means that we are liable to face the ontology identification problem long before the agent starts discovering new physics, as soon as it starts revising its representation. Only very unreflective agents with strongly fixed-in-place representations for every part of the environment that we think the agent is supposed to care about would let the ontology identification problem be elided entirely. Only very not-self-modifying agents, or [ Cartesian agents] with goals formulated only over sense data, would not confront their programmers with ontology identification problems.

Research paths

More of these are described in the main article on ontology identification. A quick list of relevant research subproblems and avenues that have already come up in this tutorial:

  1. Formalizing multi-level maps, so that a utility function bound to a high-level concept like 'diamond' has somewhere sensible to re-bind when the lower levels of the representation change.

  2. Phrasing a prior or mixture over candidate rebindings of a concept like 'carbon', rather than baking a single fixed representation into the preference framework.

  3. Identifying the value-laden edge cases produced by an ontological shift (C-14, chimpanzees), so that an online Genie can query its programmers about them instead of guessing.

Some implications

The ontology identification problem is one more reason to believe that [ hard-coded object-level utility functions should be avoided] and that [ value identification in general is hard].

Ontology identification is heavily entangled with AGI problems, meaning that some research on ontology identification [ may need to be non-public]. This is an instance of the argument that [ at least some VAT research may need to be non-public], based on the premise that [ at least some AGI research is better off non-public].