"I'm skeptical of Orthogonal..."

https://arbital.com/p/1m1

by Anton Geraschenko Jan 15 2016


I'm skeptical of Orthogonality. My basic concern is that it can be interpreted as true-but-useless when you're defending it, and useful-but-implausible when you're trying to get it to do some work for you, and that the user of the idea may not notice the switcheroo. Consider the following statements: there are arbitrarily powerful cognitive agents

  1. which have circular preferences,
  2. with the goal of paperclip maximization,
  3. with the goal of phlogiston maximization,
  4. which are not reflective,
  5. with values aligned with humanity.

Rehearsing the arguments for Orthogonality and then evaluating these statements, I find my mind gets very slippery.

Orthogonality proponents I've spoken to say 1 is false, because "goal space" excludes circular preferences. But there are very likely other restrictions on goal space imposed once an agent groks things like symmetry. If "goal space" means whatever goals are not excluded by our current understanding of intelligence, I think Orthogonality is unlikely (and poorly formulated). If it means "whatever goals powerful cognitive agents can have", Orthogonality is tautological and distracts us from pursuing the interesting question of what that space of goals actually is. Let's narrow down goal space.
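
(As an aside, here is the standard money-pump sketch that motivates excluding circular preferences from goal space. The code is purely illustrative and not from the original discussion; the item names and fee are made up.)

```python
# Money-pump illustration (illustrative only): an agent that strictly
# prefers A over B, B over C, and C over A will pay a small fee for every
# "upgrade" and can be cycled indefinitely, ending up holding the item it
# started with while losing resources each round.

preferences = {("A", "B"), ("B", "C"), ("C", "A")}  # (x, y) means x is strictly preferred to y

def prefers(x, y):
    return (x, y) in preferences

def money_pump(start_item, fee, rounds):
    item, wealth = start_item, 0.0
    offers = ["C", "B", "A"]  # the item preferred to A is C, to C is B, to B is A
    for i in range(rounds):
        offer = offers[i % 3]
        if prefers(offer, item):  # the agent accepts any strictly preferred trade
            item, wealth = offer, wealth - fee
    return item, wealth

print(money_pump("A", fee=1.0, rounds=9))  # ('A', -9.0): same item, nine fees poorer
```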

If 2 and 3 get different answers, why? Might a paperclip maximizer take liberties with what is considered a paperclip once it learns that papers can be electrostatically attracted?

If 4 is easily true, I wonder if we're defining "mind space" too broadly to be useful. I'd really like humanity to focus on the sector of mind space we should be aiming for in order to get a good outcome. The forms of Orthogonality which are clearly (to me) true distract from the interesting question of what that sector actually is. Let's narrow down mind space.

For 5, I don't find Orthogonality to be a convincing argument. A more convincing argument is to shoot for "humanity can grow up to have arbitrarily high cognitive power" instead.


Comments

Eliezer Yudkowsky

As regards 4, I'd say that while there may theoretically be arbitrarily powerful agents in math-space that are non-reflective, it's not clear that this is a pragmatic truth about most of the AIs that would exist in the long run - although we might be able to get very powerful non-reflective genies. So we're interested in some short-run solutions that involve nonreflectivity, but not long-run solutions.

I don't think 2 and 3 do have different answers. See the argument about what happens if you use an AI that only considers classical atomic hypotheses, in ontology_identification.html?lens=4657963068455733951

1 seems a bit odd. You could argue that the Argument from Mind Design Space Width supports it, but this just demonstrates that this initial argument may be too crude to do more than act as an intuition pump. By the time we're talking about the Argument from Reflective Stability, I don't think that argument supports "you can have circular preferences" any more. It's also not clear to me why 1 matters - all the arguments I know of that depend on Orthogonality still go through if we restrict ourselves to agents with noncircular preferences. A friendly one should still exist; a paperclip maximizer should still exist.

Anton Geraschenko

1 seems a bit odd. You could argue that the Argument from Mind Design Space Width supports it, but this just demonstrates that this initial argument may be too crude to do more than act as an intuition pump. By the time we're talking about the Argument from Reflective Stability, I don't think that argument supports "you can have circular preferences" any more.

That's exactly the point (except I'm not sure what you mean by "the Argument from Reflective Stability"; the capital letters suggest you're talking about something very specific). The arguments in favor of Orthogonality just seem like crude intuition pumps. The purpose of 1 was not to actually talk about circular preferences, but to pick an example of something that is supported by the largeness of mind design space yet which we expect to break for some other reason. Orthogonality feels like claiming the existence of an integer with two distinct prime factorizations because "there are so many integers". Like the integers, mind design space is vast, but not arbitrary. It seems unlikely to me that there can be no theorems showing that sufficiently high cognitive power implies some restriction on goals.

Eliezer Yudkowsky

There are six successively stronger arguments listed under "Arguments" in the current version of the page. Mind design space largeness and Humean freedom of preference are #1 and #2. By the time we get to the Gandhi stability argument #3, and the higher tiers of argument above it (especially including the tiling agents that seem to directly show stability of arbitrary goals), we're outside the domain of arguments that could specialize equally well to supporting circular preferences. The reason for listing #1 and #2 as arguments anyway is not that they finish the argument, but that (a) before the later tiers of argument were developed, #1 and #2 were strong intuition pumps in the correct direction and (b) even if they might arguably prove too much if applied sloppily, they counteract other sloppy intuitions along the lines of "What does this strange new species 'AI' want?" or "But won't it be persuaded by…" Like, it's important to understand that even if it doesn't finish the argument, it is indeed the case that "All AIs have property P" has a lot of chances to be wrong and "At least one AI has property P" has a lot of chances to be right. It doesn't finish the story - if we took it as finishing the story, we'd be proving much too much, like circular preferences - but it pushes the story a long way in a particular direction compared to coming in with a prior frame of mind about "What will AIs want? Hm, paperclips doesn't sound right, I bet they want mostly to be left alone."
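
(To put toy numbers on that last point, purely for illustration: if each of N independent designs has property P with probability p, the universal claim collapses while the existential claim approaches certainty. The numbers and the independence assumption here are mine, not part of the argument above.)

```python
# Toy arithmetic (illustrative numbers, assuming independent designs):
# "all N designs have property P" is fragile, "at least one has P" is robust.
p, N = 0.5, 100
print("P(all N have P):      ", p ** N)            # ~7.9e-31
print("P(at least one has P):", 1 - (1 - p) ** N)  # ~1.0
```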

Anton Geraschenko

Thanks for the reply. I agree that strong Inevitability is unreasonable, and I understand the function of #1 and #2 in disrupting a prior frame of mind which assumes strong Inevitability, but that's not the only alternative to Orthogonality. I'm surprised that the arguments are considered successively stronger in favor of Orthogonality, since #6 basically says "under reasonable hypotheses, Orthogonality may well be false." (I admit that's a skewed reading, but I don't know what the referenced ongoing work looks like, so I'm skipping that bit for now. [Edit: is this "tiling agents"? I'm not familiar with that work, but I can go learn about it.])

The other arguments are interesting commentary, but don't argue that Orthogonality is true for agents we ought to care about.