Vingean uncertainty

by Eliezer Yudkowsky Jul 1 2015 updated Jun 21 2016

You can't predict the exact actions of an agent smarter than you - so is there anything you _can_ say about them?

[summary: Vinge's Principle says that you (usually) can't predict exactly what an entity smarter than you will do, because if you knew exactly what a smart agent would do, you would be at least that smart yourself. If you can predict exactly what move Deep Blue will make on a chessboard, you can play chess as well as Deep Blue by moving to the same place you predict Deep Blue would.

This doesn't mean Deep Blue's programmers were ignorant of all aspects of their creation. They understood where Deep Blue was working to steer the board's future - that Deep Blue was trying to win (rather than lose) chess games.

"Vingean uncertainty" is the epistemic state we enter into when we consider an agent too smart for us to predict its exact actions. In particular, we will probably become more confident of the agent achieving its goals - that is, become more confident of which final outcomes will result from the agent's actions - even as we become less confident of which exact actions the agent will take.]

> Of course, I never wrote the “important” story, the sequel about the first amplified human. Once I tried something similar. John Campbell’s letter of rejection began: “Sorry—you can’t write this story. Neither can anyone else.”… “Bookworm, Run!” and its lesson were important to me. Here I had tried a straightforward extrapolation of technology, and found myself precipitated over an abyss. It’s a problem writers face every time we consider the creation of intelligences greater than our own. When this happens, human history will have reached a kind of singularity—a place where extrapolation breaks down and new models must be applied—and the world will pass beyond our understanding.
>
> -- Vernor Vinge, _True Names and other Dangers_, p. 47

Vingean unpredictability is a key part of how we think about a consequentialist intelligence which we believe is smarter than us in a domain. In particular, we usually think we can't predict exactly what a smarter-than-us agent will do, because if we could predict that, we would be that smart ourselves (Vinge's Principle).

If you could predict exactly what action Deep Blue would take on a chessboard, you could play as well as Deep Blue by making whatever move you predicted Deep Blue would make. It follows that Deep Blue's programmers necessarily sacrificed their ability to intuit Deep Blue's exact moves in advance, in the course of creating a superhuman chessplayer.
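The contrapositive can be made concrete: an exact predictor of a player is itself a player of the same strength. Here is a minimal sketch in a toy domain (a subtraction game standing in for chess; `toy_engine_move` and `mimic_move` are invented names, not real chess-engine APIs):

```python
# Toy domain: remove 1-3 stones from a pile; whoever takes the last stone wins.
def toy_engine_move(pile):
    """Stand-in for a strong engine: plays the known optimal move."""
    return pile % 4 or 1  # leave the opponent a multiple of 4 when possible

def mimic_move(pile, predict):
    """If you can predict the engine's exact move, you can simply play it yourself."""
    return predict(pile)

# Armed with an exact predictor, the mimic plays identically to the engine.
assert all(mimic_move(p, toy_engine_move) == toy_engine_move(p)
           for p in range(1, 50))
```

The point is structural: the only way `mimic_move` can exist is if `predict` already embodies the engine's full playing skill.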

But this doesn't mean Deep Blue's programmers were confused about the criterion by which Deep Blue chose actions. Deep Blue's programmers still knew in advance that Deep Blue would try to win rather than lose chess games. They knew that Deep Blue would try to steer the chess board's future into a particular region that was high in Deep Blue's preference ordering over chess positions. We can predict the consequences of Deep Blue's moves better than we can predict the moves themselves.

"Vingean uncertainty" is the peculiar epistemic state we enter when we're considering sufficiently intelligent programs; in particular, we become less confident that we can predict their exact actions, and more confident of the final outcome of those actions.

(Note that this rejects the claim that we are epistemically helpless and can know nothing about beings smarter than ourselves.)

Furthermore, our ability to think about agents smarter than ourselves is not limited to knowing a particular goal and predicting its achievement. If we found a giant alien machine that seemed very well-designed, we might be able to infer the aliens were superhumanly intelligent even if we didn't know the aliens' ultimate goals. If we saw metal pipes, we could guess that the pipes represented some stable, optimal mechanical solution which was made out of hard metal so as to retain its shape. If we saw superconducting cables, we could guess that this was a way of efficiently transporting electrical work from one place to another, even if we didn't know what final purpose the electricity was being used for. This is the idea behind Instrumental convergence: if we can recognize that an alien machine is efficiently harvesting and distributing energy, we might recognize it as an intelligently designed artifact in the service of some goal even if we don't know the goal.

Noncontainment of belief within the action probabilities

When reasoning under Vingean uncertainty, due to our lack of logical omniscience, our beliefs about the consequences of the agent's actions are not fully contained in our probability distribution over the agent's actions.

Suppose that on each turn of a chess game playing against Deep Blue, I ask you to put a probability distribution on Deep Blue's possible chess moves. If you are a rational agent you should be able to put a well-calibrated probability distribution on these moves - most trivially, by assigning every legal move an equal probability (if Deep Blue has 20 legal moves, and you assign each move 5% probability, you are guaranteed to be well-calibrated).
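The calibration of the uniform assignment can be checked numerically. In this sketch (the turns and engine choices are invented stand-ins), every legal move is predicted at 5%, and exactly 5% of those predictions come true:

```python
import random

random.seed(0)
N_TURNS, N_MOVES = 1000, 20
hits, predictions = 0, 0

for _ in range(N_TURNS):
    actual = random.randrange(N_MOVES)     # whichever move the engine picks
    for move in range(N_MOVES):
        predictions += 1                   # each legal move predicted at 5%
        hits += (move == actual)

# Of all events assigned probability 0.05, the fraction that occur is 0.05.
print(hits / predictions)  # → 0.05
```

Exactly one of the twenty 5%-predictions comes true on each turn, regardless of how the engine chooses, so the assignment is perfectly calibrated without being at all informative.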

Now imagine a randomized game player RandomBlue that, on each round, draws randomly from the probability distribution you'd assign to Deep Blue's move from the same chess position. On every turn, your belief about where you'll observe RandomBlue move is equivalent to your belief about where you'd see Deep Blue move. But your belief about the probable end of the game is very different. (This is only possible due to your lack of logical omniscience - you lack the computing resources to map out the complete sequence of expected moves, from your beliefs about each position.)
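The divergence between move-level beliefs and outcome-level beliefs can be simulated in the same toy subtraction game (a stand-in for chess; `SmartBlue` and `RandomBlue` here are illustrative toy programs, not real engines). Your per-move distribution for both players is uniform over {1, 2, 3}, yet their win rates against the same random opponent differ sharply:

```python
import random

def smart_move(pile):
    return pile % 4 or 1          # known optimal play: leave a multiple of 4

def random_move(pile):
    return random.choice([m for m in (1, 2, 3) if m <= pile])

def play(first_move_fn, pile=21):
    """First player uses first_move_fn; the opponent plays uniformly at random.
    Taking the last stone wins. Returns True iff the first player wins."""
    mover_is_first = True
    while True:
        move = first_move_fn(pile) if mover_is_first else random_move(pile)
        pile -= move
        if pile == 0:
            return mover_is_first
        mover_is_first = not mover_is_first

random.seed(0)
smart_wins = sum(play(smart_move) for _ in range(2000)) / 2000
rand_wins = sum(play(random_move) for _ in range(2000)) / 2000
print(smart_wins, rand_wins)  # SmartBlue wins every game; RandomBlue far fewer
```

From the starting pile of 21, optimal play guarantees a win no matter what the opponent does, so `smart_wins` is exactly 1.0, while RandomBlue's trajectory - sampled from your very own move-by-move beliefs - loses a large fraction of its games.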

In particular, we could draw the following contrast between your reasoning about Deep Blue and your reasoning about RandomBlue:

- If you saw Deep Blue make a move to which you had assigned a very low probability, you would conclude that the move was better than you had realized, and expect Deep Blue to win more decisively than you previously anticipated.
- If you saw RandomBlue make that same low-probability move, you would conclude that its random number generator had landed on a move you had justly rated as poor, and expect to defeat RandomBlue sooner.

This reflects our belief in something like the instrumental efficiency of Deep Blue. When we estimate the probability that Deep Blue makes a move $~$x$~$, we're estimating the probability that, after Deep Blue evaluated each move $~$y$~$'s expected probability of winning $~$EU[y]$~$, it found $~$\forall y \neq x: EU[x] > EU[y]$~$ (neglecting the possibility of exact ties, which is unlikely with deep searches and floating-point position-value estimates). If Deep Blue picks $~$z$~$ instead of $~$x$~$, we know that Deep Blue estimated $~$\forall y \neq z: EU[z] > EU[y]$~$, and in particular that Deep Blue estimated $~$EU[z] > EU[x]$~$. This could happen partly because $~$x$~$ turned out to be worth less than we expected; but for a low-probability move $~$z$~$ to beat every other move, $~$z$~$ must have had an unexpectedly high value relative to our own estimates. Thus, when Deep Blue makes a very unexpected move, we mostly expect that Deep Blue saw an unexpectedly good move - one better than what we thought was the best available move.
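The direction of this update can be illustrated with a small Monte Carlo sketch (a toy model with an invented Gaussian prior, not Deep Blue's actual evaluation): draw true move values from a prior, condition on one particular move being the argmax, and observe that its posterior expected value rises well above its prior mean:

```python
import random

random.seed(0)
N_MOVES, N_SAMPLES = 20, 100_000

conditioned_values = []
for _ in range(N_SAMPLES):
    # Toy prior: each move's true value is an independent standard normal.
    values = [random.gauss(0.0, 1.0) for _ in range(N_MOVES)]
    if max(range(N_MOVES), key=lambda m: values[m]) == 0:
        conditioned_values.append(values[0])  # condition on move 0 being chosen

posterior_mean = sum(conditioned_values) / len(conditioned_values)
print(posterior_mean)  # well above the prior mean of 0.0
```

Learning only that a move was selected over nineteen rivals is strong evidence that its value is unusually high - which is why an unexpected move from an efficient player shifts our outcome forecast in the player's favor.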

In contrast, when RandomBlue makes an unexpected move, we think the random number generator happened to land on a move that we justly assigned low worth, and hence we expect to defeat RandomBlue faster than we otherwise would have.

Features of Vingean reasoning

One notable feature of reasoning under Vingean uncertainty: our expectation of Vingean unpredictability in a domain may break down if the domain is extremely simple and sufficiently closed. In that case there may be an optimal play that we already know, making superhuman (and hence unpredictable) play impossible - in tic-tac-toe, for example, optimal play is fully known, so no player can surprise a competent observer.

Cognitive uncontainability

Vingean unpredictability is one of the core reasons to expect cognitive uncontainability in sufficiently intelligent agents.

Vingean reflection

Vingean reflection is reasoning about cognitive systems, especially cognitive systems very similar to yourself (including your actual self), under the constraint that you can't predict the exact future outputs. Deep Blue's programmers, by reasoning about the way Deep Blue was searching through game trees, could arrive at a well-justified but abstract belief that Deep Blue was 'trying to win' (rather than trying to lose) and reasoning effectively to that end.

In Vingean reflection we need to make predictions about the consequence of operating an agent in an environment, without knowing the agent's exact future actions - presumably via reasoning on some more abstract level, somehow. In Tiling agents theory, Vinge's Principle appears in the rule that we should talk about our successor's specific actions only inside of quantifiers.
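In schematic form (this notation is an illustration, not the exact formalism of the tiling agents papers): rather than proving a statement about a named action, such as $~$\text{Takes}(a_0) \wedge \text{Safe}(a_0)$~$, the predecessor proves only the quantified statement $~$\forall a: \text{Takes}(a) \rightarrow \text{Safe}(a)$~$ - safety holds of whichever action the successor actually selects, without the predecessor ever computing which action that is.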

"Vingean reflection" may be a much more general issue in the design of advanced cognitive systems than it might appear at first glance. An agent reasoning about the consequences of its current code, or considering what will happen if it spends another minute thinking, can be viewed as doing Vingean reflection. Vingean reflection can also be seen as the study of how a given agent wants thinking to occur in cognitive computations, which may be importantly different from how the agent currently thinks. (If these two coincide, we say the agent is reflectively stable.)

Tiling agents theory is presently the main line of research trying to (slowly) get started on formalizing Vingean reflection and reflective stability.