Vinge's Principle

https://arbital.com/p/Vinge_principle

by Eliezer Yudkowsky Dec 18 2015 updated Jun 25 2016

An agent building another agent must usually approve the successor's design without knowing the successor's exact policy choices.


[summary: Vinge's Principle says that, in domains complicated enough that perfect play is not possible, less intelligent agents will not be able to predict the exact moves made by more intelligent agents.

For example, if you knew exactly where Deep Blue would play on a chessboard, you'd be able to play chess at least as well as Deep Blue by making whatever moves you predicted Deep Blue would make. So if you want to write an algorithm that plays superhuman chess, you necessarily sacrifice your own ability to (without machine aid) predict its exact chess moves.

This doesn't mean we sacrifice our ability to understand anything about Deep Blue. We can still understand that its goal is to win chess games rather than lose them, and predict that whatever actions it takes, they'll eventually lead to a winning board state.]

Vinge's Principle says that, in domains complicated enough that perfect play is not possible, less intelligent agents will not be able to predict the exact moves made by more intelligent agents.

For example, if you knew exactly where Deep Blue would play on a chessboard, you'd be able to play chess at least as well as Deep Blue by making whatever moves you predicted Deep Blue would make. So if you want to write an algorithm that plays superhuman chess, you necessarily sacrifice your own ability to (without machine aid) predict the algorithm's exact chess moves.
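To make the argument concrete, here is a minimal sketch in Python (the names are hypothetical; `predict_deep_blue_move` is an assumed oracle, not anything we know how to implement cheaply). If we could predict Deep Blue's exact move from any position, the trivial player below would, by construction, play exactly the game Deep Blue would play, and therefore play at least as well:

```python
def predict_deep_blue_move(board):
    """Hypothetical oracle: return the move Deep Blue would choose from `board`.

    The point of Vinge's Principle is that a weaker player has no cheap way
    to implement this function without, in effect, re-running Deep Blue.
    """
    raise NotImplementedError("No unaided human computation fits here.")


class MimicPlayer:
    """A player that simply copies the predicted moves of the stronger engine."""

    def choose_move(self, board):
        # Whatever Deep Blue would do in this position, do exactly that.
        # If the oracle were real, this player would be at least as strong
        # as Deep Blue; which is why no such oracle is available to us.
        return predict_deep_blue_move(board)
```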

This is true even though, as we become more confident of a chess algorithm's power, we become more confident that it will eventually win the chess game. We become more sure of the game's final outcome, even as we become less sure of the chess algorithm's next move. This is Vingean uncertainty.
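One way to phrase this tradeoff more formally (this notation is ours, not the article's): let $M$ be the algorithm's next move and $W$ the event that it eventually wins, with probabilities taken from the observer's point of view. Then, as the observer's estimate of the algorithm's strength goes up,

$$\Pr(W) \to 1 \qquad\text{while}\qquad H(M) = -\sum_{m} \Pr(M = m)\,\log \Pr(M = m)$$

does not shrink, and typically grows. The observer's confidence concentrates on the final outcome, not on the particular path taken to reach it.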

Now consider agents that build other agents (or build their own successors, or modify their own code). Vinge's Principle implies that the choice to approve the successor agent's design must be made without knowing the successor's exact sensory information, exact internal state, or exact motor outputs. In the theory of tiling agents, this appears as the principle that the successor's sensory information, cognitive state, and action outputs should only appear inside quantifiers. This is Vingean reflection.
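Schematically (this is our rendering, not the exact formalism of the tiling agents work): the predecessor $A$ licenses building the successor $B$ when $A$ can establish something of the form

$$\forall s \,\forall m \,\forall a \;\big[\, \mathrm{Acts}_B(s, m, a) \rightarrow \mathrm{Goal}(a) \,\big],$$

where $s$ ranges over $B$'s possible sensory inputs, $m$ over its cognitive states, and $a$ over its motor outputs. The key feature is that $s$, $m$, and $a$ occur only as bound variables: $A$ approves the design without ever computing any particular value for them.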

For the rule about fictional characters not being smarter than the author, see Vinge's Law.


Comments

Micah Carroll

For example, if you knew exactly where Deep Blue would play on a chessboard, you'd be able to play chess at least as well as Deep Blue by making whatever moves you predicted Deep Blue would make. So if you want to write an algorithm that plays superhuman chess, you necessarily sacrifice your own ability to (without machine aid) predict the algorithm's exact chess moves.

Technically, couldn't we run all the computations that Deep Blue goes through by hand on a piece of paper, and in this way "predict the algorithm's exact chess moves"? Intuitively, it feels wrong to me to say that Deep Blue is "better than" us at playing chess, or that AlphaGo is "better than" us at playing go. It seems to depend on how we define "better", or more generally "intelligence" and/or "skill": whether it is tied to a notion of efficiency or to one of speed. In terms of pure competence, it seems like whatever a computer can do, we can do too, only much more slowly, by executing the algorithm one step at a time.

As far as I can tell, current AI systems can simply explore the search space of possible moves faster than we can. They aren't necessarily as efficient as we are; arguably AI systems are still very sample-inefficient (e.g., AlphaGo trained on many more games than any human could play in a lifetime).

Clearly, though, running through all the computations by hand would take an infeasible amount of time. I'm not sure whether this is just a minor philosophical point or something one should actually care about. I'm still learning about the field, so I wouldn't be surprised if someone has already discussed this distinction between speed and efficiency in defining intelligence and I just haven't found it yet.

I guess what I'm trying to say is that I agree with the premise that "less intelligent agents will not be able to predict the exact moves made by more intelligent agents", but I'm not convinced that Deep Blue or AlphaGo are "more intelligent" than us; it depends on which definition of intelligence we use. And under the definitions on which they are more intelligent than us, I don't agree that Vinge's Principle applies to them unless there are time constraints.

[The exact phrase that prompted my comment, that Deep Blue is "better than us at playing chess", is actually mentioned on this page under the "Compatibility with Vingean uncertainty" section.]