Epistemic and instrumental efficiency


by Eliezer Yudkowsky Jun 9 2015 updated Jun 16 2016

An efficient agent never makes a mistake you can predict. You can never successfully predict a directional bias in its estimates.

Summary: An agent that is "efficient", relative to you, within a domain, never makes a real error that you can predict.

For example: A superintelligence might not be able to count the exact number of atoms in a star. But you shouldn't be able to say, "I think it will overestimate the number of atoms by 10%, because hydrogen atoms are so light." It knows that too. Foreseeing a directional error in a superintelligence's estimates should be at least as impossible for you as predicting a 10% rise in Microsoft's stock over the next week using only public information.

An agent that is "efficient", relative to you, within a domain, is one that never makes a real error that you can systematically predict in advance.

If an agent is epistemically and instrumentally efficient relative to all of humanity across all domains, we can just say that it is "efficient" (and almost surely superintelligent).

Epistemic efficiency

A superintelligence cannot be assumed to know the exact number of hydrogen atoms in a star. But we should not find ourselves believing that we can predict in advance that a superintelligence will overestimate the number of hydrogen atoms by 10%. Any thought process we could use to predict this overestimate should also be accessible to the superintelligence, which can then apply the same corrective factor itself.
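The corrective-factor argument can be made concrete with a toy simulation. This is a minimal sketch that uses made-up numbers. It assumes a hypothetical estimator that overestimates a quantity by 10% on average (plus small noise) and an agent that, knowing the bias, divides it out. Once the bias is known and corrected, the remaining error has no predictable direction.

```python
import random
import statistics

# Hypothetical numbers for illustration only; not a real stellar atom count.
TRUE_COUNT = 1.0e57
BIAS_FACTOR = 1.10  # the predictable 10% overestimate

def biased_estimate(rng: random.Random) -> float:
    """One noisy estimate that is 10% too high on average."""
    return TRUE_COUNT * BIAS_FACTOR * (1.0 + rng.gauss(0.0, 0.01))

def corrected_estimate(rng: random.Random) -> float:
    """The same estimate after the agent applies the corrective factor itself."""
    return biased_estimate(rng) / BIAS_FACTOR

def mean_relative_error(estimator, n: int = 10_000, seed: int = 0) -> float:
    """Average signed relative error; a value far from 0 is a predictable direction."""
    rng = random.Random(seed)
    errors = [(estimator(rng) - TRUE_COUNT) / TRUE_COUNT for _ in range(n)]
    return statistics.fmean(errors)

print(f"biased:    {mean_relative_error(biased_estimate):+.3f}")
print(f"corrected: {mean_relative_error(corrected_estimate):+.3f}")
```

The biased estimator's mean error sits near +0.10 and the corrected one near zero. Anyone who can name the +10% can also remove it.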

The main analogy from present human experience is the Efficient Markets Hypothesis as applied to short-term asset prices in highly traded markets. Anyone who thinks they have a reliable, repeatable ability to predict 10% changes in the price of S&P 500 companies over one-month time periods is mistaken. If someone has a story about how the economy works that requires advance-predictable 10% changes in the asset prices of highly liquid markets, we infer that the story is wrong. There can be sharp corrections in stock prices (the markets can be 'wrong'), but there are no humans who can reliably predict those corrections over one-month timescales. If, e.g., somebody is consistently making money by selling options using some straightforward-seeming strategy, we suspect that such options will sometimes blow up and lose all the money gained ("picking up pennies in front of a steamroller").
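The options-selling example can be illustrated with a toy payoff profile. The probabilities and payoffs below are hypothetical and were chosen only so that the arithmetic is easy to check. The strategy collects a small premium in 99% of periods and takes a large loss in the other 1%. Its expected profit is exactly zero, yet a majority of 50-period track records show nothing but gains.

```python
from fractions import Fraction

# Hypothetical payoff profile for a "sell options" strategy.
P_BLOWUP = Fraction(1, 100)  # assumed chance per period of a blow-up
GAIN = Fraction(1)           # assumed premium collected in a quiet period
LOSS = Fraction(99)          # assumed loss when the steamroller arrives

def expected_profit_per_period() -> Fraction:
    """Exact expected profit of one period of the strategy."""
    return (1 - P_BLOWUP) * GAIN - P_BLOWUP * LOSS

def chance_of_unbroken_winning_streak(periods: int) -> float:
    """Probability that a track record of `periods` periods shows no blow-up at all."""
    return float((1 - P_BLOWUP) ** periods)

print(expected_profit_per_period())  # 0
print(round(chance_of_unbroken_winning_streak(50), 3))
```

Exact `Fraction` arithmetic is used so the zero expected value is not blurred by floating-point rounding.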

An 'efficient agent' is epistemically strong enough that, when a human proposes to outdo its estimates, we should apply at least as much skepticism as an experienced proponent of the Efficient Markets Hypothesis would apply to your uncle boasting about how he made a lot of money by predicting that General Motors's stock would rise.

Epistemic efficiency implicitly requires that an advanced agent can always learn a model of the world at least as predictively accurate as any model used by any human or human institution. If our hypothesis space were usefully wider than the agent's, so that the truth sometimes lay inside our hypothesis space but outside the agent's, then we would be able to make better predictions than the agent.
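The hypothesis-space point can be illustrated with two Bayesian coin predictors. This is a sketch under stated assumptions:

- The "agent" entertains only a fair coin.
- "We" entertain a fair coin and one biased 0.8 toward heads.
- Both use a uniform prior.
- The data (80 heads, 20 tails) is hypothetical.

Total log loss, in nats, equals the negative log marginal likelihood of the data. Lower is better.

```python
import math

def log_marginal_likelihood(heads: int, tails: int, hypotheses: list[float]) -> float:
    """log P(data) for a Bayesian predictor with a uniform prior over coin biases.

    By the chain rule, -log P(data) equals the total log loss the predictor
    accumulates when it predicts each flip in sequence and then updates.
    """
    logs = [heads * math.log(p) + tails * math.log(1 - p) for p in hypotheses]
    m = max(logs)
    # log-sum-exp of per-hypothesis likelihoods, then divide by the number of hypotheses (uniform prior)
    return m + math.log(sum(math.exp(x - m) for x in logs)) - math.log(len(hypotheses))

heads, tails = 80, 20        # hypothetical data from a coin biased 0.8 toward heads
agent_space = [0.5]          # the agent's hypothesis space misses the truth
human_space = [0.5, 0.8]     # our hypothesis space contains it

agent_loss = -log_marginal_likelihood(heads, tails, agent_space)
human_loss = -log_marginal_likelihood(heads, tails, human_space)
print(f"agent log loss: {agent_loss:.2f} nats")
print(f"human log loss: {human_loss:.2f} nats")
```

Here we out-predict the agent only because the truth lies in our space and not in its space. Epistemic efficiency rules out exactly that situation.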

Instrumental efficiency

This is the analogue of epistemic efficiency for instrumental strategizing: by definition, humans cannot expect to imagine a strategy that improves on an efficient agent's selected strategy (relative to the agent's preferences, and given the options the agent has available).

If someone argues that a cognitively advanced paperclip maximizer would do X, yielding M expected paperclips, and we can think of an alternative strategy Y that yields N expected paperclips, with N > M, then while we cannot be confident that a paperclip maximizer will use strategy Y, we strongly predict that:

…where, to avoid privileging the hypothesis or fighting a rearguard action, we should usually just say, "No, a paperclip maximizer wouldn't do X, because Y would produce more paperclips." In saying this, we are implicitly appealing to a version of instrumental efficiency: we are supposing that the paperclip maximizer isn't stupid enough to miss something that seems obvious to a human who has thought about the problem for five minutes.

Instrumental efficiency implicitly requires that the agent is always able to conceptualize any useful strategy that humans can conceptualize; it must be able to search at least as wide a space of possible strategies as humans could.
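The instrumental-efficiency claim can be sketched as a comparison of option sets. This toy model is an assumption and is not a model given in the text. Each strategy has a known expected paperclip yield (the numbers are made up). The agent picks the maximum over the strategies it can conceive of. It counts as efficient relative to a human when no strategy the human can name does better.

```python
# Hypothetical expected paperclip yields for the strategies the agent can conceive of.
AGENT_OPTIONS = {
    "X": 1_000,  # the strategy someone claims the maximizer would use
    "Y": 5_000,  # a better strategy a human thought of in five minutes
    "Z": 9_000,  # a strategy that only the agent considered
}

def agent_choice(options: dict[str, int]) -> str:
    """An idealized maximizer: pick the option with the highest expected yield."""
    return max(options, key=lambda name: options[name])

def is_efficient_relative_to(agent_options: dict[str, int],
                             human_options: dict[str, int]) -> bool:
    """True if no strategy the human can conceive of beats the agent's pick."""
    best = agent_options[agent_choice(agent_options)]
    return all(value <= best for value in human_options.values())

# The human's strategies are a subset of the agent's, so efficiency holds:
print(agent_choice(AGENT_OPTIONS))                                        # Z
print(is_efficient_relative_to(AGENT_OPTIONS, {"X": 1_000, "Y": 5_000}))  # True
# A better strategy the agent cannot conceptualize breaks it:
print(is_efficient_relative_to(AGENT_OPTIONS, {"W": 12_000}))            # False
```

The agent does not pick Y, but it also does not pick X. The only way a human can beat it is with a strategy outside the space it searches.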

Instrumentally efficient agents are presently unknown

From the standpoint of present human experience, instrumentally efficient agents are unknown outside of very limited domains. There are perfect tic-tac-toe players; but even modern chess-playing programs, with ability far beyond that of any human player, are not yet so advanced that every move that looks to us like a mistake must therefore be secretly clever. We don't dismiss out of hand the notion that a human has thought of a better move than the chess-playing algorithm, the way we dismiss out of hand a supposed secret to the stock market that predicts 10% price changes of S&P 500 companies using public information.
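Tic-tac-toe is small enough to show what instrumental efficiency looks like within one limited domain. The sketch below is a standard minimax search. That is one of several ways to build a perfect player and is not a description of any particular program. Boards are 9-character strings with "." for empty squares, a representation chosen here for brevity. Because the search covers the whole game tree, no move a human proposes can have a better game value than the move it picks.

```python
from functools import lru_cache
from typing import Optional

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board: str) -> Optional[str]:
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board: str, player: str) -> int:
    """Game value for X (+1 win, 0 draw, -1 loss) with `player` to move, under perfect play."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0
    other = "O" if player == "X" else "X"
    results = [value(board[:i] + player + board[i + 1:], other)
               for i, cell in enumerate(board) if cell == "."]
    return max(results) if player == "X" else min(results)

def best_move(board: str, player: str) -> int:
    """Index of a move that achieves the minimax value for `player`."""
    other = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == "."]
    pick = max if player == "X" else min
    return pick(moves, key=lambda i: value(board[:i] + player + board[i + 1:], other))

print(value("." * 9, "X"))  # 0: perfect play from the empty board is a draw
```

With perfect play on both sides the game is drawn. A human can tie this player but can never point to a move that would have done better.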

There is no analogue of 'instrumental efficiency' in asset markets, since market prices do not directly select among strategic options. Nobody has yet formulated a use of the EMH such that we could spend a hundred million dollars to guarantee liquidity, and get a well-traded asset market to directly design a liquid fluoride thorium nuclear plant, such that if anyone said before the start of trading, "Here is a design X that achieves expected value M", we would feel confident that either the asset market's final selected design would achieve at least expected value M or that the original assertion about X's expected value was wrong.

By restricting the meaning even further, we get a valid metaphor in chess: an ordinary person such as you, if you're not an International Grandmaster with hours to think about the game, should regard a modern chess program as instrumentally efficient relative to you. The chess program will not make any mistake that you can understand as a mistake. You should expect the reason why the chess program moves anywhere to be only understandable as 'because that move had the greatest probability of winning the game' and not in any other terms like 'it likes to move its pawn'. If you see the chess program move somewhere unexpected, you conclude that it is about to do exceptionally well or that the move you expected was surprisingly bad. There's no way for you to find any better path to the chess program's goals by thinking about the board yourself. An instrumentally efficient agent would have this property for humans in general and the real world in general, not just you and a chess game.

Corporations are not superintelligences

For any reasonable attempt to define a corporation's utility function (e.g. discounted future cash flows), it is not the case that we can confidently dismiss every human assertion that the corporation could achieve 10% more utility under that function by doing something differently. It is common for a corporation's stock price to rise immediately after it fires a CEO or renounces some other mistake that many market actors knew was a mistake, but that had been going on for years. The market actors were not able to make a profit by correcting that error, so the error persisted.

Standard economic theory does not predict that any currently known economic actor, including corporations, will be instrumentally efficient under any particular utility function. If it did, we could optimally solve any other strategic problem by making that actor's utility function conditional on the solution; e.g., we could reliably obtain the best humanly imaginable nuclear plant design by paying a corporation for it via a sufficiently well-designed contract.

We have sometimes seen people try to label corporations as superintelligences, with the implication that corporations are the real threat, or are threats as severe as machine superintelligences. But standard economic theory simply does not predict epistemic or instrumental efficiency in the decision-making of individual corporations. Most corporations do not even use internal prediction markets, or try to run conditional stock-price markets to select among known courses of action. Standard economic history includes many accounts of corporations making 'obvious mistakes', and these accounts are not questioned in the way that, e.g., a persistent, large, predictable error in short-run asset prices would be questioned.

Since corporations are not instrumentally efficient (or epistemically efficient), they are not superintelligences.


Paul Christiano

Regarding corporations:

I have seen very few arguments about superintelligence that rest on epistemic efficiency. Roughly speaking, epistemic efficiency with respect to X might be interpreted as "smarter in every way than X." But we usually talk about systems that are "smarter in some ways than humans." And the safety problem doesn't seem to change in a qualitative way at the threshold of "smarter in every way." Nor does economic value, or most indicators of interest. The only measure on which that's a fundamental threshold is "how hard it is for humans to do anything useful" (but this is not an indicator people are talking about if they talk about corporations, for obvious reasons…)

So while I might agree that a corporation is not a superintelligence on Nick's definition, this doesn't seem to have much bearing on the way in which the analogy is invoked in discussions. In general, this notion of superintelligence is a sufficient condition for lots of interesting phenomena, but not a necessary condition for almost anything.

It just seems like you are semantically disagreeing about the use of the word "superhuman." This seems like a missed opportunity to help communicate about the value alignment problem. (Also, though it's not precisely related, I think that people really do think about AI better when they think about it as an "idealized corporation" rather than an "idealized human." Corporatization seems to be a better baseline than anthropomorphization, though neither is great.)

To make things more concrete, I think it is roughly as reasonable to ask about the "value alignment problem for organizations," and that many solutions to the value alignment problem for AI will also be applicable to organizations (and conversely, if someone came to me with an actually good proposal for value alignment for organizations, I would consider it worth looking at). Of course I think that the value alignment problem for AI systems is much more important, so where the two problems are disanalogous I care about the AI version (and I also want to reserve undecorated "value alignment" as a technical term referring to the AI version of the problem). But that judgment is unrelated to the epistemic efficiency of corporations; I'd think the same thing even if corporations were epistemically efficient, and I presume so would you.

Your basic complaint with people's use of corporations as an analogy really seems to be that AI systems will become very much more powerful, and that they will never have the same peculiar mix of abilities as human organizations.

(Indeed, assuming that an agent is smarter in every way simply makes the safety problem easier, and many of our disagreements about safety are based on me being willing to assume something like "epistemic efficiency with respect to an average college graduate," at least as a first step.)

Kenzi Amodei

Boundedly rational ?means rational even when you don't have infinite computing power? Naturalistic ?refers to naturalized induction, where you're not a cartesian dualist who thinks your processes can't be messed with by stuff in the world and also you're not just thinking of yourself as a little black dot in the middle of Conway's game of life? Google says economic agent means one who has an impact on the economy by buying, selling or trading; I assign 65% to that being roughly the meaning in use here?

Somehow the epistemic efficiency thing reminds me of the halting problem; that whatever we try and do, it can just do it more. Or… somehow it actually reminds me more the other way, that it's solved the halting problem on us. Apologies for abuse of technical terms.

So an epistemically efficient agent, for example, is already overcoming all the pitfalls you see in movies of "not being able to understand the human drive for self sacrifice" or love, or etc.

Is there an analogue of efficient markets for instrumental efficiency? Some sort of master-strategy-outputting process that exists (or maybe plausibly exists in at least some special cases) in our world? Maybe Deep Blue at chess, I guess? Google maps for driving directions (for the most part)? reads to next paragraph. Well; not sure whether to update against Google Maps being an example from the fact that it's not mentioned in "instrumentally efficient agents are presently unknown" section

That said, "outside very limited domains" - well, I guess "the whole stock market, mostly" is a fair bit broader than "chess" or even "driving directions". Ah, I see; so though chess programs are overall better than humans, they're not hitting the "every silly-looking move is secretly brilliant" bar yet. Oh, and that's definitely not true of Google Maps - if it looks like it's making you do something stupid, you should have like 40% that it's in fact being stupid. Got it.

I can't tell if I should also be trying to think about whether there's a reasonable de

Kenzi Amodei

Had a very visceral experience of feeling surrounded by a bunch of epistemically efficient (wrt me) agents in a markets game tonight. Just like "yup, I can choose to bet, or not bet, and if I do bet, I may even make money, because the market may well be wrong, but I will definitely, definitely lose money in expectation if I bet at all"

Ryan Carey

You could argue that a superintelligence would be efficient at all tasks as follows:

Assume that:

  1. An AI will not knowingly be biased (if it knew it had a bias, it would correct it).
  2. Predicting the residual error of one's predictions is a task that superintelligences are definitionally better at than humans.

Then: superintelligences are efficient at all tasks.

The proof is by contradiction. Suppose a superintelligence has some residual error in some task that humans can predict. Then, by (2), the superintelligence can also predict that residual error. But by (1), a superintelligence that knew of such a bias would have corrected it, so no humanly predictable error can remain. Hence superintelligences are efficient at all tasks.
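The structure of this argument can be checked mechanically. What follows is a minimal Lean 4 sketch under stated assumptions. The predicates `residualError`, `humansPredict`, and `aiKnows` are hypothetical stand-ins introduced here. The premise that a humanly predicted error is a real error (`hp`) is stated explicitly, since the informal argument uses it implicitly. The sketch proves only that the stated conclusion ("superintelligences are efficient at all tasks") follows from the assumptions. It says nothing about whether assumptions (1) and (2) actually hold.

```lean
-- Hypothetical predicates over an abstract type of tasks.
theorem efficient_at_all_tasks
    {Task : Type}
    (residualError : Task → Prop)  -- the AI has a residual directional error on the task
    (humansPredict : Task → Prop)  -- humans can predict that residual error
    (aiKnows : Task → Prop)        -- the AI itself knows about the error
    -- Assumption (1): a known bias gets corrected, so it does not remain.
    (h1 : ∀ t, aiKnows t → ¬ residualError t)
    -- Assumption (2): whatever humans can predict about the error, the AI can too.
    (h2 : ∀ t, humansPredict t → aiKnows t)
    -- Implicit premise: a humanly predicted error is a real error.
    (hp : ∀ t, humansPredict t → residualError t) :
    ∀ t, ¬ humansPredict t := by
  intro t ht
  exact h1 t (h2 t ht) (hp t ht)
```

The proof is a single line, which shows that assumptions (1) and (2) carry the entire argument.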