"(This is hard without threa..."


by Paul Christiano, Dec 29 2015, updated Dec 31 2015

(This is hard without threaded conversations. Responding to the "agree/disagree" from Eliezer)

The failure scenario that Paul visualizes for Orthogonality is something along the lines of, 'You can't have superintelligences that optimize any external factor, only things analogous to internal reinforcement.'

The failure scenario that Paul visualizes for Orthogonality is something along the lines of, 'The problem of reflective stability is unsolvable in the limit and no efficient optimizer with a unitary goal can be computationally large or self-improving.'

I think there are a lot of plausible failure modes. The two failures you outline don't seem meaningfully distinct given our current understanding, and seem to roughly describe what I'm imagining. Possible examples:

Paul is worried about something else / Eliezer has completely missed Paul's point.

I do think the more general point, that "we really don't know what's going on here," is probably more important than the particular possible counterexamples. Even if I had no plausible counterexamples in mind, I just wouldn't be especially confident.

I think the only robust argument in favor is that unbounded agents are probably orthogonal. But (1) that doesn't speak to efficiency, and (2) even that is a bit dicey, so I wouldn't go for 99% even on the weaker form of orthogonality that neglects efficiency.

If you can get to 95% cognitive efficiency and 100% technological efficiency, then a human value optimizer ought not to be at an intergalactic-colonization disadvantage or a take-over-the-world-in-an-intelligence-explosion disadvantage, and should not even be at very much of a slow-takeoff disadvantage.

That sounds regrettable but certainly not catastrophic. Here is how I would think about this kind of thing (it's not something I've thought about quantitatively much, and it doesn't seem particularly action-relevant).

We might think that the speed of development or productivity of projects varies a lot randomly. So in the "race to take over the world" model (which I think is the best case for an inefficient project maximizing its share of the future), we'd want to think about what kind of probabilistic disadvantage a small productivity gap introduces.

As a simple toy model, you can imagine two projects; the one that does better will take over the world.

If you thought that productivity was log-normal with a multiplicative standard deviation of a factor of 2, then a 5% productivity disadvantage corresponds to maybe a 48% chance of being the more productive project. Over longer periods the disadvantage becomes more pronounced, as randomness averages out. If productivity variation is larger or smaller, then it decreases or increases the impact of an efficiency loss. If there are more participants, then the impact of a productivity hit becomes significantly larger. If the good guys only have a small probability of losing, then the cost is proportionally lower. And so on.
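The toy calculation above can be checked directly. The sketch below (my own illustration, not from the original post) assumes each project's productivity is log-normal with a multiplicative standard deviation of 2, i.e. sigma = ln 2 in log space, and gives the weaker project a 5% lower median; both the analytic formula and a Monte Carlo estimate land near the 48% figure.

```python
import math
import random

random.seed(0)

# Assumed parameters (illustrative, matching the text's toy model):
SIGMA = math.log(2)            # multiplicative sd of a factor of 2
DISADVANTAGE = math.log(1.05)  # 5% median productivity gap, in log space

def weaker_project_win_rate(trials=200_000):
    """Monte Carlo estimate of the chance the disadvantaged
    project ends up more productive than its rival."""
    wins = 0
    for _ in range(trials):
        strong = random.gauss(0.0, SIGMA)
        weak = random.gauss(-DISADVANTAGE, SIGMA)
        if weak > strong:
            wins += 1
    return wins / trials

# Analytic check: the difference of the two log-productivities is
# normal with sd SIGMA * sqrt(2), so
#   P(weak wins) = Phi(-DISADVANTAGE / (SIGMA * sqrt(2))).
z = DISADVANTAGE / (SIGMA * math.sqrt(2))
analytic = 0.5 * math.erfc(z / math.sqrt(2))

print(f"analytic:   {analytic:.3f}")
print(f"simulation: {weaker_project_win_rate():.3f}")
```

Both numbers come out close to 0.48, i.e. the 5% handicap costs only about two percentage points of win probability in a single head-to-head race under this much noise.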

Combining with my other views, maybe one is looking at a cost of tenths of a percent. You would presumably hope to avoid this by having the world coordinate even a tiny bit (I thought about this a bit here). Overall I'll stick with regrettable but far from catastrophic.

(My bigger issue in practice with efficiency losses is similar to your view that people ought to have really high confidence. I think it is easy to make sloppy arguments that one approach to AI is 10% as effective as another, when in fact it is 0.0001% as effective, and that holding yourself to asymptotic equivalence is a more productive standard unless it turns out to be unrealizable.)