Löbstacle

Imagine you have an artificial intelligence $D1$ who uses a logical [ knowledge base] and a certain system of [ logic] to guarantee that its actions always provably have the best outcome possible.

In other words, $D1$ has a bunch of [ axioms] and some [ deduction rules], and it can relate the logical terms to the real world. When it takes an action, it first checks that if it formalizes the action as a logical formula then the action [ entails] a good outcome according to the inference rules it is using and the axioms in his knowledge base.

Sidestepping complexity issues%%note: Deducing logical consequences in reasonable logical systems is not cheap!%%, such a setup is really desirable, as opposed to a black box artificial intelligences, because it allows us to better understand its reasoning process and design formal constraints that the AI has to satisfy.

However, we also want the AI to become smarter and able to come up with better conclusions. For that it may require to add new axioms to its knowledge base, so that it can prove more things. Let's call the reasoner which results of adding those axioms $D2$ .

On the other hand, it has to be careful that it does not add any axiom in [inconsistency contradiction] with the old axioms, because then the new [ knowledge base] would entail every logical statement!

Therefore, before enacting the modification, $D1$ has to prove that $D2$ is [-consistent].

But since $D2$ is stronger than $D1$ (everything $D1$ proves is also proved by $D2$ , since every axiom in $D1$ is also in $D2$ ) then the consistency of $D2$ implies the consistency of $D1$ , and if $D1$ extends minimal arithmetic then it can prove that. %%note: This is most certainly true, but it is formalizable? Turns out it is as long as $D1$ includes Peano Arithmetic. A brief and hand wavy argument that shows this is acknowledging that $\square_{D1}A\rightarrow \square_{D2}A$ [todo: complete argument]%%

Therefore, $D1$ cannot show that $D2$ is consistent, because otherwise $D1$ would prove that he himself is consistent, and if $D1$ was actually consistent then that is not allowed by [godel_second_incompleteness_theorem]%%note: Or, equivalently, by Löb's theorem, which is where the catchy name Löbstacle comes from%%.

This means that our idealized AI will only build successors which are strictly weaker than himself, which is clearly undesirable. Extended discussions of the Löbstacle can be found here and here.