So, this page is meant to be read in tandem with the entirety of MIRI's logical induction paper, and will cover the whole thing in detail. This page will go into more detail than the paper, and recurse into a lot of assumed math background, so it's main use is in clearing up points of uncertainty, and can be skimmed. If you haven't pulled up the .pdf yet, it is available right here.

## Abstract

**"We present a computable algorithm that assigns probabilities to every logical
statement in a given formal language, and refines those probabilities over time."**

This is a pretty big deal. We’re used to logic being a yes/no thing. Logical statements may eventually get proven or disproven. Until that point though, they’re in a vague “we don’t really know” state. However, mathematicians have intuitions that certain statements are likely to be true or not true - Computer scientists are pretty sure that P ≠ NP even though they currently have no idea how to prove something like that.

Until now we haven’t been able to definitively say that the idea of a logical statement being true with some probability is meaningful, because probabilistic Bayesian agents are supposed to have a kind of logical omniscience, where they can immediately figure out all consequences of whatever they believe, which obviously isn’t how human mathematicians work. Even in more exotic logics, like Kleene's Indeterminacy Logic, they don't assign probabilities to statements, only yes/no/unknown.

What this does not say is: This algorithm would be practical to use. Computable simply means that it’s an algorithm that follows nice mechanical steps that a computer could take. But it’s a hypothetical computer with unbounded RAM and CPU time, not your laptop. %note: Unbounded doesn’t exactly mean infinite - it means there’s no fixed limit it has to work within. Often we do imagine a Turing machine running forever, producing some infinite stream of output with its memory requirements constantly growing and growing. But at any point in time the entire structure will be finite.%

Before we get onto any more content from the paper, let’s have a little think about what logical uncertainty means and why it’s different from empirical uncertainty, and why it’s important.

[todo: write philosophical stuff about it here]

Here’s what they claim their algorithm can do.

**(1) it learns to predict patterns of truth and falsehood in logical statements, often long before having the resources to evaluate the statements, so long as the patterns can be written down in polynomial time.**

"Pattern" isn't defined yet. The “long before having the resources to evaluate the statements” part is meaty though, because this implies the algorithm isn't trivial.

You could just take the algorithm that iterates through every possible proof of everything, assign the statements that are proven a probability of 1 and those that are disproven a probability of 0, and everything else the probability ½. That would technically satisfy the claim “We present a computable algorithm that assigns probabilities to every logical statement” but be useless since we’ve known how to do that for as long as we’ve had a concept of computability.

Basically it’s saying this algorithm can run in a resource-constrained %note: But immensely resource-rich.% environment, and get meaningful probabilities on computations long before it’s realistic to ask it for the actual answer.

We also need to figure out what a "meaningful probability" means here. We need some technical definition of it that somehow doesn’t let through the “everything unknown is ½” example, and at the same time doesn’t require our algorithm to have magick halting oracle powers.

**(2) it learns to use appropriate statistical summaries to predict sequences of statements whose truth values appear pseudorandom.**

Imagine directing your attention towards a largish number like 67,867,967. It’s not even or divisible by 5, which is a good start. One by one you can try dividing it by other small primes to rule those out. But at the point where you get bored of doing that, and still haven’t established that it’s prime yet, you can just give the probability that it would be given how many other numbers of that size tend to be prime (once you’ve updated on the fact that it isn't divisible by 2, 3, 5, etc.)

In the case of primes there’s some number theory results that can help you pin down the probability some more, or get you to a definite answer more quickly. But we’re imagining you don’t have the resources to do that - you just have the ability to divide one number by another and have quickly googled to check that the probability something is prime is roughly 1/log(N).

Other parts of math are even more boring than primes and don’t have any Zeta functions or cosmic harmonies or whatever and just output (seemingly) random noise, such as predicting the output of an actual pseudorandom number generator after some huge number of iterations. Another example is learning that Zipf’s law applies to predicting the first digit of some hard to compute number.

**(3) it learns to have accurate beliefs about its own current beliefs, in a manner that avoids the standard paradoxes of self-reference.**

This is a big topic covering lots of interesting results.stuff It has to do with Gödel's first incompleteness theorem and Löb's theorem though.

“You can’t prove this statement”. Well, if you can prove it then you’ve just proved a false thing, which you’re not supposed to be able to do in mathematics. If you can’t prove it then it's a true thing (according to second-order logic, at least) that’s left unprovable, which is probably what’s actually happening.

That’s basically what Gödel’s First Incompleteness Theorem says. But the actual substance to it is more complex, so stop by Gödel's first incompleteness theorem if you aren't very comfortable with it. This, this, and this are also useful resources on the Incompleteness Theorems.

There's a close relative, Tarski's Undefinability Theorem, which considers the statement “This statement is false”, which would be a problem not just for provability but for the very fabric of logic itself, because it can be neither true nor false. The conclusion is then that no sufficiently-powerful system of formal logic can encode its own truth predicate. We can only say it in English because English is a bad language for doing formal logic in.

The one about provability can be encoded though because there are clever ways of encoding the proof itself in whatever object-level items your logic is talking about, such as numbers or sets. So if you can express your statements as numbers and the validity of a proof as a complex arithmetic expression then you can construct a statement equivalent to the negation of its own provability without violating the object/meta boundary.

It should be noted that the system presented here is *not* a special theorem prover that thinks in probabilities. It’s actually a system that sits downstream of a theorem prover, turning the unproven statements into probabilities. Statements that have been proven true or false will obviously get a probability of 1 or 0 respectively.

Some of the nice properties rely on the fact that the upstream theorem prover is sane: it will eventually produce all the logical consequences of all of the axioms, i.e. to prove everything that it’s possible to prove in the system.

**(3) Logical inductors also learn to trust their future beliefs more than their current beliefs**

This statement sounds a little weird at first. Our system doesn’t know what its future beliefs are going to be, so how can it trust them? It boils down to the fact that you can do math on variables that you don’t yet know the value of. An example of this might be:

$$~$\mathbb{E}_{now}(X) = \mathbb{E}_{now}(\mathbb{E}_{future}(X))$~$$

X is a LUV (logically uncertain variable), something which it knows is supposed to be a number but doesn’t know the value of. It corresponds to a mathematical expression meaning it actually has a fixed value (unlike random variables in probability theory which are actually just supposed to be random). But it doesn’t know what the value is yet, so $~$\mathbb{E}_{now}(X)$~$ refers to the expected value. $~$\mathbb{E}_{future}(X)$~$ is also a LUV because it denotes a single numerical value that the algorithm doesn't know yet, so the algorithm can take the expectation of it.

This statement isn’t the only thing suggested by “Logical inductors also learn to trust their future beliefs more than their current beliefs” but it’s intended to give an idea of what it’s talking about.

[todo: Alex was thinking of this in terms of “future me won't have beliefs that are predictably stupid according to current me". Giles was thinking in terms of "current me isn't predictably stupid according to my model of future me", specifically “I know that future me is going to think current me is dumb, but I don't know in which direction for any specific belief”. We’re not sure if this phrasing is valid.]

**their beliefs are coherent in the limit (whenever $~$\phi\rightarrow\psi$~$, $~$\mathbb{P}_{\infty}(\phi)\le\mathbb{P}_{\infty}(\psi)$~$ and so on).**

$~$\phi$~$ and $~$\psi$~$ are sentences in our logical grammar. (Or to be more precise, they are variables that act as placeholders for some sentence. Pointing this out here is pedantic, but with more complicated stuff it can become confusing what syntax is intended to be part of a sentence and what’s the metalanguage talking about those sentences).

$~$\rightarrow$~$ is logical implication, and $~$\phi\rightarrow\psi$~$ is the larger sentence that asserts that $~$\phi$~$ implies $~$\psi$~$.

$~$\mathbb{P}_{n}(\phi)$~$ is the probability that $~$\phi$~$ is true according to our algorithm on "day" n. The subscript is necessary because the algorithm’s opinions will change over time as it receives new information from its upstream theorem prover, and incorporates more "traders".

It lies in $~$[0,1]$~$ because otherwise it can’t meaningfully be called a probability, and $~$\mathbb{P}_{\infty}$~$ is the limiting value of this algorithm over time (which we haven't yet shown exists). This means that the probability of a sentence doesn't wiggle up and down forever without converging, it heads to a single well-defined value. The algorithm won’t actually ever necessarily believe $~$\phi$~$ with probability $~$\mathbb{P}_{\infty}(\phi)$~$, it might just keep getting closer and closer to it.

For statements that are provable, $~$\mathbb{P}_{\infty}=1$~$. The converse ( $~$\mathbb{P}_{\infty}=1$~$ implies that the statement is provable) is also true, but considerably harder to prove. It's boringly true if $~$\phi$~$ or $~$\psi$~$ are provably true or false, since we end up with degenerate statements involving 0 and 1 (check the truth table, replacing T with 1 and F with 0)

But we might provably have $~$\phi\rightarrow\psi$~$, and yet have neither $~$\phi$~$ nor $~$\psi$~$ being provably true or false. Think of the Axiom of Choice and various other results which follow from it: the Axiom of Choice can’t be proven given the other axioms of Zermelo–Fraenkel set theory, so if that’s our system then $~$\mathbb{P}_{\infty}(AOC)$~$ will lie strictly between 0 and 1. Anything implied by the Axiom of Choice will end up with a greater or equal probability (In the limit; this says “≤” not “<” so we can imagine the limiting probabilities being equal but the in-the-moment probabilities constantly crossing over until it gets proved). A corollary is that if the statements are provably equivalent then the probabilities will converge to the same thing.

**logical inductors strictly dominate the universal semimeasure in the limit**

*Definition: Universal Semimeasure*

[todo: stuff this in a different page]

Perhaps some vague intuition for the whole statement before diving into the details - the universal semimeasure is the best way to “get” probabilities like the ones we want, it's the thing that Solomonoff induction: Intro Dialogue (Math 2) does. (though we'll be going into more detail than that link) I say “get” because it can’t be computed, it’s a theoretical best. This means that Logical Inductors, if run with infinite resources (hence, in the limit), don't do much worse than this universal semimeasure, and in some cases do arbitrarily better!

But what is the universal semimeasure? Actually, what even is a semimeasure in the first place? Or a measure?

*Definition: Bitstring*

It’s a string of bits of finite length or infinite length.

In this context we are thinking of them as a sequence of observations about the world.

*Definition: Measure*

A measure is a way to assign a number to each subset of a set, and you could interpret it as “size”. Probability is a measure over events, volume is a measure over 3-d space. Some sets are not possible to measure at all under these systems, which can lead to some counterintuitive results such as the Banach-Tarski paradox (though this isn’t relevant here).

The requirements of a measure are as follows.

- The measure assigns a real number to each subset of some "universe".
- The measure is never negative
- The measure of the empty set is 0
- And $~$M(A\cup B)=M(A)+M(B)$~$ when A and B have no overlap.

Probabilities are a special case of a measure, where the measure of the entire "universe" has to equal 1.

How would we take our concept of probability and apply it to our concept of bitstrings?

You’re up against a box, and it spits out bits. Which bit will it spit out next? You can figure this out by taking your prior probability distribution (i.e. your measure), discarding all the parts of it that are inconsistent with the observations so far, normalizing so that it adds up to 1 again, and that tells you what to expect next. i.e. a Bayesian update. But this “updating” part isn’t part of the measure. The measure is the a priori distribution over bitstrings when you have no information about what to expect at all. You could think of the Universal Semimeasure as encoding all the information in Solomonoff Induction over all the possibilities, and Solomonoff Induction is just zooming in on the relevant part of the Universal Semimeasure and blowing it up.

In the context of bitstrings there are two ways we could define a measure. We could define it as a function that operates on sets of infinite bitstrings, or as a function that operates on individual, finite bitstrings, such that M(bitstring)=M(bitstring with a 0 on the end)+M(bitstring with a 1 on the end). To convert between the two, you could think of M(finite bitstring) as equivalent to the measure of the set of infinite bitstrings that start with that particular finite bitstring.

We’ll generally be using the second one.

What's important about this roundabout definition involving finite strings when it’s really the infinite ones that we care about?

Well… the problem of infinite bitstrings is there are just too many of them. Obviously there are infinitely many of them, but it’s worse than that - it’s an uncountable infinity. While you can sort of squeeze countable infinities into a usual probability distribution (e.g. by assigning ½, ¼, ⅛ etc.) you can’t do that with uncountable infinities. Instead the “probability” of each infinite bitstring is “zero” - but that doesn’t mean that outcome is impossible and there’s even a sense in which some kinds of outcome are more likely than others. So we need to do this thing of approaching them with ever-larger finite strings in order to make sense of that. It’s kind of like how, for a continuous probability distribution over $~$\mathbb{R}$~$, $~$\mathbb{P}(x)=0$~$, but there are intervals where you can get a positive finite value.

*Definition: Semimeasure*

But there's a problem. What if we also want to account for the fact that maybe the box isn’t going to spit out an infinite bitstring? Maybe it’ll just give us a finite one and then stop. Then M(bitstring)>M(bitstring and next bit is 0)+M(bitstring and next bit is 1). Well then, since we violated that equality condition, it’s a semimeasure over the set of infinite bitstrings. The only two differences between a semimeasure and measure are that the semimeasure may add up to less than 1, and M(bitstring) may be greater than the sum of the measures of the two "continuations" of the bitstring.

But really there’s nothing inconsistent or weird going on, if you think of this as a probability distribution over finite and infinite bitstrings, not as a faulty probability distribution over infinite bitstrings. This is discussed in (Hay 2007).

Solomonoff induction is all about Turing Machines. These are a mathematical abstraction of the idea of a computer: it performs a sequence of deterministic operations with unbounded memory and computing time, for some definition of “sequence of deterministic operations” that’s general enough to include anything that we’d usually describe that way. (i.e. the Church-Turing thesis).

*Definition: Turing Machine* [todo: merge with main page]

Turing Machines are often thought of as a finite state machine that operates on a tape, sliding the tape one way or another and replacing the symbol that’s currently under the head in a deterministic way. They are important because they can compute any computable function. They capture the essence of computation, in a sense. They don’t have to be defined in that particular way though, there are lots of equivalent formulations and often in these kind of discussions we gloss over how these machines are actually supposed to operate.

In particular, each machine is going to need to be assigned a number explaining the rules of how it operates, and there’s a sense in which we don’t care exactly which number gets assigned to which machine that does what, as long as it’s possible to do it systematically.

The basic idea is that the input bitstring is read (by some other “universal” turing machine) as an instruction set specifying a particular turing machine and how to emulate it. Every turing machine has a (finite) input bitstring that captures its behavior, longer bitstrings for more complicated machines. And it’s probably possible to set things up so every input bitstring describes *some* turing machine. Boom, there’s your mapping.

Let’s imagine some Turing machines.

These particular Turing Machines are defined to take no input. And they only have one output device: at a time of the Turing machine’s choosing, this device can spit out either a 0 or a 1 (normally turing machines go back and forth and modify their outputs, but we're assuming that these turing machines have a work tape and a write-only output).

You can see that sometimes it will output something and then stop, or stop before it got around to outputting anything at all. This is pretty normal behaviour for a TM. It’s may not be “stopping” as such - maybe it doesn't ever halt. But the work tape might have gotten stuck in some kind of infinite loop (which may be trivial or may be very complex) that prevented it from ever outputting anything more.

Other Turing Machines are a bit more cooperative and carry on spitting bits out forever. In the illustration above there’s one which outputs only 1’s, and one which outputs alternative 0’s and 1’s, forever.

As explained above, We’ve glossed over the detail of what’s supposed to be inside these machines. In reality, for any sensible encoding the first few would probably not produce anything very interesting so I’ve cheated a bit here. But eventually - after maybe a few hundred or so - you’ll start to come across machines that produce nontrivial sequences of bits and things start getting interesting again.

The extra column on the left called “probability” is not a property of the Turing machine itself. Instead it’s a label that we’ve somewhat artificially attached to it, saying how “likely” that Turing machine is to accurately generate some bitstring representing our sequence of observations.

It’s not completely arbitrary though. Through Occam’s razor, we expect that complex machines (e.g. ones that could only be represented through millions of lines of computer code) are less likely to describe our observation sequence than the really simple ones (made up of tens or hundreds of lines of code). At least a-priori we believe that - we may accumulate enough evidence to rule out all the really simple ones and just have to suck up the fact that the universe is complicated. But remember, this measure that we’re talking about is purely a-priori, it’s what we believed before we started updating on any actual evidence.

The vaguely-exponential looking bar graph represents this distribution, and intended to add up to exactly 1 even with a (countably) infinite number of Turing Machines to consider. We can be a little more precise in what we mean by that though.

If the description of your Turing machine is n bits long, its probability mass would be $~$\frac{1}{0.5822…}*n^{-2}*2^{-n}$~$. (there are alternate formulations involving prefix-free turing machines which make the equation simpler, but this'll get the idea across.) Why this? Well, the $~$2^{-n}$~$ piece means that as you go to longer and longer turing machine lengths, the exponential rise in possible turing machines is counterbalanced by an exponential decline in the probability of that particular turing machine. There are about 18 quintillion 64-bit turing machines, so that piece gives them 1/18 quintillion times the probability mass so they still add up to 1. Ok, we’ve got another problem (which doesn't exist with prefix-free encodings). If we assign equal probability to each “bin” of n-bit turing machines, then our probability distribution adds up to infinity. If we scale each “bin” down by $~$n^{-2}$~$ (summing that up from 1 to infinity yields a finite number), that’s good enough to get it to add up to a finite number. The weird constant at the front is just there to get it to add up to 1 (although it’s technically not needed, since semimeasures don’t have to add up to 1. We’re interpreting it as probability, so it’s nice to have, though, and doesn’t change any conclusions).

You can turn this distribution over Turing machines into a distribution over bitstrings, specifically a semimeasure over bitstrings, as we just discussed. How exactly?

*Definition: Universal Semimeasure*

Well, start off with some finite bitstring σ. Find (the infinite collection of) Turing machines that output something that starts with σ, including those that just output σ itself and then nothing more. Add up all their probabilities. That’s the measure you assign to σ. This is analogous to starting with a probability distribution over genotypes, and ending up with a probability distribution over phenotypes. The Turing machines that output finite strings and stop make this into a semimeasure instead of a full measure, and provide the "probability overhang" between a bitstring, and the sum of the probabilities for the bitstring with a 0 or 1 stuck at the end.

Something interesting is going on here which might not be immediately apparent. Semimeasures and measures are designed to deal with these uncountable sets where the probability of each particular item is zero. But this isn’t true of all of our infinite bitstrings. Some of them - the computable ones - actually do have a nonzero probability. The rest don’t. But Solomonoff Induction is still able to provide probabilities about what will happen next even if the observation sequence is uncomputable.

So that’s the universal semimeasure. %note: There's no such thing as "the" universal semimeasure, it depends on our choice of universal Turing Machine and encoding% The prior probability distribution over future observations/bitstrings that Solomonoff Induction, the theoretical gold standard for inductive reasoning, uses.

The logical inductor can be adapted to guess things about bitstrings. In some sense it can be seen as a “competitor” to the Universal Semimeasure and Solomonoff Induction.

The limit of a logical inductor (the semimeasure a logical inductor would have if it could think forever) dominates Solomonoff Induction, a theoretical gold standard for sequence prediction. Wait, Solomonoff Induction beats every computable predictor! How could it lose to a logical inductor? Well, we aren’t running the finite stages of the logical inductor against Solomonoff. We’re talking about the theoretical perfected final form of the logical inductor, given infinite training time.

[todo: insert meme about "this isn't even my final form]

This isn’t that big of an issue since the Universal Semimeasure is also uncomputable and would take infinite time to reach. We’re comparing two uncomputable ultra-powerful distributions against each other, and seeing which one is better. If the ultimate perfect limit form of Logical Inductors has some nice property the Universal Semimeasure doesn’t have, it means we should probably be approximating Logical Inductors instead of the Universal Semimeasure.

Since it’s tiring to keep typing, we’ll abbreviate them as USM (universal semimeasure) and LIL (logical inductor limit)

So, what is dominance? As phrased in this paper, it means that there’s a positive constant $~$C$~$ such that, for every finite bitstring $~$\sigma$~$, $~$\mathbb{P}_{USM}(\sigma)<C*\mathbb{P}_{LIL}(\sigma)$~$. So, if C was 100, it would mean that the a-priori probability that the LIL gives to any observation is never more than 100 times less than the a-priori probability that the USM gives to it. So the LIL doesn’t lose arbitrarily badly. It may do a little worse, but it’s bounded on how much worse it can do, no matter what problem you throw at it.

It’s also worth pointing out that “dominance” sounds like “actually better than” where in this case it just means “never ever more than $~$C$~$ times worse than”.

But here’s the cool part. The USM doesn’t dominate the LIL. For all positive constants $~$C$~$, there’s a finite bitstring $~$\sigma$~$, where $~$\mathbb{P}_{LIL}(\sigma)>C*\mathbb{P}_{USM}(\sigma)$~$ There are bitstrings where LIL can beat the USM arbitrarily badly. A trillion times worse? A quadrillion times worse? There are finite bitstrings where the USM fails that badly against the LIL. (by “fails”, I mean it assigns 1 quadrillionth the a-priori probability to)

So this thing beats the theoretical gold standard for probabilistic inference. #Rekt.

It’s also a good motivation for the paper. I said at the start that I didn’t realize logical induction was possible before - whereas it would be if you just fed all the statements to Solomonoff Induction or some computable approximation to it. This paper is claiming that it can do better than that, both in terms of having some nice properties that Solomonoff Induction doesn’t, and in coming out pretty good in a head-to-head comparison.

**The logical induction criterion says (very roughly) that there should not be any polynomial-time computable trading strategy with finite risk tolerance that earns unbounded profits in that market over time.**

OK so why is the paper making stock market analogies? Well, because the efficient market hypothesis is relevant to making the probabilities inexploitable by a dutch-book, and dutch-book arguments can justify most of probability theory. So, if an inexploitable market in math sentences exists, that means you can probably get reasonable probabilities out of it.

We’re imagining some model - some complete background mathematical universe against which all statements can in principle be evaluated as true or false. Such a thing isn’t computable so neither our theorem prover nor our logical inductor deal with it - even the LIL is only presented with the infinite list of sentences that can actually be proved one way or the other. (Those will be given 0 or 1 in the limit, everything else is given some number in between).

Now given this model we can imagine various traders who get a payout of $1 for each stock they own in a statement that turns out to be true. Everything is either true or false according to the background model. So either the statement or its negation is going to get the payout.

Who actually ends up rich isn’t the interesting part. The interesting part is that the assumption of an inexploitable market can fulfill desiderata of nearly any sort, by showing a trader that'd exploit the market if the property was violated. This is neat because this general strategy shows that we don't have to worry about missing some useful property. If there's some useful property that we forgot to check, but it's possible to write a trader that'd exploit it, it'll fulfill that too.

[todo: import all the other notes]

## Introduction

**For example, let φ stand for the claim that the 87,653rd digit of π is a 7. If this claim is true, then (1 + 1 = 2) ⇒ φ. But the laws of probability theory say that if A ⇒ B then Pr(A) ≤ Pr(B). Thus, a perfect Bayesian must be at least as sure of φ as they are that 1 + 1 = 2! Recognition of this problem dates at least back to Good (1950).
Many have proposed methods for relaxing the criterion Pr(A) ≤ Pr(B) until such a time as the implication has been proven (see, e.g, the work of Hacking 1967 and Christiano 2014). But this leaves open the question of how probabilities should be assigned before the implication is proven, and this brings us back to the search for a principled method for managing uncertainty about logical facts when relationships between them are suspected but unproven.**

The introduction talks about empirical and logical uncertainty, and then introduces this as an example. It’s a good time to talk some more about what probability really means.

Probability is a special case of measure, that’s confined between 0 and 1, and applies to “statements”. What the heck does that mean? Well, let’s say you’ve got a really big set that contains all possible worlds. There’s a measure over this set. You could think of “the statement” as a function that maps a possible world to 0 if the statement isn’t true of that possible world, and 1 if the statement is true of that possible world. The probability of the statement would be the measure over the set of all possible worlds where the “statement function” evaluates to 1.

Intuitively, what’s going on is something like this. We have a set of all possible worlds, some subset that satisfies condition B and some smaller subset that satisfies condition A. The A ⇒ B language (A logically implies B) can be thought of as the set of A things being contained within the set of B things. In the picture, the size of each blob is the measure (or probability) associated with it, and we can see that the subset is going to have a smaller (or equal) measure.

But how would we show a world where the 87,653rd digit of π takes on a counterfactual value?

Let’s look at our inventory.

- Universe of Possible Worlds (a set)
- Possible world (an element of the universe)
- Attribute of possible world (function from an element to {0,1})
- P(attribute) (measure of the set of possible worlds such that the attribute function evaluates to 1)

In this setup each possible world is some totally opaque thing. It has “attributes” only in the sense that we can define functions on it but otherwise it has no visible structure any more than the pixels in the diagram above do.

But there’s nothing stopping us adding more some structure artificially. We could think of the elements of the universe as some ordered collection of qualities such as (tall,red,bouncy,cube) or (small,orange,fluffy,sphere). But more interesting from the point of view of logical uncertainty is to represent these elements as the entire set of all things that we could say about them, more specifically as a set of predicates written in formal logic. This becomes difficult to illustrate because we can’t write down the entire set of such statements - it’s too huge and in fact infinite. But we can give a little flavour of what it might look like:

World 1:

- tall
- tall ∧ bouncy
- tall ∨ small
- tall ∧ tall ∧ tall ∧ tall ∧ tall ∧ tall ∧ cube
- red ∨ ¬red …

World 2:

- sphere
- sphere ∧ ¬bouncy …

A lot of these sentences are going to be boring, just repeating logical tautologies or things we already knew. But the concept as a whole is pretty powerful.

In the two examples above the picture presented is consistent: we don’t include two sentences that contradict each other. It’s also complete (is this the right word?) in the sense that we expect to eventually follow through everything to its logical conclusion, producing some extremely grandiose statements about our little objects that nevertheless follow logically from the simpler ones.

In the case of conventional probability theory, this is how it has to work. We’re just not interested in objects that are logically inconsistent. But there’s nothing really stopping us from extending the concept to include logically inconsistent sets of sentences - and this is the setup in which logical uncertainty (i.e. the point of this paper) naturally lives.

The second paragraph that is quoted is simply saying that rejecting the consistency condition from our model isn’t enough to be helpful, we need some system to replace it with. Arbitrarily assigning numbers to miscellaneous sets of logical statements doesn’t help anybody with anything. There have to be some rules or some nice properties that our system exhibits.

…

**Our main finding is a computable algorithm which satisfies the logical induction
criterion, plus proofs that a variety of different desiderata follow from this criterion.**

This tells us that the logical inductor isn't just a hypothetical mathematical object, but that it's actually computable. Also, while there are probably other implementations of a logical inductor, showing one is enough for a proof of concept. And the nice properties they associate with a logical inductor will be true for any logical inductor, not just their particular reference implementation.

**there are clear drawbacks to our algorithm: it does not use its resources efficiently;
it is not a decision-making algorithm (i.e., it does not “think about what to think
about”); and the properties above hold either asymptotically (with poor convergence
bounds) or in the limit**

This is the More Research Is Needed part - they’re explaining some caveats to their particular implementation. (which don't need to be fixed yet. The logical inductor work is sufficient to go off of, and further efficiency gains would probably advance the timelines for AGI more than properly aligned AI. So if you have a cool idea on how to make a really speedy logical inductor, keep it to yourself.

“it does not use its resources efficiently” - this part is pretty obvious, 1kb of RAM looks a lot like 100 terabytes when you’re doing computability math, since they’re both just finite after all. But in practice it makes a lot of difference obviously.

“It is not a decision making algorithm” This refers to the fact that it passively accepts sentences from the theorem prover instead of giving suggestions back to the theorem prover about what to go after next (which humans kinda do)

“Poor convergence bounds”. As we'll see later, the halting problem resurfaces in a very inconvenient proof that means that it’s nigh-impossible to prove convergence bounds in general.

## 1.1 Desiderata for Reasoning under Logical Uncertainty

**Desideratum 1 (Computable Approximability)**

Fairly obvious? We know what computability is and why it’s nice, under the understanding that it doesn’t mean it’s practical to implement on a real computer due to resource constraints. More to the point, if you don't have computability, implementing a variant of it in real life is hopeless.

**Desideratum 2 (Coherence in the Limit)**

The “Pr(φ) ≤ Pr(ψ) whenever φ ⇒ ψ” part we’ve already touched on. There is a nontrivial aspect of this, however. It turns out the way they define the deductive process, it only needs to produce the axioms (and maybe some logical consequences of them if it likes, but it doesn’t have to). This desideratum states that if the deductive process isn’t going to figure out all the logical implications itself, the logical inductor limit has some ability to step up and do it itself. (I think?)

**Desideratum 3 (Approximate Coherence)**

This at first sounds like a weaker version of Desideratum 2.

BUT.

Desideratum 2 is talking about the limiting case, that infinite uncomputable ideal. We don’t want to leave room for our system to be arbitrarily badly behaved before that point because then it would be useless in practice. At the same time, we can’t enforce total coherence at finite points in time because the system needs time to find and iron out all the wrinkles (otherwise we’d just be back to boring impossible logical omniscience). So desideratum 3 is a kind of stopgap coherence that I imagine looking something like this

**Desideratum 4 (Learning of Statistical Patterns)**

This goes back to what we were saying about guessing if a number is prime.

The definition of what a “pattern” isn’t obvious. In the case of asking if the n'th digit of pi is equal to k, you can write the sentences to all look identical except for one thing which is n, in which case fair enough it’s a pattern. But what about sentences that are different in many places but are structurally similar? The authors mention previous attempts by themselves to formalize this but they don’t say here whether they were happy with the results. If you’re trying to turn an informal concept into a formal one it’s hard to really be sure whether you got it right.

**Desideratum 5 (Calibration)**

On the face of it this seems like it should be easy to fake. Take a system that’s consistently overconfident, and you can just apply a fixup function to whatever it produces and it’s now well calibrated. (If it assigns 60% odds to stuff that happens 80% of the time, map an output of 60% probability to 80% probability instead)

**Desideratum 6 (Non-Dogmatism)**

For statements that are ultimately going to get proved true or false, this is already implied by Desideratum 5 (Calibration), since if it produced extreme probabilities it would end up being wrong a lot of the time and hence badly calibrated.

But there are also the statements that can’t be proven true or false, such as the continuum hypothesis in the context of Zermelo-Fraenkel set theory, and we don’t want to be assigning those things 0 or 1 (or even numbers that are too close to 0 or 1, going beyond the sort of confidence we can reasonably have in them).

We could, however, do that and not fail calibration since we’d never be proven wrong. This is why this criterion is necessary.

**Desideratum 7 (Uniform Non-Dogmatism)**

This one is actually pretty interesting. We might imagine that as we build up longer and longer sentences with more and more “ands” the probability keeps going down, and that we can get the probability arbitrarily close to 0 just by adding more and more stuff. This desideratum says that this isn’t what should happen, at least in some cases.

What it is talking about is any computably enumerable consistent theory. The “consistent” part just means that if we insert two or more statements into our big conjunction that contradict each other, our system is legitimately allowed to notice that and supply a probability of 0 (or one that tends to 0 in the limit at least).

The computably enumerable part is a little more subtle. Suppose we have a theory of natural numbers plus some extra predicate q(n). We don’t have any axioms as to how q is supposed to behave, so it’s free to take on a true or false value independently for each natural number n. We can construct theories that look like:

q(0) ∧ ¬q(1) ∧ ¬q(2) ∧ ¬q(3) ∧ q(4) ∧ ¬q(5) ∧ q(6) ∧ …

where the placement or non-placement of the logical not ¬ is arbitrary and doesn’t necessarily follow a pattern that’s computable.

We’re back to the problem, then, of there being an uncountable number of possibilities here. There isn’t really room to give them all a nonzero probability.

This, then, is basically the same kind of setup we had with our universal semimeasure. It too would assign nonzero probabilities to the computable infinite sequences and zero to the uncomputable ones.

Here’s why it’s so important. Lots of theories like first-order PA or ZFC have an infinite list of axioms. However, the axiom schemas of PA and ZFC follow a simple computable pattern. So this is requiring that “no, you can’t just assign zero probability to the conjunction of infinitely many sentences if there’s a compact pattern that generates all of them”. It’s required to have a positive probability for the axioms of PA and ZFC. We want our inductor to actually be able to learn the axioms and not go “well, I’ve seen the first trillion instances of this axiom schema be right, but I still don’t know about the trillion and first, and I assign 0 probability to every single instance being right.”

To put it another way, the upstream deduction thing is indeed going to be producing a “computably enumerable consistent theory” and it’s desirable to assign a nonzero probability to the thing that is in fact going to happen.

**Desideratum 8 (Universal Inductivity)**

We’ve talked about the universal semimeasure, and what it means to dominate it, already. This criterion is important mainly because the universal semimeasure has some nice properties discussed in the paper, which to some extent the LIL will inherit if it dominates it.

- It assigns non-zero prior probability to every computable sequence of observations.
- It assigns higher prior probability to simpler hypotheses.
- It predicts as well or better than any computable predictor, modulo a constant amount of error.

The LIL has to follow the same properties. If the USM assigns nonzero probability to something, then the logical inductor does the same (otherwise it wouldn't dominate it). The LIL assigns higher probability to bitstrings with a lower K-complexity if it is being used for sequence prediction, because almost all pairs of bitstrings can be proved to be mutually exclusive, and the limit respects those relations. Then, the "splitting up finite amount of probability mass among infinite options so the more complex ones have to recieve less probability mass in the limit" argument applies equally well to the LIL. Finally, the LIL will only lose by a constant factor to the USM (worst-case), and the USM will only lose by a constant factor to a computable predictor (worst-case), so LIL will only lose by a constant factor to the computable predictor (worst-case)

**Desideratum 9 (Approximate Bayesianism)**

Bayes' rule is just a result from basic probability theory, it’s nothing fancy. It still gives the ideal gold standard of how to update the probability of hypotheses based on new evidence, and is responsible for science working. It concisely captures the essence of updating on information, and can be modified to derive several useful rules of thumb for thinking probabilistically. (such as, if you know in advance what your answer will probably end up being, you should change your views to that already).

How exactly it’s supposed to be approximated here is not clear. We can’t actually just use it here, because it handles empirical uncertainty, not logical uncertainty.

**Desideratum 10 (Introspection)**

**Desideratum 11 (Self-Trust)**

We’ve talked about the self-reference stuff already, though it’s worth pointing out again that in English you can turn a statement about the world into a statement about your beliefs just by putting “I believe that” in front of it. In math speak it’s a lot more complex because you have to include some kind of encoding of yourself into the sentence itself, and replace the thing that you’re asserting with its symbolic representation.

Instead of “2+2=4” it’s “(some big mess of computation) believes that ‘2+2=4’ ”.

**Desideratum 12 (Approximate Inexploitability)**

If you fail at probability, there’s a good chance someone could construct a scheme to extract money from you by getting you to bet on your inconsistent beliefs. This might not affect us humans so much because we can just refuse to bet (in either direction) if we’re not feeling like it. But for artificial systems we can imagine them betting against each other and if one of them seems to be reliably winning then the other one has a problem.

In some sense it’s easier to design the bookie than the other agent in a Dutch book situation, because the bookie only needs to find one problem and the other agent needs to be consistent across all their beliefs. With this in mind, we need to design our definition of “can be Dutch booked” to not allow the bookie to just cherry-pick some logical fact that our poor agent hasn’t figured out yet and make a killing based on that. That’s what the “approximate” part is talking about, and we’ve already touched on it with the idea that the bookie is only allowed to make a bounded amount of money (or accept unbounded risk), and that the bookie itself has to run in polynomial time or whatever.

As mentioned before, this idea is actually pretty central to the authors’ strategy and the other nice properties emerge as consequences of it. So when this is all defined more formally, pay close attention because it’s going to be crucial.

**Desideratum 13 (Gaifman Inductivity)**

Given a $~$\prod_{1}$~$ statement φ (i.e., a universal generalization of the form “for every x, ψ(x)”), as the set of examples the reasoner has seen goes to “all examples”, the reasoner’s belief in φ should approach to certainty.

Can you see a problem with this? Have a think.

.

.

.

.

.

. %%hidden(Answer: Think More Before Opening!): The problem is that it doesn’t really fit with how first order logic works. Just because we can prove ψ(0), ψ(1), ψ(2), ψ(3), ψ(4), … in some system such as Peano Arithmetic, doesn’t mean we can necessarily prove the universal ∀x.ψ(x).

A lot of the time we can and that’s great. But take something like the Gödel sentence G: “this sentence can’t be proven true in PA”. This can be thought of as universally quantified: “∀n, after turning the handle n times on the machine which enumerates all provable PA sentences, we still haven’t proven G yet”. For any specific n this is true and we can prove it just by laboriously detailing exactly what our PA-proving machine is going to do step by step. But the general case we can’t.

And of course there are “nonstandard” models of PA where it isn’t even true.

(In case this isn’t clear yet, the previous sentence wasn’t intended to be a surprise. There’s something called Gödel's completeness theorem - yes you read that right - which states that something in first-order logic is consistent if and only if it has a model. This means that for each of these weird unprovable statements, there’s actually a model where all the axioms hold true but our weird statement doesn’t. This is still the case even when it’s a statement that really intuitively sounds like it is supposed to be true. Because we think of the statement in terms of talking about numbers, but the axioms of PA don’t just describe numbers they describe other kinds of object as well that just look suspiciously like numbers at first glance.

Similarly, the axioms of ZF set theory don’t just describe what we might imagine a set to be they can be seen to describe nonstandard things that only look like sets at a first glance. This is important and you can’t really get away from it. (All in the context of first order logic. Other kinds of logic may have entirely different problems).

So what does this mean for Desideratum 13? It means it’s probably not going to happen, I’m afraid. If it was true, it’d rule out nonstandard models, but hey, maybe you’re dealing with something that actually IS a nonstandard model, like hyperreal numbers. We are probably going to want our LIL (i.e. the limiting case) not to produce the value 1 for statements that are not actually provable, and if that’s so then this and what I just said pretty much flat out contradicts Desideratum 13. (The non-dogmatism property is related).%%

**Desideratum 14 (Efficiency)**

They’ve already said that this one isn’t going to happen.

You might think of complexity theory as being about the difference between “fast” and “slow” algorithms. It's difficult to pin down the exact runtime, so answerable questions are usually more along the lines of “does this algorithm run in polynomial time or not”. That means, given an input string of length n, does the algorithm produce the correct result in time $~$C*poly(n)$~$? The constant factor, and the polynomial used, may depend on exactly how we’ve formalized the notion of a computer or Turing machine, which is part of what drives this notion of vagueness. It’s easy to think of “polynomial time = fast, otherwise = slow” but the reality of course requires looking at the details. $~$\mathcal{O}(n^{100})$~$ is not going to be terribly useful in practice.

And obviously, the fact that something is “computable” says nothing at all about its efficiency. It could run in Ackermann time and still be computable. (I think they mentioned that the runtime could be pushed down to double-exponential, but $~$2^{2^{n}}$~$ still grows very fast. Going from n=8 to n=9 is a jump from $~$1.16*10^{77}$~$ to $~$1.34*10^{154}$~$, which is huge when you think about it.

So yeah, efficiency won’t be the top concern in this paper, they’re dealing with some of the more basic theoretical groundwork first.

**Desideratum 15 (Decision Rationality)**

Remember the setup we described? Here’s another copy of the picture to remind you.

It would be nice if the “logical induction thing” had a way to ask the “deductive process” to focus its attention around whichever part of mathematics seemed to be most decision-relevant.

This isn’t going to be how it works though, unfortunately. Not for this paper. The logical induction box sits strictly downstream of the deductive theorem prover - it just has to take what it gets and make the best of it. No steering is available. How the deductive process is supposed to work and how it would decide which parts of math to look at next is beyond the scope of the paper.

(This is another blocker if you wanted to make an AI that accidentally takes over the world. The agent might be frustrated that the deductive process part of its mind was spending all its resources proving things about the Riemann hypothesis instead of something more relevant to how neurotoxins work at the molecular level or something.)

**Desideratum 16 (Answers Counterpossible Questions)**

Once something has been proved to be false, you will get logical explosions if you ask what would happen if it’s true. But in the meantime, statements-that-turn-out-to-be-false should look pretty much like statements-that-turn-out-to-be-true from the point of view of the logical induction system, just maybe with different probabilities attached.

This is mostly interesting in the case of conditional probabilities predicated on something that are known to be a logical contradiction.

The conditional probability $~$P(X |\perp)$~$… If we try and evaluate this the usual way we would just get zero divided by zero. But it’s possible some algorithm could do better than that, and actually come up with a specific value for this particular zero divided by zero.

Why do we care?

The reason that this desiderata is important is for decision theory. Updateless decision theories rely on reasoning about counterfactuals that are known to be false, in certain cases, and even regular decision making requires reasoning sanely about the counterfactuals where you take different actions than you actually will. It’s possible to get weird self-fulfilling reasoning like “if I go left, I will be eaten by wolves (by principle of explosion)” and that makes you go right instead, thus making your reasoning true. We don’t want it to do crap like that.

Specifically there is a decision theory thought experiment known as “counterfactual mugging with a logical coin”. In this thought experiment, there is an entity called Omega which is able to see into your mind somewhat and at least understands the process by which you make decisions. In this particular case, however, Omega isn’t logically omniscient in other ways. We also assume Omega is scrupulously honest about its own knowledge and decisionmaking process.

Omega shows up and is like:

- The trillionth binary digit of pi is 0
- Previously, I didn't know whether the trillionth binary digit of pi was 1 or 0
- And I committed to asking for 10 dollars if it was 0
- I also committed to giving you a hundred dollars if it turned out to be 1 and I’d predicted that you'd give me 10 dollars if it was 0
- Since the trillionth digit of pi is 0, can I have ten bucks?

You’re under no obligation to pay Omega the ten bucks in this scenario. You can save ten bucks by not doing that. But this hasn’t actually happened yet. You get to choose your decision algorithm right now, ahead of being faced with an actual Omega, and given the setup of the problem it was just as likely that Omega would have chosen the option that would have yielded the payout. It’s more valuable to have chosen a decision algorithm that gets the \$100 than to choose one that avoids paying the \$10, if each is equally likely a priori.

In the “logical coin” version of this problem, we really are faced with thinking about what would happen if something that we know to be a logical contradiction in fact wasn’t. It’s hard to wrap your head around but Desideratum 16 suggests maybe it’s possible.

Also, it’s not clear that anything like Omega or this exact puzzle is going to crop up in real life. But this kind of weird edge-case thought experiment is often a good way to hunt for weaknesses in decision theory in general.

It should also be noted that counterfactual reasoning is needed when other parties are making decisions based on what you WOULD do in a different circumstance. Like, let’s say when someone does something wrong you would want to scold them for it. And because of that, they decide not to tell you the thing. In order to fix this, you'd need to consider the counterfactual where they do tell you the bad news, and commit to not scolding them even though it’d look like the best idea at the time, and if they’re good at predicting you, it’d cause them to tell you the bad news that you’d want to hear about anyways.

Basically, people can be a little like Omega in that they can predict your behaviour in different circumstances and sometimes it’s more straightforward to change your behaviour than to try to fool them.

**Desideratum 17 (Use of Old Evidence)**

Alright, so the way that solomonoff induction, or hell, even Bayesian reasoning, works, is that you have a gigantic total complete list of ALL hypotheses, and you go around changing your mind about them in response to new evidence. In reality, you can’t do that. You can’t have a big list in advance of every hypothesis, no matter how complicated. You need to leave some room for “weird unpredictable crap happens and I’ll try to figure out an explanation at the time”. For science, most of the hard part is inventing the new hypothesis/explanation in the first place! Like general relativity. Very successful, wasn’t in anyone’s hypothesis space before Einstein. That seems to be a critical part of real-world-successful agents that nobody knows how to do, how to create new hypotheses, and steel yourself against unknown unknowns. One of the things humans do when they create a new hypothesis (remember, something Solomonoff doesn’t do), is retroactively score it for how well it predicts old, anomalous evidence. Sadly, this algorithm doesn’t achieve this property.

Thi algorithm takes theorems as input and outputs probability distributions over an ever-growing set of sentences as time goes on. So on day 1, it spits out probabilities for stuff it's had time to think about, on day 2, it spits out probabilities for more stuff it's also had a longer time to think about, and so on.

The probability distribution that it outputs is finite (of course) but large and ever-growing. It will contain a few interesting sentences but a lot of crap, and doesn’t particularly try to privilege the generation of the interesting ones first. If it did, the “old evidence” criterion would be a useful one, but as it is it’s just generating any hypothesis that occurs to it.

(The hypotheses in this case just correspond to sentences of first-order logic not some meta-level hypotheses about patterns of probabilities amongst different sentences or whatever).

Similarly, the "traders" are produced by brute force, instead of the agent noticing its own biases on certain questions and crafting a trader to exploit that bias.

**Giant List of Notation and Definitions**

**Formalize Logical Induction (can probably be wrapped in with Notation)**

**Proof of Algorithm Functionality**

**Important proof prerequisites**

**e-ROI Lemma**

**Preemptive Learning and Convergence**

**Persistence of Knowledge**

**Coherence and Provability Learning**

**Recurring Unbiasedness**

**Calibration**

**Unbiasedness with Feedback**

**Pseudorandom Learning**

**LUV Lemmas**

**Expectation Lemmas**

**Introspection and Paradox Resistance**

**Expectations of Itself**

**Future Trust**

**Parametric Trader Lemma**

**Uniform Non-Dogmatism and Occam Bounds**

**Semimeasure Domination**

**Good Conditionals (currently broken)**

**Desiderata for further progress ie stuff this algorithm doesn't do**

**Cool additional IAFF stuff with Kosoy's Incomplete Models**