# Logical Inductor Notation and Definitions

https://arbital.com/p/logical_inductor_nota

by Alex Appel Apr 12 2017 updated Apr 21 2017

A handy guide to the blizzard of notation in the Logical Induction Paper

It should be noted that, although there are many new definitions introduced in the Logical Induction (incomplete) paper, there is a great deal of internal structure that recurs over and over, like the Standard Model of physics. This page will try to reveal the underlying structure that generates the definitions, as well as what the definitions are used for, because processing all the new definitions is essential for truly understanding the paper.

There are four general "conceptual slots" to establish, and a substantial list of minor modifications to each of the slots.

The following image is useful to blow up and keep as a reference for reading the paper. Anyways, the first important "conceptual slot" is the "sentence" slot. These can be interpreted as sentences of math, or as stocks that the "traders" are trading. $\mathcal{S}$ is the set of math sentences.

The second "conceptual slot" is the "valuation" slot. This assigns stuff from the "sentence" slot a particular value, that is typically interpreted as a probability/price. They are generally functions $\mathbb{V}:\mathcal{S}\rightarrow\mathbb{R}$

The third "conceptual slot" is the "coefficient" slot. You could think of this as the function a particular trader uses to determine how many shares of a particular sentence to buy. They rely on the history of past valuations, and have type signature $\alpha:\overline{\mathbb{V}}\rightarrow\mathbb{R}$

The fourth "conceptual slot" is the "combination" slot. This is the actual trade a "trader" makes on a particular day, and is represented as a linear combination of stuff from the "sentence" slot, with the coefficients being stuff from the "coefficient" slot. The coefficient for a particular sentence, when run on the past "valuation history" $\overline{\mathbb{V}}$, produces the number of shares of that sentence bought or sold on that day. It is possible to abuse notation to get the "valuations" to value "combinations" instead of sentences.

There are many additional subtleties, but these are the basic categories to become fluent with. Feel free to refer back to this page as needed.

Sentence Slot

• $\phi,\psi$ are sentences of math, and $\mathcal{S}$ is the set of all math sentences.

• $X$ is a LUV, a logically uncertain variable. It's a formula with a free variable that can be proven to be unique. It's possible to make sentences out of LUV's, by something like $\forall x: X(x)\rightarrow x>3$. $\mathcal{U}$ is the set of all LUV's in [0,1].

• $\overline{\phi}$ and $\overline{X}$ are sequences of sentences and LUV's, respectively. They may be poly-time computable, or not.

Deductive Process

• $\overline{D}$ is a deductive process. $D_{n}$ is the full list of math sentences revealed so far by the n'th day. It is computable and finite at each stage. $D_{\infty}$ is defined in the obvious way.

Valuation Slot

• $\mathbb{V}$ is a valuation and is a function from $\mathcal{S}\rightarrow[0,1]$. It assigns every math sentence a value in [0,1], which can be interpreted as a probability, a price, a truth value, or something even more general.

• "rational" means that the valuation only outputs rational numbers. This is necessary for being able to compute the value of a sentence and move on, and the actual logical inductor will have this property (though the limit may not).

• $\mathbb{P}$ is a rational valuation, aka a pricing. This is a logical inductor on a particular finite day. $\mathbb{P}(\phi)$ is the value/price/probability of the sentence $\phi$.

• $\overline{\mathbb{P}}$ is a sequence of pricings, or a market. This is the total list of all the finite-time functions of the logical inductor. $\mathbb{P}_{n}$ is the prices on the n'th day, and $\mathbb{P}_{n}(\phi)$ is the price of the sentence $\phi$ on day $n$. $\mathbb{P}_{\infty}$ is defined in the obvious way.

• $\mathbb{W}$ is a world. It's a valuation that can only value sentences at 0 (for false) and 1 (for true), like an infinitely opinionated list of mathematical truth. The traders get scored by a special class of these. $\mathbb{W}(\phi)$ is 0 or 1 depending on whether $\phi$ is true or false according to $\mathbb{W}$.

• $\mathcal{PC}(\Gamma)$ is the set of worlds that are propositionally consistent with respect to $\Gamma$. Given a set of sentences $\Gamma$, it's the worlds that respect all the boolean algebra you could possibly do on the sentences, and assign every sentence in $\Gamma$ a value of 1. This is important because it's the set of worlds that the traders will finally be scored against.

• $\mathbb{W}(X)$ is the value of the LUV $X$ according to $\mathbb(W)$. It's the largest real number that is less than or equal to $X$. This is needed because, say, what if $X=\frac{1}{\sqrt{2}}+3\epsilon$? But remember, the output has to be a real number, so we "round it off".

Coefficient Slot

• Valuation features. They are continuous (very important condition for finding stable market prices!) function from $\overline{\mathbb{V}}\rightarrow\mathbb{R}$. There's one other condition. This function can only depend on an initial sequence of $\overline{\mathbb{V}}$. After all, the decision of how many shares of a sentence to buy can't depend on the behavior of the market across all time! It should only depend on the behavior of the market on day n or before. The maximum n that the function takes into account is called the "rank" of the feature. $\mathcal{F}$ is the set of valuation features.

• $\phi^{*n}$ is a price feature. It's just a special feature that returns $\mathbb{P}_{n}(\phi)$, or the price of $\phi$ on day n according to $\overline{\mathbb{P}}$. Very easy to compute.

• $\xi$ is an expressible feature. $\alpha^{\dagger}$, $\beta^{\dagger}$, $\gamma^{\dagger}$, and $w^{\dagger}$ are often used too. It's just like a valuation feature (continuous, takes a valuation sequence as input, only depends on an initial sequence, has a rank), with one extra condition. It has to be built out of a fairly small core of easy-to-compute functions. This is important because otherwise it'd be totally infeasible to evaluate what a trader would do. The allowable functions it can be built out of are price features (for the same and previous days), rational numbers, addition, multiplication, $max(-,-)$, and $\frac{1}{max(1,-)}$. It is possible to chain these to define slightly fancier functions like min, and a restricted form of division, but making a function out of these pieces guarantees it'll be easy to compute, and continuous too! $\mathcal{EF}$ is the set of all these functions, and you could think of it as standing for Easy Functions. Notice that if it's applied to a market $\overline{\mathbb{P}}$, it'll only return rational numbers.

• $\overline{\alpha^{\dagger}}$, $\overline{\beta^{\dagger}}$, $\overline{\gamma^{\dagger}}$, and $\overline{w^{\dagger}}$ are $\mathcal{EF}$-progressions. They are sequences of expressible features where $\alpha^{\dagger}_{n}$ has rank n or less, so they are made of the small core of allowable functions. It's basically just a sequence of continuous functions that are allowed to look at an ever-expanding amount of data from the market each day.

• Remember, $\dagger$ denotes that it's $\mathcal{EF}$ in some way, and $*n$ denotes the price of something on the n'th day, and overlines denote that it's a sequence.

Combination Slot

• $\mathcal{F}$-combinations are linear combinations of sentences and valuation features, with an extra cash term stuck on the end. If a valuation is plugged in so the features turn into actual numbers, then you could think of them as just ordinary vectors, where each axis/basis vector is a sentence, and there's one extra axis for money. You can also define an $\mathcal{F}$-combination progression, where the n'th combination can only use information from day n or before. (Maximum rank among the coefficients of the n'th combination is n or less.) You can also define a $\mathcal{F}$-LUV-combination in a similar way, the "sentence" slot just has LUV's instead.

• $A^{\dagger}$ is an $\mathcal{EF}$-combination. Again, it's exactly like an $\mathcal{F}$-combination, the coefficients are just that special restricted class of easy-to-compute functions mentioned earlier, so it's a linear combination of $\xi$'s and $\phi$'s with a cash term. Remember, the $\dagger$ means that it involves those easy-to-compute functions.

• $\overline{A^{\dagger}}$ is an $\mathcal{EF}$-combination-progression. A sequence of $\mathcal{EF}$-combinations where the maximum rank on the n'th $\mathcal{EF}$-combination is n or less, as before.

• $B^{\dagger}$ is an $\mathcal{EF}$-LUV-combination. Again, it's exactly like an {F}-combination, the coefficients are just that special restricted class of easy-to-compute functions mentioned earlier, and instead of sentences it has LUV's. (are you starting to see how the definitions follow certain patterns and just swap different components into the same conceptual slots?)

• $\overline{B^{\dagger}}$ is an $\mathcal{EF}$-LUV-combination-progression. You should be able to figure this one out, just take the name apart into pieces and refer to previous descriptions.

• $A$ is an $\mathbb{R}$-combination. Instead of being a linear combination of valuation features and sentences, it's just a linear combination of real numbers and sentences (with a cash term, of course). You could think of it as just a bundle of cash and shares. Plugging a particular valuation like $\overline{\mathbb{P}}$ into a combination to turn the coefficients into real numbers produces one of these. $B$ is an $\mathbb{R}$-LUV-combination, defined in the usual way. Instead of being a linear combination of sentences and cash, it's a linear combination of LUV's and cash.

• $\overline{A}$ is an $\mathbb{R}$-combination-progression, defined as before. $\overline{B}$ is an $\mathbb{R}$-LUV-combination-progression, and by now, you should be able to pick up the pattern.

• $T$ is a trade, a linear combination of expressible features $\xi$ and sentences $\phi$, with a special cash term at the end, $c:=-\sum\limits_{i}\xi_{i}\phi_{i}^{*n}$ (this just means that the cash term must equal the sum of money paid to acquire all the stocks). It is basically an $\mathcal{EF}$-combination with special restrictions on the constant term. Remember the difference between the two. A trade must have 0 value according to the market on day n, while an $\mathcal{EF}$-combination is just an arbitrary bundle of shares and cash (with special restricted coefficients). This is also called an "n-strategy".

• $\overline{T}$ is a trader, a sequence of trades (with the standard restriction on the rank of the trades for day n). This is the core definition. A trader is a sequence of trades, where each individual trade is composed of "simple" strategies for buying each sentence, and each trade has 0 net value (it needs to pay for the stocks). In most places in the paper, the extra requirement that $T_{n}$ be computable in $poly(n)$ time is imposed. So, you could also think of the trader as the algorithm that takes in the day n and writes down a trade for that day. Note that the trade itself doesn't need to be efficiently computable! The algorithm just has to be able to write down the trade in polynomial time. The upstream algorithm/trader is blind, the "trade/trading strategy" itself is what actually responds to the market behavior through its coefficients.

• $(\overline{T}^{k})_{k}$ is an efficiently emulatable sequence of traders. I'll diverge from the paper and refer to this as a "trading firm" instead, to aid intuition. It's a sequence of traders, where the algorithm for the k'th trader/employee $\overline{T}^{k}$ can be written down by the master generating program in $poly(k)$ time, and all the algorithms of the "employees" have the same $poly(n)$ runtime bound for n. Also, the k'th trader/employee makes no trade on days before k. Pretty much, the "employees" must be easy to create, must all run in the same polynomial time bound, and one new "employee" gets added each day. The whole thing runs in polynomial time because on day n, only traders with $k\le n$ are being emulated, so n traders are being written down in poly(n) time each, and then they each run for poly(n) time, for a total runtime of n*(poly(n)+poly(n)), which is still a polynomial. $T_{n}^{k}$ is the trade made by the k'th "employee" on day n. These conditions are required to make a trader sophisticated enough to exploit certain types of inequalities related to sequences. Just assign one "employee" to each element of the sequence.

• $\mathcal{BCS}(\overline{\mathbb{P}})$ is the set of bounded combination sequences, and $\mathcal{BLCS}(\overline{\mathbb{P}})$ is the set of bounded LUV combination sequences. A bounded combination sequence is just a sequence of $\mathbb{R}$-combinations (bundles of stuff) that are $\overline{\mathbb{P}}$-generable (easy-to-compute given past market information), and bounded (all the bundles have less than a certain amount of stuff/all the L1 norms (this includes the cash term) are below a certain number) This may look complicated, but the reason this definition shows up is that it's only easy to exploit a sequence of $\mathbb{R}$-combinations if the traders can mirror the behavior of the sequence (P-generable) and have finite losses (bounded). The same general definition applies to LUV's too.

Valuations of Combinations

• $\mathbb{P}(A)$ is the value of the combination $A$ according to $\mathbb{P}$, defined in the obvious way. Just swap out each sentence $\phi$ for the value $\mathbb{P}(\phi)$. It is the final value of the combination, as judged by $\mathbb{P}$. This can be obviously generalized to $\mathbb{W}$, to get the value of a combination as judged by some other pricing on math sentences, or the value of a combination as judged by a particular world (way that math actually is).

• $A^{*n}$ is the value of the $\mathbb{R}$-combination $A$ according to $\mathbb{P}_{n}$. Just swap out all sentences for their day-n prices.

• $T[\phi]$ is the coefficient function for a particular $\phi$ in a trade, or the actual number of shares traded of $\phi$. Note that this is an expressible feature, while $T[\phi](\overline{\mathbb{P}})$ is a rational number, because it has the market plugged in as an input to the expressible feature. The same concept generalizes to all the different types of combinations.

• $T$ is the coefficient that represents the cash term in a trade. It is defined as above, and also generalizes to all the different types of combinations.

• $||T(\overline{\mathbb{P}})||_{mg}$ is the magnitude of a trade. It is the "stocks moved" of the trade, or the sum of the absolute magnitude of the coefficients of the sentences. It's slightly different from the L1 norm of the trader, because it doesn't include the cash term at the end. It's important because it's a hard upper bound on the value the trade can result in, and the loss the trade can produce. If the trade bought 5 shares/sentences valued at 1 dollar, and they all turn out to be wrong, the magnitude would be 5, and 5 dollars would be lost. If the trade bought 5 shares/sentences valued at 0 dollars, and they all turned out to be true, 5 dollars would be won.

• $||\overline{T}(\overline{\mathbb{P}})||_{mg}$ is the magnitude of a trader. It's the "lifetime stocks moved", or the sum of the magnitudes of all the trades. It is important for the same reasons, it provides a hard upper/lower bound on the lifetime wealth of the trader as assessed by any world.

Expectation Operators

• $\mathbb{E}_{n}^{\mathbb{P}}(X)$ is the n-precision expectation of the LUV X according to $\mathbb{P}$. It is defined as $\sum\limits_{i=0}^{n-1}\frac{1}{n}\mathbb{P}(X>\frac{i}{n})$. As n increases, this evaluates more and more sentences.

• $\mathbb{E}_{n}(X)$, when the valuation sequence/market is clear from context, means $\mathbb{E}_{n}^{\mathbb{P}_{n}}(X)$. This increases the precision, at the same time as the market is getting better and better at evaluating the probabilities of sentences. $\mathbb{E}_{\infty}(X)$ is defined in the obvious way.

• $Ex_{n}(B)$ converts the LUV-combination $B$ to a standard combination, by swapping out each LUV for the component sentences that make up the LUV-expectation, so then it's a linear combination of sentences instead of LUV's.

Indicator Functions

• $Ind_{\delta}(x>y)$ is a function that is 0 if $x\le y$, 1 if $x>y+\delta$, and linearly interpolates between the two otherwise. If the condition is true, it'll be positive. It is possible to generalize this to having x and y be expressible features, and then this indicator function would be an expressible feature too. It is also possible to generalize this to an interval indicator, as shown by the little diagram with the purple lines in the picture at the top of the page.

• $Val_{\Gamma}(A)$ is the true value of the $\mathbb{R}$-combination $A$ according to $\Gamma$ (technically all the worlds $\mathbb{W}\in\mathcal{PC}(\Gamma)$. Note that this requires that all the worlds agree on the value of $A$).

• $Thm_{\Gamma}(\phi)$ is 1 if $\phi$ is a theorem, 0 if it is not.

• $\mathbb{1}(\phi)$ is a special LUV that's 1 if $\phi$ is true, 0 if $\phi$ is false.

Number Sequences

• $\overline{\alpha}$ is a $\overline{\mathbb{P}}$-generable sequence of numbers if there's an efficiently computable $\mathcal{EF}$-sequence where $\xi_{n}(\overline{\mathbb{P}})=\alpha_{n}$. You could think of a $\overline{\mathbb{P}}$-generable sequence as an "easy-to-compute given past market data" sequence. It doesn't have to just be of numbers, you can have $\overline{\mathbb{P}}$-generable combination sequences (sequences of share/cash bundles)

• A divergent weighting $\overline{w}$ is a sequence of numbers in $[0,1]$ such that $\sum(w_{n})=\infty$. If you simplify it so all the weights are 0 or 1, $\overline{w}$ picks out an infinite subsequence. That is what this concept is used for, to pick/highlight subsequences out of sequences.

• A deferral function $f$ grows faster than n, and $f(n)$ is computable in $poly(f(n))$ time. In Computational Complexity Theory, this is called a time-constructible function. This is important because we want to prove good properties on subsequences, but there are some subsequences that take prohibitively long to generate in the first place so the traders might not be able to get good properties on them. This restricts us to "sufficiently computable subsequences" so we can prove nice things.

• An f-patient divergent weighting is a divergent weighting such that $\sum\limits_{n}^{f(n)}w_{n}<b$ for some bound b. Pretty much, it's a subsequence that thins out at the same rate or faster than the deferral function. So, if you picked $2^{n}$ as your deferral function, an f-patient divergent weighting would pick out a subsequence that got sparser at an exponential (or faster) rate.

(conditional definitions omitted since that proof still needs to be fixed)

(also still need to put the actual definition of a logical inductor)