Laplace's Rule of Succession

https://arbital.com/p/laplace_rule_of_succession

by Eliezer Yudkowsky Feb 17 2016 updated Oct 31 2016

Suppose you flip a coin with an unknown bias 30 times, and see 4 heads and 26 tails. The Rule of Succession says the next flip has a 5/32 chance of showing heads.


[summary: Suppose you roll an initial billiard ball across a billiards table, letting it bounce off the left and right sides until it comes to a halt, and you mark where it stopped. You then roll additional billiard balls, and observe that M come to rest to the left of the original ball and N come to rest to the right of it. If this is all someone knows, what probability should they assign that the next ball halts on the left, or on the right? Laplace's Rule of Succession says that the odds are (M + 1) : (N + 1) for left vs. right. If we flip a coin with an unknown bias and observe 2 heads and 7 tails, our probability of getting heads on the next flip is (2 + 1)/(2 + 7 + 2) = 3/11. On the very first flip, the odds are 1 : 1, a probability of 1/2.]

Theorem and proof

Suppose we observe a sequence $~$X_1, \dots, X_n$~$ of binary values (0 or 1), such as the successive flips of a potentially unfair coin, each of which comes up heads or tails.

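Laplace's Rule of Succession says that if:

- each flip comes up heads with the same unknown frequency $~$f$~$, independently of the other flips given $~$f$~$; and
- every value of $~$f$~$ between 0 and 1 is equally likely a priori (a uniform prior),

then after observing $~$M$~$ heads and $~$N$~$ tails, the expected probability of heads on the next coinflip is: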

$~$\dfrac{M + 1}{M + N + 2}$~$
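A minimal Python sketch of the rule, using exact rational arithmetic from the standard `fractions` module. The function name `rule_of_succession` is a hypothetical choice made for illustration. It reproduces the worked examples given earlier.

```python
from fractions import Fraction


def rule_of_succession(heads: int, tails: int) -> Fraction:
    """Laplace's Rule of Succession: P(next flip is heads) = (M + 1) / (M + N + 2).

    `heads` is M and `tails` is N; both must be non-negative counts.
    """
    if heads < 0 or tails < 0:
        raise ValueError("counts must be non-negative")
    return Fraction(heads + 1, heads + tails + 2)


# The worked examples from the text:
print(rule_of_succession(4, 26))  # 5/32
print(rule_of_succession(2, 7))   # 3/11
print(rule_of_succession(0, 0))   # 1/2
```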

Proof:

For a hypothetical value of $~$f$~$, each coinflip observed has a likelihood of $~$f$~$ if heads or $~$1 - f$~$ if tails.

The prior is uniform between 0 and 1, so the prior density is 1 everywhere on that interval.

By Bayes's Rule, after seeing M heads and N tails, the posterior probability density over $~$f$~$ is proportional to $~$1 \cdot f^M(1 - f)^N.$~$

Then the normalizing constant is: $~$\int_0^1 f^M(1 - f)^N \operatorname{d}\!f = \frac{M!N!}{(M + N + 1)!}.$~$

So the posterior probability density function is $~$f^M(1 - f)^N \frac{(M + N + 1)!}{M!N!}.$~$

Multiplying this density by $~$f$~$ and integrating from 0 to 1 gives the posterior expectation of $~$f$~$, which is the marginal probability of getting heads on the next flip.

Applying the normalizing-constant formula above with $~$M + 1$~$ in place of $~$M$~$, the answer is:

$~$\dfrac{(M+1)!N!}{(M + N + 2)!} \cdot \dfrac{(M + N + 1)!}{M!N!} = \dfrac{M + 1}{M + N + 2}.$~$
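The two integrals in the proof can be checked exactly for specific values. The Python sketch below expands $~$(1 - f)^N$~$ with the binomial theorem, integrates term by term with rational arithmetic, and confirms both the normalizing constant and the final answer for a few values of M and N. The helper names `beta_integral` and `posterior_mean` are hypothetical, introduced only for illustration. This checks the algebra on examples; it is not a general proof.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(m: int, n: int) -> Fraction:
    """Exactly compute the integral of f^m (1 - f)^n over [0, 1].

    Expands (1 - f)^n with the binomial theorem and integrates term by term:
    the integral of f^(m + k) over [0, 1] is 1 / (m + k + 1).
    """
    return sum(
        (Fraction((-1) ** k * comb(n, k), m + k + 1) for k in range(n + 1)),
        Fraction(0),
    )


def posterior_mean(m: int, n: int) -> Fraction:
    """E[f] under the posterior density f^m (1 - f)^n / Z, with Z the normalizer."""
    normalizer = beta_integral(m, n)
    # Multiplying the density by f raises the exponent of f by one.
    return beta_integral(m + 1, n) / normalizer


for m, n in [(0, 0), (2, 7), (4, 26)]:
    # Normalizing constant matches M! N! / (M + N + 1)!
    assert beta_integral(m, n) == Fraction(factorial(m) * factorial(n), factorial(m + n + 1))
    # Posterior mean matches (M + 1) / (M + N + 2)
    assert posterior_mean(m, n) == Fraction(m + 1, m + n + 2)
print("all checks passed")
```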

Simpler proof by combinatorics

Laplace's Rule of Succession was originally proved (by Thomas Bayes) by finding the posterior probability density and integrating it, and that proof illustrates the core idea of an inductive prior in Bayesianism. However, a simpler and more intuitive proof also exists.

Consider the problem originally posed by Thomas Bayes: an initial billiard ball is rolled back and forth between the left and right edges of an ideal billiards table until friction brings it to a halt. We then roll M + N additional billiard balls, and observe that M halt to the left of the initial ball and N halt to the right of it. If this is all we know, what is the probability that the next ball halts on the left, or on the right?

Suppose that we rolled a total of 5 additional billiard balls, and 2 halted to the left of the original while 3 halted to the right. Then, using | to symbolize the initial ball and o for each additional ball, the balls came to rest in the order:

o o | o o o

Suppose we now roll a new ball, symbolized by +, until it comes to a halt. It's equally likely to appear at any of these seven positions:

+ o o | o o o
o + o | o o o
o o + | o o o
o o | + o o o
o o | o + o o
o o | o o + o
o o | o o o +

This means there are 3 ways the new ball could be ordered to the left of the |, and 4 ways it could be ordered to the right. Since all left-to-right orderings of the 7 randomly rolled billiard balls are equally likely a priori, we assign 3/7 probability that the new ball comes to rest to the left of the original ball's position.
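The combinatorial argument can also be checked by simulation. This Python sketch assumes, as with Bayes's idealized table, that each ball's resting position is independent and uniformly distributed across the table, modeled as the interval [0, 1). The function `simulate_billiards` is hypothetical. It uses rejection sampling: runs in which the 5 additional balls do not split 2 left and 3 right are discarded, and among the remaining runs it records how often the next ball halts on the left.

```python
import random


def simulate_billiards(left: int, right: int, trials: int, seed: int = 0) -> float:
    """Estimate P(next ball halts left of the initial ball), given that `left`
    of the first `left + right` additional balls halted to its left.

    Assumption for this sketch: every ball's resting position is independent
    and uniform on a table of width 1 (modeled with random.random()).
    Runs that don't match the observed left/right counts are rejected.
    """
    rng = random.Random(seed)
    accepted = 0
    next_left = 0
    for _ in range(trials):
        initial = rng.random()
        observed_left = sum(rng.random() < initial for _ in range(left + right))
        if observed_left != left:
            continue  # reject: inconsistent with what we observed
        accepted += 1
        if rng.random() < initial:
            next_left += 1
    return next_left / accepted


estimate = simulate_billiards(left=2, right=3, trials=200_000)
print(f"simulated: {estimate:.3f}, rule of succession: {3 / 7:.3f}")
```

With a fixed seed the estimate is reproducible, and it should land close to 3/7 ≈ 0.429, with a sampling error of a few thousandths at this number of trials.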

See also the Wikipedia page on the rule of succession.

Use and abuse

Laplace's Rule of Succession assumes that, a priori, our subjective knowledge does not single out any value of the frequency $~$f$~$ as more likely than any other.

For example, Laplace used the rule to estimate the probability of the Sun rising tomorrow, given that it had risen every day for the past 5000 years, and arrived at odds of around 1826251:1. But today, when we have physical knowledge of how the Sun works, not every possible 'rate at which the Sun rises each day' is equally plausible a priori. Furthermore, even in Laplace's time, he should perhaps have regarded "the Sun always rises" and "the Sun never rises" as frequencies that were especially likely a priori, rather than as two values indistinguishable from all the others.
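The figure above follows from the odds form (M + 1) : (N + 1) with N = 0 failures. This Python sketch assumes 5000 years of 365.25 days each, with the Sun rising on every one of those days; the 365.25-day year is an assumption of the sketch, chosen because it reproduces the 1826251:1 figure.

```python
from fractions import Fraction

# Assumption for this sketch: 5000 years of 365.25 days each, with the Sun
# observed to rise on every one of those days and never fail to rise.
days_observed = 5000 * 36525 // 100  # 1,826,250 days, using integer arithmetic
risings, failures = days_observed, 0

# Rule of Succession odds are (M + 1) : (N + 1).
odds_for, odds_against = risings + 1, failures + 1
p_rise = Fraction(odds_for, odds_for + odds_against)

print(f"odds {odds_for}:{odds_against}")  # odds 1826251:1
print(float(p_rise))
```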

The Rule of Succession follows from assuming approximate ignorance about prior frequencies; it does not, of itself, justify that assumption. Variations of the rule are obtained by taking different priors, corresponding to different views of what should count as uninformative; see the discussion on the Wikipedia page on non-informative priors. For example, starting from the Jeffreys prior, which for a coin's bias is the Beta(1/2, 1/2) density, the expected probability of heads on the next coinflip after observing $~$M$~$ heads and $~$N$~$ tails is:

$~$\dfrac{M + \dfrac{1}{2}}{M + N + 1}$~$
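Both the uniform prior and the Jeffreys prior belong to the Beta family, so one formula covers both: with a Beta(a, b) prior, the posterior after M heads and N tails is Beta(M + a, N + b), whose mean is (M + a)/(M + N + a + b). The Python sketch below compares the two priors on the 2-heads, 7-tails example; the function name `predictive_heads` is hypothetical.

```python
from fractions import Fraction


def predictive_heads(heads: int, tails: int, a: Fraction, b: Fraction) -> Fraction:
    """P(next flip is heads) after M heads and N tails, under a Beta(a, b) prior on f.

    The posterior is Beta(M + a, N + b), whose mean is (M + a) / (M + N + a + b).
    """
    return (heads + a) / (heads + tails + a + b)


uniform = Fraction(1)      # Beta(1, 1): the uniform prior behind Laplace's rule
jeffreys = Fraction(1, 2)  # Beta(1/2, 1/2): the Jeffreys prior

# 2 heads and 7 tails, as in the summary example:
print(predictive_heads(2, 7, uniform, uniform))    # 3/11
print(predictive_heads(2, 7, jeffreys, jeffreys))  # 1/4
```

The Jeffreys prior pulls the estimate less strongly toward 1/2 than the uniform prior does, since it adds only half a pseudo-count to each side instead of a whole one.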

Nomenclature

Laplace's Rule of Succession is the famous problem solved by Thomas Bayes in "An Essay towards solving a Problem in the Doctrine of Chances", read to the Royal Society in 1763, after Bayes's death. Pierre-Simon Laplace, the first systematizer of what we now know as Bayesian reasoning, was so impressed by this theorem that he named the central theorem of his new discipline after Thomas Bayes. The original theorem proven by Bayes was popularized by Laplace in arguments about the problem of induction, and so became known as Laplace's Rule of Succession.


Comments

Jaime Sevilla Molina

For example, Laplace used the rule to estimate a probability of the sun rising tomorrow, given that it had risen every day for the past 5000 years, and arrived at odds of around 1826251:1. But today when we have physical knowledge of the Sun's operation, not every possible 'rate at which the Sun rises each day' is undistinguished. Furthermore, even in Laplace's time, he should have perhaps thought it especially likely a priori that "the Sun always rises" and "the Sun never rises" were distinguished as unusually likely frequencies of the Sun rising, a priori.

This is worded in a very confusing way.

Eyal Roth

The Rule of Succession follows from assuming approximate ignorance about prior frequencies. It does not, of itself, justify this assumption. Variations of the rule of succession are obtainable by taking different priors, corresponding to different views of what should count as uninformative. See discussion on the Wikipedia page on non-informative priors. For example if starting with the Jeffreys' prior, then after observing $~$M$~$ heads and $~$N$~$ tails, the expected probability of heads on the next coinflip is:

Broken link :(