Bayes' rule is the mathematics of probability theory governing how to update your beliefs in the light of new evidence.

[toc:]

## Notation

In much of what follows, we'll use the following notation:

- Let the hypotheses being considered be $~$H_1$~$ and $~$H_2$~$.
- Let the evidence observed be $~$e_0.$~$
- Let $~$\mathbb P(H_i)$~$ denote the prior probability of $~$H_i$~$ before observing the evidence.
- Let the conditional probability $~$\mathbb P(e_0\mid H_i)$~$ denote the likelihood of observing evidence $~$e_0$~$ assuming $~$H_i$~$ to be true.
- Let the conditional probability $~$\mathbb P(H_i\mid e_0)$~$ denote the posterior probability of $~$H_i$~$ after observing $~$e_0.$~$

## Odds/proportional form

Bayes' rule in the odds form or proportional form states:

$$~$\dfrac{\mathbb P(H_1)}{\mathbb P(H_2)} \times \dfrac{\mathbb P(e_0\mid H_1)}{\mathbb P(e_0\mid H_2)} = \dfrac{\mathbb P(H_1\mid e_0)}{\mathbb P(H_2\mid e_0)}$~$$

In other words, the prior odds times the likelihood ratio yield the posterior odds. Normalizing these odds will then yield the posterior probabilities.

Put another way: if you initially think $~$H_i$~$ is $~$\alpha$~$ times as probable as $~$H_k$~$, and then observe evidence that is $~$\beta$~$ times as likely if $~$H_i$~$ is true as if $~$H_k$~$ is true, you should update to thinking that $~$H_i$~$ is $~$\alpha \cdot \beta$~$ times as probable as $~$H_k.$~$

Suppose that Professor Plum and Miss Scarlet are two suspects in a murder, and that we start out thinking that Professor Plum is twice as likely to have committed the murder as Miss Scarlet (prior odds of 2 : 1). We then discover that the victim was poisoned. We think that Professor Plum is around one-fourth as likely to use poison as Miss Scarlet (likelihood ratio of 1 : 4). Then after observing the victim was poisoned, we should think Plum is around half as likely to have committed the murder as Scarlet: $~$2 \times \dfrac{1}{4} = \dfrac{1}{2}.$~$ This reflects posterior odds of 1 : 2, or a posterior probability of 1/3, that Professor Plum did the deed.
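The Plum and Scarlet update above can be checked with a few lines of Python. This is a minimal sketch: the helper names `update_odds` and `normalize` are made up for illustration, and exact fractions are used so the 1/3 comes out without rounding.

```python
from fractions import Fraction

def update_odds(prior_odds, likelihood_ratio):
    """Multiply prior odds by a likelihood ratio, term by term."""
    return [p * l for p, l in zip(prior_odds, likelihood_ratio)]

def normalize(odds):
    """Turn relative odds into probabilities that sum to 1."""
    total = sum(odds)
    return [Fraction(o) / Fraction(total) for o in odds]

prior_odds = [2, 1]        # Plum : Scarlet before the evidence
likelihood_ratio = [1, 4]  # how likely each suspect is to use poison, relatively

posterior_odds = update_odds(prior_odds, likelihood_ratio)  # [2, 4], i.e. 1 : 2
posterior_probs = normalize(posterior_odds)                 # Plum 1/3, Scarlet 2/3
print(posterior_odds, posterior_probs)
```

Only the ratio between the two terms matters at each step, which is why the odds (2 : 4) and (1 : 2) describe the same belief.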

## Proof

Bayes' rule follows directly from the definition of conditional probability $~$\mathbb P(X\wedge Y) = \mathbb P(X\mid Y) \cdot \mathbb P(Y):$~$

$$~$ \dfrac{\mathbb P(H_i)}{\mathbb P(H_j)} \times \dfrac{\mathbb P(e\mid H_i)}{\mathbb P(e\mid H_j)} = \dfrac{\mathbb P(e \wedge H_i)}{\mathbb P(e \wedge H_j)} = \dfrac{\mathbb P(e \wedge H_i) / \mathbb P(e)}{\mathbb P(e \wedge H_j) / \mathbb P(e)} = \dfrac{\mathbb P(H_i\mid e)}{\mathbb P(H_j\mid e)} $~$$

## Log odds form

The log odds form of Bayes' rule states:

$$~$\log \left ( \dfrac {\mathbb P(H_i)} {\mathbb P(H_j)} \right ) + \log \left ( \dfrac {\mathbb P(e\mid H_i)} {\mathbb P(e\mid H_j)} \right ) = \log \left ( \dfrac {\mathbb P(H_i\mid e)} {\mathbb P(H_j\mid e)} \right ) $~$$

E.g.: "A study of Chinese blood donors found that roughly 1 in 100,000 of them had HIV (as determined by a very reliable gold-standard test). The non-gold-standard test used for initial screening had a sensitivity of 99.7% and a specificity of 99.8%, meaning that it was 500 times as likely to return positive for infected as non-infected patients." Then our prior belief is -5 orders of magnitude against HIV, and if we then observe a positive test result, this is evidence of strength +2.7 orders of magnitude for HIV. Our posterior belief is -2.3 orders of magnitude, or odds of less than 1 to 100, against HIV.
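The arithmetic in this example can be reproduced as a short sketch. It uses base-10 logarithms so that one unit is one order of magnitude, and it approximates the prior odds of 1 : 99,999 as 1 : 100,000. The variable names are illustrative only.

```python
import math

prior_odds = 1 / 100_000   # roughly 1 : 100,000 in favor of HIV
sensitivity = 0.997        # P(positive | infected)
specificity = 0.998        # P(negative | not infected)

# Likelihood ratio of a positive result: about 500 to 1.
likelihood_ratio = sensitivity / (1 - specificity)

prior_log_odds = math.log10(prior_odds)              # about -5
evidence_strength = math.log10(likelihood_ratio)     # about +2.7
posterior_log_odds = prior_log_odds + evidence_strength  # about -2.3

posterior_odds = 10 ** posterior_log_odds            # roughly 1 : 200
print(prior_log_odds, evidence_strength, posterior_log_odds, posterior_odds)
```

The update itself is the single addition on the `posterior_log_odds` line; everything else is conversion to and from log odds.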

In log odds form, the same strength of evidence (log likelihood ratio) always moves us the same additive distance along a line representing strength of belief (also in log odds). If we measured distance in probabilities, then the same 2 : 1 likelihood ratio might move us a different distance along the probability line depending on whether we started with prior 10% probability or 50% probability.
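To make the contrast concrete, the sketch below applies the same 2 : 1 likelihood ratio to a 10% prior and to a 50% prior. The function names are hypothetical helpers, not part of any library.

```python
import math

def log_odds(p):
    """Log (base 10) of the odds p : (1 - p)."""
    return math.log10(p / (1 - p))

def update_probability(p, likelihood_ratio):
    """Apply a likelihood ratio to a probability by going through odds."""
    odds = p / (1 - p) * likelihood_ratio
    return odds / (1 + odds)

for prior in (0.10, 0.50):
    posterior = update_probability(prior, 2.0)
    shift_in_probability = posterior - prior
    shift_in_log_odds = log_odds(posterior) - log_odds(prior)
    print(prior, posterior, shift_in_probability, shift_in_log_odds)
```

A 10% prior moves to 2/11 (about 18%), while a 50% prior moves to 2/3 (about 67%), so the distance in probability differs. In log odds both moves are the same distance, namely the log of 2.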

## Visualizations

Graphical ways of visualizing Bayes' rule include frequency diagrams, the waterfall visualization, the spotlight visualization, the magnet visualization, and the Venn diagram for the proof.

## Examples

Examples of Bayes' rule may be found here.

## Multiple hypotheses and updates

The odds form of Bayes' rule also works for relative odds among more than two hypotheses, and for applying multiple pieces of evidence in succession. Suppose there's a bathtub full of coins. 1/2 of the coins are "fair" and have a 50% probability of producing heads on each coinflip; 1/3 of the coins produce 25% heads; and 1/6 produce 75% heads. You pull out a coin at random, flip it 3 times, and get the result HTH. You may legitimately calculate:

$$~$\begin{array}{rll} (1/2 : 1/3 : 1/6) \cong & (3 : 2 : 1) & \\ \times & (2 : 1 : 3) & \\ \times & (2 : 3 : 1) & \\ \times & (2 : 1 : 3) & \\ = & (24 : 6 : 9) & \cong (8 : 2 : 3) \end{array}$~$$

Since multiple pieces of evidence may not be [conditional_independence conditionally independent] from one another, it is important to be aware of the [naive_bayes_assumption Naive Bayes assumption] and whether you are making it.
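The bathtub calculation can be sketched in Python as follows. Here the flips are conditionally independent given which coin was drawn, which is what licenses multiplying one likelihood vector per flip. The names `LIKELIHOOD`, `update`, and `simplify` are illustrative, and likelihoods are written in units of 1/4 so that everything stays in whole numbers.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

# Hypotheses, in order: fair coin, 25%-heads coin, 75%-heads coin.
prior_odds = [3, 2, 1]  # equivalent to (1/2 : 1/3 : 1/6)

# Relative likelihood of each flip outcome under each hypothesis.
LIKELIHOOD = {
    "H": [2, 1, 3],  # (50% : 25% : 75%)
    "T": [2, 3, 1],  # (50% : 75% : 25%)
}

def update(odds, flips):
    """Multiply the odds by one likelihood vector per observed flip."""
    for flip in flips:
        odds = [o * l for o, l in zip(odds, LIKELIHOOD[flip])]
    return odds

def simplify(odds):
    """Divide integer odds by their greatest common divisor."""
    divisor = reduce(gcd, odds)
    return [o // divisor for o in odds]

posterior_odds = update(prior_odds, "HTH")  # [24, 6, 9]
reduced_odds = simplify(posterior_odds)     # [8, 2, 3]
probabilities = [Fraction(o, sum(reduced_odds)) for o in reduced_odds]
print(posterior_odds, reduced_odds, probabilities)
```

Normalizing (8 : 2 : 3) gives posterior probabilities of 8/13, 2/13, and 3/13 for the three kinds of coin.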

## Probability form

As a formula for a single probability $~$\mathbb P(H_i\mid e),$~$ Bayes' rule states:

$$~$\mathbb P(H_i\mid e) = \dfrac{\mathbb P(e\mid H_i) \cdot \mathbb P(H_i)}{\sum_k \mathbb P(e\mid H_k) \cdot \mathbb P(H_k)}$~$$
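As a sketch of how this formula behaves, the function below computes a single posterior probability directly. It is applied to the Plum and Scarlet example; note that the absolute likelihoods 1/10 and 4/10 are assumed purely for illustration, since the example only specifies their 1 : 4 ratio.

```python
from fractions import Fraction

def posterior_probability(priors, likelihoods, i):
    """P(H_i | e) from priors P(H_k) and likelihoods P(e | H_k)."""
    # Denominator: total probability of the evidence, summed over all hypotheses.
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return likelihoods[i] * priors[i] / evidence

priors = [Fraction(2, 3), Fraction(1, 3)]         # Plum, Scarlet (prior odds 2 : 1)
likelihoods = [Fraction(1, 10), Fraction(4, 10)]  # assumed values with ratio 1 : 4

plum = posterior_probability(priors, likelihoods, 0)
scarlet = posterior_probability(priors, likelihoods, 1)
print(plum, scarlet)
```

The result for Plum is 1/3, matching the odds-form calculation, and it would be the same for any other pair of likelihoods in the ratio 1 : 4.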

## Functional form

In functional form, Bayes' rule states:

$$~$\mathbb P(\mathbf{H}\mid e) \propto \mathbb P(e\mid \mathbf{H}) \cdot \mathbb P(\mathbf{H}).$~$$

The posterior probability function over hypotheses given the evidence is *proportional* to the likelihood function from the evidence to those hypotheses, times the prior probability function over those hypotheses.

Since posterior probabilities over mutually exclusive and exhaustive possibilities must sum to $~$1,$~$ normalizing the product of the likelihood function and prior probability function will yield the exact posterior probability function.
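One way to picture the functional form is to treat the prior, likelihood, and posterior as functions over a whole set of hypotheses. The sketch below uses a hypothetical setup that is not from the text above: eleven candidate coin biases from 0.0 to 1.0, a uniform prior over them, and the observed flips HTH.

```python
# Hypotheses: the coin's probability of heads is one of 0.0, 0.1, ..., 1.0.
biases = [k / 10 for k in range(11)]

def prior(bias):
    """Uniform prior over the eleven candidate biases (an assumption)."""
    return 1 / len(biases)

def likelihood(bias):
    """Probability of observing H, T, H in that order given the bias."""
    return bias * (1 - bias) * bias

# Unnormalized posterior: likelihood times prior, evaluated at every hypothesis.
unnormalized = [likelihood(b) * prior(b) for b in biases]

# Normalizing makes the values sum to 1, giving the exact posterior.
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

most_probable = max(range(len(biases)), key=lambda k: posterior[k])
print(biases[most_probable], posterior[most_probable])
```

The normalizing constant `total` never needs to be known in advance. It falls out of the requirement that the posterior sum to 1.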
