Normalization (probability)

https://arbital.com/p/normalize_probabilities

by Eliezer Yudkowsky Jan 27 2016 updated Oct 7 2016

That thingy we do to make sure our probabilities sum to 1, when they should sum to 1.


[summary: "Normalization" obtains a set of probabilities summing to 1, in cases where they ought to sum to 1. We do this by dividing each pre-normalized number by the sum of all pre-normalized numbers.

Suppose the odds of Alexander Hamilton winning an election are 3 : 2. We think the proportions are right (Alexander is 1.5 times as likely to win as not win) but we want probabilities. To say that Hamilton has probability 3 of winning the election would be very strange indeed. But if we divide each of the terms by the sum of all the terms, they'll end up summing to one: $~$3:2 \cong \frac{3}{3+2} : \frac{2}{3+2} = 0.6 : 0.4.$~$ Thus, the probability that Hamilton wins is 60%.]

"Normalization" is an arithmetical procedure carried out to obtain a set of probabilities summing to exactly 1, in cases where we believe that exactly one of the corresponding possibilities is true, and we already know the relative probabilities.

For example, suppose that the odds of Alexander Hamilton winning a presidential election are 3 : 2. But Alexander Hamilton must either win or not win, so the probabilities of him winning and of him not winning should sum to 1. The terms 3 and 2 sum to 5, however, so they can't themselves be the probabilities.

If we rewrite the odds as 0.6 : 0.4, we've preserved the same proportions, but made the terms sum to 1. We therefore calculate that Hamilton has a 60% probability of winning the election.

We normalized those odds by dividing each of the terms by the sum of the terms, i.e., we went from 3 : 2 to $~$\frac{3}{3+2} : \frac{2}{3+2} = 0.6 : 0.4.$~$

In converting the odds $~$m : n$~$ to $~$\frac{m}{m+n} : \frac{n}{m+n},$~$ the factor $~$\frac{1}{m+n}$~$ by which we multiply all elements of the ratio is called a normalizing constant.
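
For instance, here is a minimal Python sketch of this two-term case (the variable names are just illustrative):

```python
# Normalize the odds m : n by multiplying each term by the
# normalizing constant 1/(m+n).
m, n = 3, 2                  # odds of Hamilton winning vs. not winning
p_win = m / (m + n)          # 3/5 = 0.6
p_not_win = n / (m + n)      # 2/5 = 0.4
assert p_win + p_not_win == 1.0
```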

More generally, if we have a relative-odds function $~$\mathbb{O}(H)$~$ where $~$H$~$ has many components, and we want to convert this to a probability function $~$\mathbb{P}(H)$~$ that sums to 1, we divide every element of $~$\mathbb{O}(H)$~$ by the sum of all elements in $~$\mathbb{O}(H).$~$ That is:

$~$\mathbb{P}(H_i) = \frac{\mathbb{O}(H_i)}{\sum_j \mathbb{O}(H_j)}$~$
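
In code, this is one division per hypothesis. A minimal Python sketch, where the `normalize` helper and the three example hypotheses are illustrative assumptions rather than anything standard:

```python
def normalize(odds):
    """Divide each relative-odds value by the total, so the results sum to 1."""
    total = sum(odds.values())
    return {hypothesis: o / total for hypothesis, o in odds.items()}

# Relative odds over three mutually exclusive hypotheses (illustrative numbers).
relative_odds = {"H1": 4, "H2": 3, "H3": 1}
print(normalize(relative_odds))  # {'H1': 0.5, 'H2': 0.375, 'H3': 0.125}
```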

Analogously, if $~$\mathbb{O}(x)$~$ is a continuous relative-density function on a variable $~$x,$~$ we would normalize it (create a proportional probability density function $~$\mathbb{P}(x)$~$ whose integral is equal to 1) by dividing $~$\mathbb{O}(x)$~$ by its own integral:

$~$\mathbb{P}(x) = \frac{\mathbb{O}(x)}{\int \mathbb{O}(x) \operatorname{d}x}$~$
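
Numerically, the continuous case works the same way, with the integral approximated on a grid. A sketch assuming the unnormalized function $~$\mathbb{O}(x) = e^{-x^2}$~$ (the function, interval, and grid size are all illustrative choices):

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 2001)  # grid wide enough to capture nearly all the mass
o = np.exp(-x**2)                 # unnormalized O(x), proportional to a Gaussian
z = np.trapz(o, x)                # numerical integral of O(x); about sqrt(pi)
p = o / z                         # P(x) = O(x) / integral of O(x)
print(np.trapz(p, x))             # about 1.0
```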

In general, whenever a probability function on a variable is proportional to some other function, we can obtain the probability function by normalizing that function:

$~$\mathbb{P}(H_i) \propto \mathbb{O}(H_i) \implies \mathbb{P}(H_i) = \frac{\mathbb{O}(H_i)}{\sum_j \mathbb{O}(H_j)}$~$
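
One consequence of this proportionality: multiplying every element of $~$\mathbb{O}(H)$~$ by the same positive constant leaves the normalized probabilities unchanged, since the constant multiplies the sum and then divides back out. A quick sketch checking this (again with illustrative numbers):

```python
def normalize(odds):
    total = sum(odds.values())
    return {h: o / total for h, o in odds.items()}

odds = {"H1": 4, "H2": 3, "H3": 1}
scaled = {h: 10 * o for h, o in odds.items()}  # 40 : 30 : 10, same proportions
assert normalize(odds) == normalize(scaled)    # identical probabilities either way
```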