Introduction to Bayes' rule: Odds form

https://arbital.com/p/bayes_rule_odds_intro

by Eliezer Yudkowsky Feb 8 2016 updated Oct 26 2016

Bayes' rule is simple, if you think in terms of relative odds.


[summary: The odds form of Bayes' rule is a form of Bayes' rule that uses relative likelihoods and odds. It says that in general, the prior odds times the likelihood ratio equals the posterior odds.

Suppose that the odds of a particular patient being sick are about $~$(1 : 4),$~$ i.e., they may be drawn from a population where there is 1 sick patient for every 4 healthy ones. Suppose we administer a test which comes back positive 90% of the time if the patient is actually sick, but also comes back positive 30% of the time even when the patient is healthy; this means relative likelihoods of $~$(90 : 30) = (3 : 1)$~$: the test is 3x as likely to come back positive if the patient is sick. Bayes' rule then says that the odds of the patient being sick after observing a positive test are $~$(1 : 4) \times (3 : 1) = (3 : 4),$~$ for a final probability of about $~$3/(3+4) \approx 43\%.$~$]

%%!if-after(Waterfall diagrams and relative odds): This introduction is meant to be read after the introductions to frequency visualizations and waterfall visualizations. %%

In general, Bayes' rule states:

$$~$ \textbf{Prior odds} \times \textbf{Relative likelihoods} = \textbf{Posterior odds}$~$$

If we consider the waterfall visualization of the Diseasitis example, then we can visualize how relative odds are appropriate for thinking about the two rivers at the top of the waterfall.

Waterfall visualization

The proportion of red vs. blue water at the bottom will be the same whether there's 200 vs. 800 gallons per second of red vs. blue water at the top of the waterfall, or 20,000 vs. 80,000 gallons/sec, or 1 vs. 4 gallons/second. So long as the rest of the waterfall behaves in a proportional way, we'll get the same proportion of red vs blue at the bottom. Thus, we're justified in ignoring the amount of water and considering only the relative proportion between amounts.
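This scale-invariance is easy to check numerically. The sketch below (the function name is ours, for illustration only) multiplies each stream by its pass-through fraction from the Diseasitis example and reports the red proportion at the bottom:

```python
from fractions import Fraction

def bottom_proportion(red_flow, blue_flow):
    """Proportion of red water in the bottom pool, for the Diseasitis
    waterfall: 90% of red water and 30% of blue water make it down."""
    red_bottom = red_flow * Fraction(90, 100)
    blue_bottom = blue_flow * Fraction(30, 100)
    return red_bottom / (red_bottom + blue_bottom)

# Scaling both top flows by any common factor leaves the answer unchanged:
print(bottom_proportion(200, 800))      # 3/7
print(bottom_proportion(20000, 80000))  # 3/7
print(bottom_proportion(1, 4))          # 3/7
```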

Similarly, what matters is the relative proportion between how much of each gallon of red water makes it into the shared pool, and how much of each gallon of blue water makes it. 45% and 15% of the red and blue water making it to the bottom would give the same relative proportion of red and blue water in the bottom pool as 90% and 30%.

Changing the proportion makes no difference

This justifies throwing away the specific data that 90% of the red stream and 30% of the blue stream make it down, and summarizing this into relative likelihoods of (3 : 1).

More generally, suppose we have a medical test that detects a sickness with a 90% true positive rate (10% false negatives) and a 30% false positive rate (70% true negatives). A positive result on this test represents the same strength of evidence as a test with 60% true positives and 20% false positives. A negative result on this test represents the same strength of evidence as a test with 9% false negatives and 63% true negatives.
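These equivalences come down to the claim that only the ratio of the two rates carries evidence. A quick check, using a hypothetical helper function that is not part of the original text:

```python
from fractions import Fraction

def likelihood_ratio(p_obs_given_sick, p_obs_given_healthy):
    """Strength of evidence: how much more likely the observation is
    if the patient is sick than if the patient is healthy."""
    return Fraction(p_obs_given_sick) / Fraction(p_obs_given_healthy)

# A positive result: 90%/30% and 60%/20% carry the same (3 : 1) evidence.
assert likelihood_ratio("90/100", "30/100") == likelihood_ratio("60/100", "20/100") == 3

# A negative result: 10%/70% and 9%/63% both give (1 : 7) odds toward health.
assert likelihood_ratio("10/100", "70/100") == likelihood_ratio("9/100", "63/100") == Fraction(1, 7)
```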

In general, the strength of evidence is summarized by how relatively likely different states of the world make our observations. %%!if-before(Bayes' rule: Log-odds form): For more on this idea, see Strength of Bayesian evidence. %% %%if-before(Bayes' rule: Log-odds form): More on this later. %%

The equation

To state Bayes' rule in full generality, and prove it as a theorem, we'll need to introduce some new notation.

Conditional probability

First, when $~$X$~$ is a proposition, $~$\mathbb P(X)$~$ will stand for the probability of $~$X.$~$

In other words, $~$X$~$ is something that's either true or false in reality, but we're uncertain about it, and $~$\mathbb P(X)$~$ is a way of expressing our degree of belief that $~$X$~$ is true. A patient is, in fact, either sick or healthy; but if you don't know which of these is the case, the evidence might lead you to assign a 43% subjective probability that the patient is sick.

$~$\neg X$~$ will mean "$~$X$~$ is false", so $~$\mathbb P(\neg X)$~$ is "the probability that $~$X$~$ is false".

The Diseasitis example involved some more complicated statements than this, though; in particular it involved:

- The 90% chance that a patient blackens the tongue depressor, given that they have Diseasitis.
- The 30% chance that a patient blackens the tongue depressor, given that they're healthy.
- The 3/7 chance that a patient has Diseasitis, given that they blackened the tongue depressor.

In these cases we want to go from some fact that is assumed or known to be true (on the right), to some other proposition (on the left) whose new probability we want to ask about, taking into account that assumption.

Probability statements like those are known as "conditional probabilities". The standard notation for conditional probability expresses the above quantities as:

- $~$\mathbb P(blackened \mid sick) = 90\%$~$
- $~$\mathbb P(blackened \mid \neg sick) = 30\%$~$
- $~$\mathbb P(sick \mid blackened) = 3/7$~$

This standard notation for $~$\mathbb P(X \mid Y)$~$ meaning "the probability of $~$X$~$, assuming $~$Y$~$ to be true" is a helpfully symmetrical vertical line, to avoid giving you any visual clue to remember that the assumption is on the right and the inferred proposition is on the left.

Conditional probability is defined as follows. Using the notation $~$X \wedge Y$~$ to denote "X and Y" or "both $~$X$~$ and $~$Y$~$ are true":

$$~$\mathbb P(X \mid Y) := \frac{\mathbb P(X \wedge Y)}{\mathbb P(Y)}$~$$

E.g. in the Diseasitis example, $~$\mathbb P(sick \mid blackened)$~$ is calculated by dividing the 18% of students who are sick and have blackened tongue depressors ($~$\mathbb P(sick \wedge blackened)$~$) by the total 42% of students who have blackened tongue depressors ($~$\mathbb P(blackened)$~$).

Or $~$\mathbb P(blackened \mid \neg sick),$~$ the probability of blackening the tongue depressor given that you're healthy, is equivalent to the 24 students who are healthy and have blackened tongue depressors, divided by the 80 students who are healthy. 24 / 80 = 3/10, so this corresponds to the 30% false positives we were told about at the start.
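Both of those calculations are instances of the same definition. A minimal sketch (the function name is ours, chosen for illustration):

```python
from fractions import Fraction

def conditional(p_x_and_y, p_y):
    """P(X | Y) = P(X and Y) / P(Y)."""
    return Fraction(p_x_and_y) / Fraction(p_y)

# P(sick | blackened): 18% sick-and-blackened out of 42% blackened.
print(conditional("18/100", "42/100"))  # 3/7

# P(blackened | not sick): 24 healthy students with blackened tongue
# depressors, out of 80 healthy students.
print(conditional(24, 80))              # 3/10, the 30% false positive rate
```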

We can see the law of conditional probability as saying, "Let us restrict our attention to worlds where $~$Y$~$ is the case, or thingies of which $~$Y$~$ is true. Looking only at cases where $~$Y$~$ is true, how many cases are there inside that restriction where $~$X$~$ is also true - cases with $~$X$~$ and $~$Y$~$?"

For more on this, see Conditional probability.

Bayes' rule

Bayes' rule says:

$$~$\textbf{Prior odds} \times \textbf{Relative likelihoods} = \textbf{Posterior odds}$~$$

In the Diseasitis example, this would state:

$$~$\dfrac{\mathbb P({sick})}{\mathbb P(healthy)} \times \dfrac{\mathbb P({blackened}\mid {sick})}{\mathbb P({blackened}\mid healthy)} = \dfrac{\mathbb P({sick}\mid {blackened})}{\mathbb P(healthy\mid {blackened})}.$~$$

%todo: apparently the parallel is not super obvious, and maybe we can use slightly different colors in the text to make it clearer that e.g. Prior odds -> sick/healthy%

The prior odds refer to the relative proportion of sick vs healthy patients, which is $~$1 : 4$~$. Converting these odds into probabilities gives us $~$\mathbb P(sick)=\frac{1}{4+1}=\frac{1}{5}=20\%$~$.

The relative likelihoods refer to how much more likely each sick patient is to get a positive test result than each healthy patient, which (using conditional probability notation) is $~$\frac{\mathbb P(positive \mid sick)}{\mathbb P(positive \mid healthy)}=\frac{0.90}{0.30},$~$ aka relative likelihoods of $~$3 : 1.$~$

The posterior odds are the relative proportions of sick vs healthy patients among those with positive test results, or $~$\frac{\mathbb P(sick \mid positive)}{\mathbb P(healthy \mid positive)} = \frac{3}{4}$~$, aka $~$3 : 4$~$ odds.

To extract the probability from the relative odds, we keep in mind that probabilities of mutually exclusive and exhaustive propositions need to sum to $~$1,$~$ that is, there is a 100% probability of something happening. Since everyone is either sick or not sick, we can normalize the odds ratio $~$3 : 4$~$ by dividing through by the sum of terms:

$$~$(\frac{3}{3+4} : \frac{4}{3+4}) = (\frac{3}{7} : \frac{4}{7}) \approx (0.43 : 0.57)$~$$

…ending up with the probabilities (0.43 : 0.57), proportional to the original ratio of (3 : 4), but summing to 1. It would be very odd if something had probability $~$3$~$ (300% probability) of happening.
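The normalization step above can be sketched as a one-liner (the function name is ours, not from the original):

```python
from fractions import Fraction

def normalize(odds):
    """Convert an odds ratio into probabilities summing to 1, by
    dividing each term by the sum of all terms."""
    total = sum(odds)
    return [Fraction(term, total) for term in odds]

print(normalize([3, 4]))  # [Fraction(3, 7), Fraction(4, 7)], about 0.43 : 0.57
```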

Using the waterfall visualization:

labeled waterfall

We can generalize this to any two hypotheses $~$H_j$~$ and $~$H_k$~$ with evidence $~$e$~$, in which case Bayes' rule can be written as:

$$~$\dfrac{\mathbb P(H_j)}{\mathbb P(H_k)} \times \dfrac{\mathbb P(e \mid H_j)}{\mathbb P(e \mid H_k)} = \dfrac{\mathbb P(H_j \mid e)}{\mathbb P(H_k \mid e)}$~$$

which says "the posterior odds ratio for hypotheses $~$H_j$~$ vs $~$H_k$~$ (after seeing the evidence $~$e$~$) are equal to the prior odds ratio times the ratio of how well $~$H_j$~$ predicted the evidence compared to $~$H_k.$~$"
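As a sketch, the general odds-form update is just elementwise multiplication (the function name is a hypothetical of ours):

```python
from fractions import Fraction

def posterior_odds(prior_odds, likelihoods):
    """Odds form of Bayes' rule: multiply the prior odds for each
    hypothesis by the likelihood that hypothesis assigns the evidence."""
    return [Fraction(p) * Fraction(l) for p, l in zip(prior_odds, likelihoods)]

# Diseasitis: prior odds (1 : 4), test likelihoods 90% vs 30%.
odds = posterior_odds([1, 4], ["90/100", "30/100"])
print(odds[0] / odds[1])  # 3/4, i.e. posterior odds of (3 : 4)
```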

If $~$H_j$~$ and $~$H_k$~$ are Mutually exclusive and exhaustive, we can convert the posterior odds into a posterior probability for $~$H_j$~$ by normalizing the odds - dividing through the odds ratio by the sum of its terms, so that the elements of the new ratio sum to $~$1.$~$

Proof of Bayes' rule

Rearranging the definition of conditional probability, $~$\mathbb P(X \wedge Y) = \mathbb P(Y) \cdot \mathbb P(X|Y).$~$ E.g. to find "the fraction of all patients that are sick and get a positive result", we multiply "the fraction of patients that are sick" times "the probability that a sick patient blackens the tongue depressor".

Then this is a proof of Bayes' rule:

$$~$ \frac{\mathbb P(H_j)}{\mathbb P(H_k)} \cdot \frac{\mathbb P(e_0 | H_j)}{\mathbb P(e_0 | H_k)} = \frac{\mathbb P(e_0 \wedge H_j)}{\mathbb P(e_0 \wedge H_k)} = \frac{\mathbb P(H_j \wedge e_0)/\mathbb P(e_0)}{\mathbb P(H_k \wedge e_0)/\mathbb P(e_0)} = \frac{\mathbb P(H_j | e_0)}{\mathbb P(H_k | e_0)} $~$$

QED.

In the Diseasitis example, these proof steps correspond to the operations:

$$~$ \frac{0.20}{0.80} \cdot \frac{0.90}{0.30} = \frac{0.18}{0.24} = \frac{0.18/0.42}{0.24/0.42} = \frac{0.43}{0.57} $~$$
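We can verify each step of that chain with exact arithmetic; this sketch mirrors the proof, with variable names of our choosing:

```python
from fractions import Fraction

# Diseasitis priors and test characteristics.
p_sick, p_healthy = Fraction(20, 100), Fraction(80, 100)
p_pos_given_sick, p_pos_given_healthy = Fraction(90, 100), Fraction(30, 100)

# P(H) * P(e|H) = P(e and H):
p_pos_and_sick = p_sick * p_pos_given_sick            # 0.18
p_pos_and_healthy = p_healthy * p_pos_given_healthy   # 0.24

# Dividing both joint probabilities by P(e) = 0.42 gives the conditionals:
p_pos = p_pos_and_sick + p_pos_and_healthy            # 0.42
p_sick_given_pos = p_pos_and_sick / p_pos             # 3/7, about 0.43
p_healthy_given_pos = p_pos_and_healthy / p_pos       # 4/7, about 0.57

# The odds ratio (3 : 4) is preserved at every step of the proof:
assert (p_sick / p_healthy) * (p_pos_given_sick / p_pos_given_healthy) \
    == p_pos_and_sick / p_pos_and_healthy \
    == p_sick_given_pos / p_healthy_given_pos \
    == Fraction(3, 4)
```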

Using red for sick, blue for healthy, grey for a mix of sick and healthy patients, and + signs for positive test results, the calculation steps can be visualized as follows:

bayes venn

%todo: maybe replace this diagram with pie-chart circles in exactly right proportions (but still with the correct populations of + signs)%

This process of observing evidence and using its likelihood ratio to transform a prior belief into a posterior belief is called a "Bayesian update" or "belief revision."

%%if-before(Extraordinary claims require extraordinary evidence): Congratulations! You now know (we hope) what Bayes' rule is, and how to apply it to simple setups. After this, the path continues with further implications %if-before(Bayes' rule: Vector form): and additional forms% of Bayes' rule. This might be a good time to take a break, if you want one--but we hope you continue on this Arbital path after that! %%



Comments

Eric Rogstad

The calculation we did earlier corresponds to claiming:

Which calculation?

Tom Voltz

I seem to have broken the display by proposing an edit! The meta-level script is showing in some places. I hope that doesn't cause unneeded headaches.

I only wanted to emphasize the difference in notation between a horizontal line ("---", as in relative probabilities) vs. a forward slash ("/", as in probability that something will occur). I could find no misuse of the notation when I re-read the page, but it was a bit confusing for me jumping in at the stage I did (is there an earlier page briefly defining various notation?), since I am accustomed to both these symbols meaning "divided by", which led me to instinctively calculate a percentage or fraction whenever I see one or the other. This could be an idiosyncrasy of an engineering workplace.


Anareth A

Rearranging the definition of conditional probability, $~$\mathbb P(X \wedge Y) = \mathbb P(Y) \cdot \mathbb P(X|Y).$~$ E.g. to find "the fraction of all patients that are sick and get a positive result", we multiply "the fraction of patients that are sick" times "the probability that a sick patient blackens the tongue depressor".

The following would be simpler and more consistent with the beginning of the sentence: “the fraction of sick patients that got a positive result”

Anareth A

Rearranging the definition of conditional probability, $~$\mathbb P(X \wedge Y) = \mathbb P(Y) \cdot \mathbb P(X|Y).$~$ E.g. to find "the fraction of all patients that are sick and get a positive result", we multiply "the fraction of patients that are sick" times "the probability that a sick patient blackens the tongue depressor".

“got” would be clearer.

81y

This standard notation for $~$\mathbb P(X \mid Y)$~$ meaning "the probability of $~$X$~$, assuming $~$Y$~$ to be true" is a helpfully symmetrical vertical line, to avoid giving you any visual clue to remember that the assumption is on the right and the inferred proposition is on the left.

This confused me at first because I didn't realize it was sarcasm and I thought I was missing something. "Is there any reason why distinguishing between assumption and proposition is a bad idea?"

Stephanie Koo

E.g. in the Diseasitis example, $~$\mathbb P(sick \mid blackened)$~$ is calculated by dividing the 18% students who are sick and have blackened tongue depressors ($~$\mathbb P(sick \wedge blackened)$~$), by the total 42% students who have blackened tongue depressors ($~$\mathbb P(blackened)$~$).

Just reiterating that it's 18% of all students (sick and healthy). That's because it's a 90% (0.9) chance the blackened tongue depressor belonged to a sick student, out of all the sick students (20% of total student population).

Sorry if this is really obvious to others, it just took me a while.

Haakon Borch

More generally, suppose we have a medical test that detects a sickness with a 90% true positive rate (10% false negatives) and a 30% false positive rate (70% true negatives). A positive result on this test represents the same strength of evidence as a test with 60% true positives and 20% false positives. A negative result on this test represents the same strength of evidence as a test with 9% false negatives and 63% true negatives.

It was unclear when reading this which test "this test" referred to. I ended up figuring out the false negatives and true positives of the 60/20 test instead of the 90/30 test and was subsequently confused because 1:2 != 1:7. This might be an issue with my reading comprehension, but I figured I should mention it anyway.

yassine chaouche

More generally, suppose we have a medical test that detects a sickness with a 90% true positive rate (10% false negatives) and a 30% false positive rate (70% true negatives). A positive result on this test represents the same strength of evidence as a test with 60% true positives and 20% false positives. A negative result on this test represents the same strength of evidence as a test with 9% false negatives and 63% true negatives.

Would that mean that the strength of evidence is the TP/FP ratio? In that case, it would have the same definition as the relative likelihood. Wouldn't there be a better definition for either one of the notions, so that we can easily differentiate them?

ubs izo

I'd really like to see links to problems or exercises at each level. I feel like one or two worked examples is not enough, and that, say, ten problems that help one think through this idea and connected ideas would be great.

Dewi Morgan

First, when $~$X$~$ is a proposition, $~$\mathbb P(X)$~$ will stand for the probability of $~$X.$~$

In this page, the terms "probability" and "odds" are used in the statistical sense of "In the classical and canonical representation of probability, 0 expresses absolute incredulity, and 1 expresses absolute credulity." (from the linked definition) and "odds are a ratio of desired outcomes vs the field" (has no linked definition, I'm just wildly guessing based on context).

Explaining this distinction clearly at the outset for non-statistically trained users, may be worthwhile.

Explaining what is meant by odds, on this page about them, may be worthwhile.

It may also be confusing to a new reader, who has just read a linked definition which explains that probabilities are expressed as a number from 0 to 1, to see it expressed in the very same paragraph (and elsewhere on the page) as a percentage instead. I feel that this is a lack in the definition, though, rather than a problematic inconsistency on the page: I find the expression both as 0-1 and 0%-100% meant I looked at the problem from both points of view, and so felt it had a firmer grasp on the concepts.

Dewi Morgan

The 90% chance that a patient blackens the tongue depressor, given that they have Diseasitis. The 30% chance that a patient blackens the tongue depressor, given that they're healthy. The 3/7 chance that a patient has Diseasitis, given that they blackened the tongue depressor.

Is this a probability or an odd? What's a "chance"? In this list, "chance"s are expressed both as a fraction, and as a percentage, like some kind of hybrid probablodd. This feels like muddying the waters. When you're introducing a new concept like "probability and odds, while synonyms in lay speak, are different things in statistics", it's probably not good to conflate both with another lay synonym like "chance".