Joint probability distribution: (Motivation) coherent probabilities

https://arbital.com/p/468

by Tsvi BT Jun 11 2016 updated Jun 22 2016

If you don't use joint probability distributions, none of your probabilities will make any sense. So, yeah, use joint probability distributions.


$$~$ \newcommand{\gO}{\Omega} \newcommand{\go}{\omega} \newcommand{\bP}{\mathbb{P}} \newcommand{\pc}{0.4} \newcommand{\pnc}{0.6} \newcommand{\plc}{0.7} \newcommand{\fpcl}{0.9} \newcommand{\fpcnl}{0.2} \newcommand{\plnc}{0.2} \newcommand{\pscr}{0.4} \newcommand{\pscnr}{0.1} \newcommand{\psncr}{0.7} \newcommand{\psncnr}{0.9} \newcommand{\pjnclnrns}{0.0036} \newcommand{\pjclrs}{0.0784} \newcommand{\true}{\text{True}} \newcommand{\false}{\text{False}} $~$$

Example: Medical Diagnosis

Imagine you're a doctor. Today's patient, Mr. WebMD, has a terrible cough; so naturally, he either has the flu or cancer. From past experience with similar cases, you assign some prior probability $~$\bP(C)$~$ to the [event_probability event] that Mr. W has cancer, and $~$\bP(\neg C) = 1-\bP(C)$~$ to the event that Mr. W doesn't have cancer (and does have the flu).

If he has cancer, then you place conditional probability $~$\bP(L\mid C) = \plc$~$ on finding a lump. If he doesn't have cancer and just has the flu, then you assign $~$\bP(L \mid \neg C) = \plnc$~$ to finding a lump.

You're going to observe whether Mr. W has a lump, and then you need to decide whether to treat him with radiation ($~$R$~$) or with $~$\text{Not Radiation}^{\text{TM}}$~$ ($~$\neg R$~$). Whether or not the patient survives ($~$S$~$) for at least a year depends on what disease he has, and what treatment you prescribe:

$$~$\bP(S \mid \;\; C, \;\; R) = \pscr$~$$

$$~$\bP(S \mid \;\; C, \neg R) = \pscnr$~$$

$$~$\bP(S \mid \neg C, \;\; R) = \psncr$~$$

$$~$\bP(S \mid \neg C, \neg R) = \psncnr$~$$

So, for instance, if Mr. W has cancer but you don't treat him with radiation, then you believe based on similar cases that he'll only survive with probability $~$\bP(S \mid C, \neg R) = \pscnr$~$.

Now, what do you do in this situation? How do you look at a bunch of conditional probabilities, and use them to figure out which disease Mr. W is likely to have and what treatment is likely to keep him alive?

For example, just looking at the probability table for $~$\bP(S \mid C,R)$~$, we can see that if we believe that Mr. W has cancer, then we want to treat him with radiation, and if he has the flu, then we do not. These are the choices that maximize the probability that Mr. W survives, conditioned on the disease and treatment.

So, we've got to guess whether or not Mr. W has cancer. Maybe we see a lump, which is a strong indication of cancer, so after we update our beliefs we have a conditional probability $~$\bP(C \mid L) = \fpcl$~$ of cancer. Since we now think cancer is likely, we treat with radiation. On the other hand, if we didn't see a lump, maybe cancer isn't very likely, so we have a conditional probability $~$\bP(C \mid \neg L) = \fpcnl$~$ of cancer, and we don't treat with radiation.

Epistemic Chaos!

The probabilities we assigned in the medical diagnosis situation might be appealing to those who wish to reason about an uncertain world in an organized way. We've written down lots of numbers that quantify the intuitive relationships between variables that we care about, like "cancer usually leads to lumps" ($~$\bP(L \mid C) = big$~$) or "if the patient has cancer, then survival is more likely if we treat with radiation than if we don't" ($~$\bP(S \mid C, R) >\bP(S \mid C, \neg R)$~$).

But there is a terrible problem lurking here.

A terrible problem: incoherent beliefs

You can get test subjects in a psychology experiment to say that "Linda is a bank teller and is active in the feminist movement" is more likely to be true than "Linda is a bank teller". (See conjunction fallacy.)

This is absurd: if Laura is a teller and an active feminist, then she is a teller; ($~$A$~$ and $~$B$~$) logically implies $~$A$~$. There is no [coherent_probability coherent] set of beliefs that could possibly lead to assigning probabilities so that $~$\bP(A,B) > \bP(A)$~$, since whenever $~$(A,B)$~$ happens, $~$A$~$ also happens.

So we can't just write down any old numbers to represent our uncertainty: our probabilities have to be [coherent_probability coherent].

Incoherence isn't always obvious…

It is a Serious Concern that our beliefs might be [incoherent_probability incoherent], and this won't work itself out automatically. In the medical example, we wrote down these conditional probabilities:

$$~$\bP(L\mid \;\; C) = \plc$~$$

$$~$\bP(L \mid \neg C) = \plnc$~$$

$$~$\bP(C \mid \;\; L) = \fpcl$~$$

$$~$\bP(C \mid \neg L) = \fpcnl$~$$

These probabilities can be justified with intuitive reasons. If Mr. W doesn't have cancer, he probably won't have a lump; if we see that Mr. W has a lump, then he probably has cancer.

Are you seriously concerned yet? If not, I hope the following diagrams will increase the seriousness of your concern:

we're using the [suqeare reperesnsiont] . we fix the conditionals L given C. free param pc determines everything. inf particul, detremines the p c|l or nl. but need different settings for the different pc|l:

but we don't know the prior p(c). as it slides, it fully specifies the distribution. so it gives probailties p(c|l). but the ones we wrote down happen at different p(c). so it is impossible.

Let's break this down. We're using the [probability_distribution_square_visualization square visualiation] of our probability distribution. The red regions are the regions where Mr. W has cancer, and the blue regions are where $~$\neg C$~$ is true. The darker regions are where $~$L$~$, and the lighter regions are where $~$\neg L$~$.

The proportion of the red column that is darker is

this goes up top. remove p(c). or rather, make theree: 1 with no pc, 1 for each pc for p(c|l). those 2 go after the spectrum $$~$\frac{\bP(L,C)}{\bP(C)} = \bP(L \mid C)\ ,$~$$

and

something something inchoroledent

…and incoherence is not at all obvious in real life.

Mr. W only had two possible diseases, one possible symptom, and one possible treatment. An actual diagnosis could involve many thousands of possible diseases, symptoms, and treatments.

In real life, how can we be sure that our beliefs are even coherent? If we write down a great big collection of probabilities that look like

$$~$\bP(\text{disease}_9 \mid \text{symptom}_2, \text{symptom}_5, \text{symptom}_{17})=0.153$~$$

or

$$~$\bP(\text{outcome} = \text{survival} \mid \text{disease}_9, \text{treatment} = \text{bezoar})=0.094,$~$$

what's to stop us from writing down something nonsensical like $~$\bP(A,B) > \bP(A)$~$?

What To Do?

In general, if we want to use probability theory to reason in uncertain situations, there will be lots and lots of variables that we care about. So it won't be obvious that our beliefs are coherent. And, if our beliefs aren't coherent, there's nothing to stop us from doing [incoherence_properties_probability all sorts of silly things].

If only there were some way to organize our uncertainty in a nice, systematic way that is guaranteed to be consistent.