$$~$ \newcommand{\true}{\text{True}} \newcommand{\false}{\text{False}} \newcommand{\bP}{\mathbb{P}} $~$$

[summary: $$~$ \newcommand{\true}{\text{True}} \newcommand{\false}{\text{False}} \newcommand{\bP}{\mathbb{P}} $~$$

Say $~$A$~$ and $~$B$~$ are independent [event_probability events], so $~$\bP(A, B) = \bP(A)\bP(B).$~$ Then we can draw their joint probability distribution using the square visualization of probabilities:

]

This is what independence looks like, using the square visualization of probabilities:

We can see that the [event_probability events] $~$A$~$ and $~$B$~$ don't interact; we say that $~$A$~$ and $~$B$~$ are *independent*. Whether we look at the whole square, or just the red part of
the square where $~$A$~$ is true, the probability of $~$B$~$ stays the same. In other words, $~$\bP(B \mid A) = \bP(B)$~$. That's what we mean by independence: the
probability of $~$B$~$ doesn't change if you condition on $~$A$~$.

Our square of probabilities can be generated by multiplying together the probability of $~$A$~$ and the probability of $~$B$~$:

This picture demonstrates another way to define what it means for $~$A$~$ and $~$B$~$ to be independent:

$$~$\bP(A, B) = \bP(A)\bP(B)\ .$~$$
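
To see why the two definitions agree, here is a short derivation. It assumes $~$\bP(A) > 0$~$ so that conditioning on $~$A$~$ is defined (an assumption not stated explicitly above). Starting from the product definition and the definition of conditional probability:

$$~$\bP(B \mid A) = \frac{\bP(A, B)}{\bP(A)} = \frac{\bP(A)\bP(B)}{\bP(A)} = \bP(B)\ .$~$$

Going the other way, if $~$\bP(B \mid A) = \bP(B)$~$, then

$$~$\bP(A, B) = \bP(A)\bP(B \mid A) = \bP(A)\bP(B)\ .$~$$

As an illustration with made-up numbers: if $~$\bP(A) = 0.4$~$ and $~$\bP(B) = 0.3$~$ and the events are independent, then $~$\bP(A, B) = 0.4 \times 0.3 = 0.12$~$, and $~$\bP(B \mid A) = 0.12 / 0.4 = 0.3 = \bP(B)$~$.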

## In terms of factoring a joint distribution

Let's contrast independence with non-independence. Here's a picture of two ordinary, non-independent events $~$A$~$ and $~$B$~$:

(If the meaning of this picture isn't clear, take a look at Square visualization of probabilities on two events.)

We have the red blocks for $~$\bP(A)$~$ and the blue blocks for $~$\bP(\neg A)$~$ lined up in columns. This means we've [factoring_probability factored] our probability distribution using $~$A$~$ as the first factor:

$$~$\bP(A,B) = \bP(A) \bP(B \mid A)\ .$~$$
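
As a rough numeric illustration of factoring, the sketch below takes a joint distribution over two dependent events and checks that multiplying a marginal by the matching conditional recovers each joint probability, whichever event is used as the first factor. The probabilities and the helper names (`marginal_a`, `b_given_a`, and so on) are made up for this example and are not part of the original text.

```python
# Hypothetical joint distribution over two dependent events A and B.
# Keys are (a, b) truth values; the numbers are invented for illustration.
joint = {
    (True, True): 0.32,
    (True, False): 0.08,
    (False, True): 0.18,
    (False, False): 0.42,
}

def marginal_a(joint, a):
    """P(A = a), obtained by summing out B."""
    return sum(p for (a_val, _), p in joint.items() if a_val == a)

def marginal_b(joint, b):
    """P(B = b), obtained by summing out A."""
    return sum(p for (_, b_val), p in joint.items() if b_val == b)

def b_given_a(joint, b, a):
    """P(B = b | A = a); assumes P(A = a) > 0."""
    return joint[(a, b)] / marginal_a(joint, a)

def a_given_b(joint, a, b):
    """P(A = a | B = b); assumes P(B = b) > 0."""
    return joint[(a, b)] / marginal_b(joint, b)

# Both factorizations should reproduce every cell of the joint table.
for (a, b), p in joint.items():
    by_a_first = marginal_a(joint, a) * b_given_a(joint, b, a)
    by_b_first = marginal_b(joint, b) * a_given_b(joint, a, b)
    assert abs(by_a_first - p) < 1e-12
    assert abs(by_b_first - p) < 1e-12
```

In this example the conditional probability of $~$B$~$ differs between the two columns (`b_given_a(joint, True, True)` is about 0.8, while `b_given_a(joint, True, False)` is about 0.3), which is what makes the events non-independent.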

We could just as well have factored by $~$B$~$ first: $~$\bP(A,B) = \bP(B) \bP( A \mid B)\ .$~$ Then we'd draw a picture like this:

Now, here again is the picture of two independent events $~$A$~$ and $~$B$~$:

In this picture, there are red and blue lined-up columns for $~$\bP(A)$~$ and $~$\bP(\neg A)$~$, and there are *also* dark and light lined-up rows for $~$\bP(B)$~$ and
$~$\bP(\neg B)$~$. It looks like we somehow [factoring_probability factored] our probability distribution $~$\bP$~$ using both $~$A$~$ and
$~$B$~$ as the first factor.

In fact, this is exactly what happened: since $~$A$~$ and $~$B$~$ are independent, we have that $~$\bP(B \mid A) = \bP(B)$~$. So the diagram above is actually factored according to $~$A$~$ first: $~$\bP(A,B) = \bP(A) \bP(B \mid A)$~$. It's just that $~$\bP(B \mid A)= \bP(B) = \bP(B \mid \neg A)$~$, since $~$B$~$ is independent from $~$A$~$. So we don't need to have different ratios of dark to light (a.k.a. conditional probabilities of $~$B$~$) in the left and right columns:

In this visualization, we can see what happens to the probability of $~$B$~$ when we condition on $~$A$~$ or on $~$\neg A$~$: it doesn't change at all. The ratio of [the area where $~$B$~$ happens] to [the whole area] is the same as the ratio $~$\bP(B \mid A)$~$ where we only look at the area where $~$A$~$ happens, which is the same as the ratio $~$\bP(B \mid \neg A)$~$ where we only look at the area where $~$\neg A$~$ happens. The fact that the probability of $~$B$~$ doesn't change when we condition on $~$A$~$ is exactly what we mean when we say that $~$A$~$ and $~$B$~$ are independent.

The square diagram above is *also* factored according to $~$B$~$ first, using $~$\bP(A,B) = \bP(B) \bP(A \mid B)$~$. The red / blue ratios are the same in both rows
because $~$\bP(A \mid B) = \bP(A) = \bP(A \mid \neg B)$~$, since $~$A$~$ and $~$B$~$ are independent:

We couldn't do any of this stuff if the columns and rows didn't both line up. (Which is good, because then we'd have proved the false statement that any two events are independent!)
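
The "columns and rows both line up" condition can be checked numerically. The following is a minimal sketch, assuming a joint distribution stored as a dictionary keyed by truth values and assuming all marginal probabilities lie strictly between 0 and 1. The two example tables and the function names `conditionals` and `lines_up` are hypothetical, chosen only for this illustration.

```python
def conditionals(joint):
    """Return the marginals and conditionals read off a 2x2 joint table.

    Assumes 0 < P(A) < 1 and 0 < P(B) < 1 so every division is defined.
    """
    p_a = joint[(True, True)] + joint[(True, False)]
    p_b = joint[(True, True)] + joint[(False, True)]
    return {
        "P(A)": p_a,
        "P(B)": p_b,
        "P(B|A)": joint[(True, True)] / p_a,
        "P(B|not A)": joint[(False, True)] / (1 - p_a),
        "P(A|B)": joint[(True, True)] / p_b,
        "P(A|not B)": joint[(True, False)] / (1 - p_b),
    }

def lines_up(joint, tol=1e-9):
    """True if the dark/light ratio matches across columns and the
    red/blue ratio matches across rows, up to a small tolerance."""
    c = conditionals(joint)
    columns_match = abs(c["P(B|A)"] - c["P(B|not A)"]) < tol
    rows_match = abs(c["P(A|B)"] - c["P(A|not B)"]) < tol
    return columns_match and rows_match

# Made-up example tables: keys are (A, B) truth values.
independent = {
    (True, True): 0.12, (True, False): 0.28,
    (False, True): 0.18, (False, False): 0.42,
}
dependent = {
    (True, True): 0.32, (True, False): 0.08,
    (False, True): 0.18, (False, False): 0.42,
}

print(lines_up(independent))
print(lines_up(dependent))
```

For the first table every conditional probability of $~$B$~$ agrees with $~$\bP(B)$~$ and every conditional probability of $~$A$~$ agrees with $~$\bP(A)$~$, so the check passes; for the second table the conditionals differ between columns, so it fails.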

## In terms of multiplying marginal probabilities

Another way to say that $~$A$~$ and $~$B$~$ are independent variables %note:We're using the [event_variable_equivalence equivalence] between [event_probability events] and [binary_random_variable binary variables].% is that for any truth values $~$t_A,t_B \in \{\true, \false\},$~$

$$~$\bP(A = t_A, B= t_B) = \bP(A = t_A)\bP(B = t_B)\ .$~$$
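
One way to make this definition concrete is to build the whole joint table from two marginals. The sketch below does that for a pair of binary variables; the function name `joint_from_marginals` and the probabilities 0.4 and 0.3 are illustrative assumptions, not values taken from the text.

```python
from itertools import product

def joint_from_marginals(p_a, p_b):
    """Build the joint distribution of two independent binary variables.

    p_a is P(A = True) and p_b is P(B = True). Each cell of the joint
    table is the product of the corresponding marginal probabilities.
    """
    dist_a = {True: p_a, False: 1 - p_a}
    dist_b = {True: p_b, False: 1 - p_b}
    return {
        (t_a, t_b): dist_a[t_a] * dist_b[t_b]
        for t_a, t_b in product([True, False], repeat=2)
    }

# Hypothetical marginals, chosen only for illustration.
joint = joint_from_marginals(0.4, 0.3)

# P(A, not B) = P(A = True) * P(B = False) = 0.4 * 0.7
p_a_and_not_b = joint[(True, False)]
```

Because every cell is a product of marginals, the four entries sum to 1 whenever each marginal distribution does.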

So the joint probabilities for $~$A$~$ and $~$B$~$ are computed by separately getting the probability of $~$A$~$ and the probability of $~$B$~$, and then multiplying the two probabilities together. For example, say we want to compute the probability $~$\bP(A, \neg B) = \bP(A = \true, B = \false)$~$. We start with the [marginal_probability marginal probability] of $~$A$~$:

and the probability of $~$\neg B$~$:

and then we multiply them:

We can get all the joint probabilities this way. So we can visualize the whole joint distribution as the thing that you get when you multiply two independent probability distributions together. We just overlay the two distributions:

To be a little more mathematically elegant, we'd use the [topological_product topological product of two spaces] shown earlier to draw the joint distribution as a product of the distributions of $~$A$~$ and $~$B$~$: