Likelihood

https://arbital.com/p/bayesian_likelihood

by Nate Soares Jul 7 2016 updated Oct 8 2016


[summary: "Likelihood", when speaking of Bayesian reasoning, denotes the probability of an observation, supposing some hypothesis to be correct.

Suppose our piece of evidence $~$e$~$ is that "Mr. Boddy was shot." One of our suspects is Miss Scarlett, and we denote by $~$H_S$~$ the hypothesis that Miss Scarlett shot Mr. Boddy. Suppose that if Miss Scarlett were the killer, we'd have predicted in advance a 20% probability she would use a gun, and an 80% chance she'd use some other weapon.

Then the likelihood that the hypothesis $~$H_S$~$ assigns to this evidence is 0.20. Using conditional probability notation, $~$\mathbb P(e \mid H_S) = 0.20.$~$

This doesn't mean Miss Scarlett has a 20% chance of being the killer; it means that if she is the killer, our observation had a probability of 20%.

Relative likelihoods are a key ingredient for Bayesian reasoning and one of the quantities plugged into Bayes' rule.]

Consider a piece of evidence $~$e,$~$ such as "Mr. Boddy was shot." We might have a number of different hypotheses that explain this evidence, including $~$H_S$~$ = "Miss Scarlett killed him", $~$H_M$~$ = "Colonel Mustard killed him", and so on.

Each of those hypotheses assigns a different probability to the evidence. For example, imagine that if Miss Scarlett were the killer, there's a 20% chance she would use a gun, and an 80% chance she'd use some other weapon. In this case, the "Miss Scarlett" hypothesis assigns a likelihood of 20% to $~$e.$~$

When reasoning about different hypotheses using a [-probability_distribution probability distribution] $~$\mathbb P$~$, the likelihood of evidence $~$e$~$ given hypothesis $~$H_i$~$ is often written using the conditional probability $~$\mathbb P(e \mid H_i).$~$ When reporting likelihoods of many different hypotheses at once, it is common to use a [-likelihood_function], sometimes written [51n $~$\mathcal L_e(H_i)$~$].
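As a concrete (if toy) illustration of this notation, here is a minimal Python sketch of a likelihood function; the 20%/80% weapon split for Miss Scarlett is the illustrative figure from above, not real data:

```python
# Each hypothesis assigns a probability distribution over possible
# observations; the likelihood of evidence e under hypothesis H is P(e | H).

# P(weapon | hypothesis), using the illustrative figures from the text.
P_evidence_given = {
    "Scarlett": {"gun": 0.20, "other weapon": 0.80},
}

def likelihood(e, H):
    """Return P(e | H), i.e. the likelihood function L_e(H) evaluated at H."""
    return P_evidence_given[H][e]

print(likelihood("gun", "Scarlett"))  # 0.2, i.e. P(e | H_S) = 0.20
```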

Relative likelihoods measure the degree of support that a piece of evidence $~$e$~$ provides for different hypotheses. For example, let's say that if Colonel Mustard were the killer, there's a 40% chance he would use a gun. Then the absolute likelihoods of $~$H_S$~$ and $~$H_M$~$ are 20% and 40%, for relative likelihoods of (1 : 2). This says that the evidence $~$e$~$ supports $~$H_M$~$ twice as much as it supports $~$H_S,$~$ and that the amount of support would have been the same if the absolute likelihoods were 2% and 4% instead.
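To spell out that arithmetic, the sketch below reduces a pair of absolute likelihoods to relative likelihoods, and shows that (0.20, 0.40) and (0.02, 0.04) both give the same (1 : 2) ratio:

```python
def relative_likelihoods(l_s, l_m):
    """Scale a pair of absolute likelihoods so the smaller one equals 1."""
    smallest = min(l_s, l_m)
    return (l_s / smallest, l_m / smallest)

print(relative_likelihoods(0.20, 0.40))  # (1.0, 2.0): e supports H_M twice as much as H_S
print(relative_likelihoods(0.02, 0.04))  # (1.0, 2.0): the same relative support
```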

According to Bayes' rule, relative likelihoods are the appropriate tool for measuring the strength of a given piece of evidence. Relative likelihoods are one of two key constituents of belief in [bayesian_reasoning Bayesian reasoning], the other being prior probabilities.
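As a rough illustration of how those two constituents combine, Bayes' rule in odds form multiplies the prior odds by the relative likelihoods to get posterior odds. The (1 : 1) prior odds between Miss Scarlett and Colonel Mustard below are a made-up assumption, used only to show the mechanics:

```python
# Bayes' rule in odds form: posterior odds = prior odds * relative likelihoods.

prior_odds  = (1, 1)        # (Scarlett : Mustard) -- assumed prior, for illustration only
likelihoods = (0.20, 0.40)  # (P(e | H_S), P(e | H_M)), the figures from the text

posterior_odds = (prior_odds[0] * likelihoods[0],
                  prior_odds[1] * likelihoods[1])

# Normalizing over just these two suspects gives posterior probabilities.
total = sum(posterior_odds)
print([odds / total for odds in posterior_odds])  # [0.333..., 0.666...]
```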

While absolute likelihoods aren't necessary when updating beliefs by Bayes' rule, they are useful when checking for confusion. For example, say you have a coin and only two hypotheses about how it works: $~$H_{0.3}$~$ = "the coin is random and comes up heads 30% of the time", and $~$H_{0.9}$~$ = "the coin is random and comes up heads 90% of the time." Now let's say you toss the coin 100 times, and observe the data HTHTHTHTHTHTHTHT… (alternating heads and tails). The relative likelihoods strongly favor $~$H_{0.3},$~$ because it was less wrong. However, the absolute likelihood of $~$H_{0.3}$~$ will be much lower than expected, and this deficit is a hint that $~$H_{0.3}$~$ isn't right. (For more on this idea, see Strictly confused.)
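To see that deficit numerically, here is a rough sketch (added here as an illustration) that works with natural-log likelihoods to avoid underflow:

```python
from math import log

def log_likelihood(p_heads, n_heads, n_tails):
    """log P(a specific sequence with these counts | coin with P(heads) = p_heads)."""
    return n_heads * log(p_heads) + n_tails * log(1 - p_heads)

# The alternating sequence HTHT... over 100 tosses has 50 heads and 50 tails.
obs_03 = log_likelihood(0.3, 50, 50)   # log P(data | H_0.3)
obs_09 = log_likelihood(0.9, 50, 50)   # log P(data | H_0.9)
print(obs_03 - obs_09)                 # ~ +42: relative likelihoods hugely favor H_0.3

# But H_0.3 expected to assign its own data a log-likelihood of about
# 100 * (0.3*log(0.3) + 0.7*log(0.7)).
expected_03 = 100 * (0.3 * log(0.3) + 0.7 * log(0.7))
print(obs_03, expected_03)             # ~ -78 vs ~ -61: far below expectation,
                                       # hinting that H_0.3 isn't right either
```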