"I believe it is essential to explain why it is ..."


by Eyal Roth Mar 19 2019 updated Mar 19 2019

In our example with the coins in the bathtub, the likelihoods of the evidence were independent on each step - assuming a coin to be fair, it's no more or less likely to produce heads on the second flip after producing heads on the first flip. So in our bathtub-coins example, the Naive Bayes assumption was actually true.

I believe it is essential to explain why the evidence is independent in the bathtub example but not in the other examples.
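To pin down what "independent" means here, this is one way to write it out. The symbols are my own shorthand, not the article's: $H$ is the hypothesis being assessed, and $e_1, e_2$ are two pieces of evidence.

$$P(e_1, e_2 \mid H) = P(e_1 \mid H)\,P(e_2 \mid H) \quad\Longleftrightarrow\quad P(e_2 \mid e_1, H) = P(e_2 \mid H)$$

For a fair coin, with $e_1, e_2$ being "heads on the first flip" and "heads on the second flip", this reads $P(e_2 \mid e_1, \text{fair}) = P(e_2 \mid \text{fair}) = \tfrac{1}{2}$. Naive Bayes assumes the same equality holds for whatever evidence it is given, whether or not it actually does.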

In the bathtub example, the evidence is an event that the assessed trait directly describes; i.e., the fairness of a coin is defined in terms of how often heads or tails appears. In contrast, the degree of "spamness" of an email is not defined in terms of which words appear in it, but rather by the abstract meaning a person assigns to the email.

The appearance of a word in an email is hence only a proxy, an attempt to estimate the degree of "spamness". With a proxy, we need to consider the possibility that it is flawed in a way that makes the pieces of evidence dependent on one another. This is not necessarily the case, but it is possible, unlike with hypothetical coins (in reality, a coin toss might actually be physically affected by the previous toss).
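A toy calculation may make the difference concrete. It is only a sketch under assumptions of my own: the "pharmacy" and "lottery" spam templates, the words "pills" and "discount", and every probability below are made-up illustrative numbers. The idea is that "fair coin" fully specifies how each flip is generated, while "spam" is an umbrella over several kinds of email, so knowing one word tells you something about which kind you are looking at, and therefore about the next word.

```python
# Each hypothesis is modelled as a mixture of hidden "components".
# Within a component, the two pieces of evidence are independent.
# All names and numbers here are hypothetical, chosen only for illustration.

FAIR_COIN = [
    # "Fair" pins the process down completely: a single component.
    {"weight": 1.0, "p": {"heads_1": 0.5, "heads_2": 0.5}},
]

SPAM = [
    # "Spam" is a proxy label covering different kinds of email.
    {"weight": 0.5, "p": {"pills": 0.8, "discount": 0.8}},  # pharmacy spam
    {"weight": 0.5, "p": {"pills": 0.1, "discount": 0.1}},  # lottery spam
]


def p_single(hypothesis, e):
    """P(e | hypothesis), averaging over the hidden components."""
    return sum(c["weight"] * c["p"][e] for c in hypothesis)


def p_joint(hypothesis, e1, e2):
    """True P(e1, e2 | hypothesis), averaging over the hidden components."""
    return sum(c["weight"] * c["p"][e1] * c["p"][e2] for c in hypothesis)


def p_naive(hypothesis, e1, e2):
    """What Naive Bayes assumes: P(e1 | hypothesis) * P(e2 | hypothesis)."""
    return p_single(hypothesis, e1) * p_single(hypothesis, e2)


coin_true = p_joint(FAIR_COIN, "heads_1", "heads_2")
coin_naive = p_naive(FAIR_COIN, "heads_1", "heads_2")
spam_true = p_joint(SPAM, "pills", "discount")
spam_naive = p_naive(SPAM, "pills", "discount")

print("coin: true joint", coin_true, "naive", coin_naive)
print("spam: true joint", spam_true, "naive", spam_naive)
```

With these numbers the coin's true joint probability and the naive product agree, while for spam the true joint probability (about 0.325) is noticeably higher than the naive product (about 0.2025), because the two words tend to show up together in the same kind of spam.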