Causal decision theories

by Eliezer Yudkowsky Jul 29 2016 updated Aug 2 2016

On CDT, to choose rationally, you should imagine the world where your physical act changes, then imagine running that world forward in time. (Therefore, it's irrational to vote in elections.)

[summary(Gloss): Causal decision theory says, "To choose rationally, imagine that just your own physical act changes, and ask what the resulting universe would look like." CDT is the implicit background theory being invoked when somebody suggests that voting in elections is 'irrational', or that two 'rational' agents can't help but defect against each other in the Prisoner's Dilemma.

Causal decision theory is currently (2016) the most academically popular decision theory. It is being challenged by Logical decision theories.]

[summary: Causal decision theory is a (currently academically dominant) view which says that the [rational_choice principle of rational choice] is to choose based on the [causal_counterfactual physical consequences] of your act. To figure out what the universe looks like if you do X, you should imagine a world where nothing changes up to the moment of X, the physical act changes to be X, and then your model of the world's laws is run forward from there.

CDT contrasts with evidential decision theory, the original form of decision theory, which was later realized to imply, "Act so that your decision is the best news you could get, if somebody told you about it as news." It also contrasts with the more recent Logical decision theories, which say, "Choose as if you are selecting the logical output of your decision algorithm."

Causal decision theory has been criticized on grounds of giving counterintuitive advice such as "Don't bother voting in elections" or "Defect against your own clone in the Prisoner's Dilemma", and for other agents getting higher payoffs in dilemmas such as Newcomb's Problem. Logical decision theorists also critique CDT on grounds such as its alleged reflective inconsistency.]

[summary(Technical): Causal decision theory says that the action-conditionals inside the expected utility formula should be treated as [causal_counterfactual causal counterfactuals] intervening on your physical act, that is, the expectation of utility $~$\mathcal U$~$ summed over outcomes $~$\mathcal O$~$ given an action $~$a_x$~$ should be written:

$$~$\mathbb E[\mathcal U|a_x] = \sum_{o_i \in \mathcal O} \mathcal U(o_i) \cdot \mathbb P(a_x \ \square \!\! \rightarrow o_i)$~$$

If causal counterfactuals are computed as in the standard theory of [causal_model causal models] using $~$\operatorname{do}()$~$ interventions, then the expected utility formula would be written:

$$~$\mathbb E[\mathcal U| \operatorname{do}(a_x)] = \sum_{o_i \in \mathcal O} \mathcal U(o_i) \cdot \mathbb P(o_i | \operatorname{do}(a_x))$~$$

Some proposed technical refinements of CDT include [tickle_defense updating on your own suspicion of action], and choosing mixed strategies to break infinite loops in dilemmas like Death in Damascus or the Absent-Minded Driver.

Logical decision theorists have critiqued standard causal decision theory upon several intuitive and technical grounds.]

Causal decision theory (CDT) and its many variants are the current academically dominant form of decision theory (as of 2016). CDT contrasts with the older formalism of evidential decision theory and, more recently, the new principle of logical decision theory. CDT says that "The [principle_rational_choice principle of rational choice] is to decide according to the [causal_counterfactual counterfactual] consequences of your physical act." That is: To figure out the consequences of a choice, imagine the universe as it will already exist at the moment when you physically act; imagine only your physical act changing; and then run the laws of physics forwards from there to figure out the consequences.

Other overviews of causal decision theory:

Causal versus evidential decision theory

Causal decision theory gained widespread acceptance based on critiques of the policies implied by the previous way of writing down the expected utility formula, which we now think of as evidential decision theory (EDT).

We can think of EDT as the accidental result of writing down expected utilities in the most obvious way: The expected consequence of an act $~$a_0$~$ is just the probability distribution over outcomes $~$o_i$~$ given by $~$\mathbb P(o_i|a_0).$~$ That is, on EDT, to imagine the consequence of choosing an act $~$a_0,$~$ we imagine what we would believe about the world if somebody told us that we'd actually chosen $~$a_0.$~$

To see an example of a case that pries apart EDT and CDT, consider the Toxoplasmosis Dilemma. Suppose that a certain parasitic infection, often carried by cats, has been found to make humans enjoy petting cats more (thus helping to spread the infection). Suppose that statistics have found that in a certain experimental setup, 10% of the people who don't pet a cute kitten, and 20% of the people who do pet the kitten, have toxoplasmosis. The kitten itself is guaranteed to have been sterilized and free of toxoplasmosis. The disutility of toxoplasmosis as a parasitic infection greatly outweighs the pleasure of petting the kitten. Do you pet the kitten?

An EDT agent might reason, "If I learn as news that I pet the kitten, I would estimate a 20% chance that I have toxoplasmosis, compared to a 10% chance if I learn that I did not pet the kitten. Therefore, I will not pet the kitten."

A CDT agent would reason, "When I imagine the world up to the point where I pet the kitten, either I already have toxoplasmosis or I don't. Petting the kitten can't cause me to get toxoplasmosis. Therefore, I should pet the kitten… now, having realized that I intend to pet the kitten, I realize that I have a 20% chance of having toxoplasmosis. But in the counterfactual world where I don't pet the kitten, my probability of having toxoplasmosis would counterfactually still be 20%, and I'd miss out on petting the kitten as well."
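The two lines of reasoning can be compared numerically. What follows is a minimal sketch rather than part of the original argument: the utility values `U_TOXO` and `U_PET` and all function names are illustrative assumptions, and only the 10% and 20% figures come from the dilemma as stated.

```python
# Illustrative utilities (assumptions, not given in the dilemma):
U_TOXO = -100.0   # disutility of having toxoplasmosis
U_PET = 1.0       # pleasure of petting the kitten

# Conditional probabilities taken from the experimental statistics.
P_TOXO_GIVEN_PET = 0.20
P_TOXO_GIVEN_NO_PET = 0.10

def edt_expected_utility(pet: bool) -> float:
    """EDT treats the act as news and uses P(toxo | act)."""
    p_toxo = P_TOXO_GIVEN_PET if pet else P_TOXO_GIVEN_NO_PET
    return p_toxo * U_TOXO + (U_PET if pet else 0.0)

def cdt_expected_utility(pet: bool, p_toxo: float) -> float:
    """CDT holds P(toxo) fixed, since the act cannot cause the infection."""
    return p_toxo * U_TOXO + (U_PET if pet else 0.0)

def choose(expected_utility) -> bool:
    """Return True (pet) if petting has the higher expected utility."""
    return expected_utility(True) > expected_utility(False)

edt_pets = choose(edt_expected_utility)
cdt_pets = choose(lambda pet: cdt_expected_utility(pet, p_toxo=0.20))
print("EDT pets the kitten:", edt_pets)
print("CDT pets the kitten:", cdt_pets)
```

Under these assumed utilities the EDT calculation declines to pet the kitten, while the CDT calculation pets it whatever value of `p_toxo` is plugged in, because the same probability appears on both sides of the comparison.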

Because it was widely agreed that the CDT agent is being more reasonable in the above case, CDT was widely adopted as a replacement for the previous formalism, which was then relabeled as EDT.

EDT and CDT are computed in formally different ways. When we condition on our actions inside EDT, we are computing a conditional probability, whereas in CDT, we are computing a [causal_counterfactual causal counterfactual]. The difference between the two is sometimes explained by contrasting this pair of sentences: "If Lee Harvey Oswald didn't shoot John F. Kennedy, somebody else did" and "If Lee Harvey Oswald hadn't shot John F. Kennedy, somebody else would have."

In the first sentence, we imagine being told as news that Oswald didn't shoot Kennedy, and updating our beliefs to integrate this with the rest of our observations. Formally, we take whatever tiny shred of probability we might have assigned to possible worlds where the history books are wrong and Oswald didn't actually shoot Kennedy, and imagine that tiny shred of probability expanding to become the whole of our posterior probability distribution. In particular, if we imagine whatever shred of probability we assign to worlds like that, we know that even in those worlds, Kennedy was still shot.

Let $~$O$~$ denote the proposition that Oswald shot Kennedy, $~$\neg O$~$ denote $~$O$~$ being false, and $~$K$~$ denote the proposition that Kennedy was shot. Our revised probability of Kennedy being shot if $~$O$~$ were actually false, written as $~$\mathbb P(K|\neg O),$~$ would still be quite high.

The second sentence asks us to imagine how a counterfactual world would play out if Oswald had acted differently. To visualize this counterfactual:

This [causal_counterfactual causal counterfactual] is often written as $~$\mathbb P(\neg O \ \square \!\! \rightarrow K).$~$ If you believe that Lee Harvey Oswald acted alone (and did in fact shoot Kennedy), then you should estimate a low probability for $~$\mathbb P(\neg O \ \square \!\! \rightarrow K),$~$ in contrast to your presumably high probability for $~$\mathbb P(K|\neg O).$~$

Computing causal counterfactuals

Many academic discussions of causal decision theory take for granted that we 'just know' a counterfactual distribution $~$\mathbb P(\bullet \ || \ \bullet)$~$ which is treated as heaven-sent. However, one formal way of computing causal counterfactuals was given relative to the theory of [causal_model causal models] developed by Judea Pearl and others.


The backbone of a causal model is a directed acyclic graph showing which events causally affect which other events:

One standard example of such a causal graph is:

This says, e.g.:

A causal model goes beyond the graph by including specific probability functions $~$\mathbb P(X_i | \mathbf{pa}_i)$~$ for how to calculate the probability of each node $~$X_i$~$ taking on the value $~$x_i$~$ given the values $~$\mathbf {pa}_i$~$ of $~$X_i$~$'s parents (its immediate ancestors in the graph). It is implicitly assumed that the causal model factorizes, so that the probability of any value assignment $~$\mathbf x$~$ to the whole graph can be calculated using the product:

$$~$\mathbb P(\mathbf x) = \prod_i \mathbb P(x_i | \mathbf{pa}_i)$~$$

Then the counterfactual conditional $~$\mathbb P(\mathbf x | \operatorname{do}(X_j=x_j))$~$ is calculated via:

$$~$\mathbb P(\mathbf x | \operatorname{do}(X_j=x_j)) = \prod_{i \neq j} \mathbb P(x_i | \mathbf{pa}_i)$~$$

(We assume that $~$\mathbf x$~$ has $~$x_j$~$ equaling the $~$\operatorname{do}$~$-specified value of $~$X_j$~$; otherwise its conditioned probability is defined to be $~$0$~$.)

This just says that when we set $~$\operatorname{do}(X_j=x_j)$~$ we ignore the ordinary parent nodes for $~$X_j$~$ and just say that whatever the values of $~$\mathbf{pa}_j,$~$ the probability of $~$X_j = x_j$~$ is 1.

This formula implies that conditioning on $~$\operatorname{do}(X_j=x_j)$~$ can only affect the probabilities of variables $~$X_k$~$ that are "downstream" of $~$X_j$~$ in the directed graph of the causal model. (Which is why choosing to pet the kitten can't possibly affect whether you have toxoplasmosis.)
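The difference between conditioning and intervening can be checked by brute force on a tiny model. The sketch below assumes a hypothetical two-node graph, toxoplasmosis → petting, with made-up conditional probabilities (3/20, 2/3 and 8/17) chosen only so that the 10% and 20% statistics from the Toxoplasmosis Dilemma are reproduced. The helper names `cpt`, `joint` and `probability` are illustrative, not taken from any library.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical two-node model Toxo -> Pet. The numbers are assumptions,
# chosen so that P(toxo | pet) = 20% and P(toxo | no pet) = 10%.
ORDER = ("toxo", "pet")                      # a topological order
PARENTS = {"toxo": (), "pet": ("toxo",)}

def cpt(node, value, parent_values):
    """P(node = value | parents = parent_values)."""
    if node == "toxo":
        p_true = F(3, 20)
    else:  # "pet": infected people pet kittens more often
        p_true = F(2, 3) if parent_values[0] else F(8, 17)
    return p_true if value else 1 - p_true

def joint(world, do=None):
    """Product of factors; the factor of each intervened node is dropped."""
    do = do or {}
    prob = F(1)
    for node in ORDER:
        if node in do:
            if world[node] != do[node]:
                return F(0)   # world disagrees with the intervention
            continue          # drop P(x_j | pa_j) for the do() node
        parent_values = tuple(world[p] for p in PARENTS[node])
        prob *= cpt(node, world[node], parent_values)
    return prob

def probability(event, given=None, do=None):
    """P(event | given, do(...)) by enumerating every assignment."""
    given = given or {}
    worlds = [dict(zip(ORDER, values))
              for values in product([False, True], repeat=len(ORDER))]

    def holds(world, condition):
        return all(world[k] == v for k, v in condition.items())

    consistent = [w for w in worlds if holds(w, given)]
    total = sum(joint(w, do) for w in consistent)
    return sum(joint(w, do) for w in consistent if holds(w, event)) / total

p_news_pet = probability({"toxo": True}, given={"pet": True})
p_news_no_pet = probability({"toxo": True}, given={"pet": False})
p_do_pet = probability({"toxo": True}, do={"pet": True})
p_do_no_pet = probability({"toxo": True}, do={"pet": False})
print(p_news_pet, p_news_no_pet)   # conditioning: 1/5 1/10
print(p_do_pet, p_do_no_pet)       # intervening: 3/20 3/20
```

Conditioning on the act shifts the probability of infection, while intervening on it leaves the probability at its prior value, because the infection node is upstream of the act.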

Then expected utility should be calculated as:

$$~$\mathbb E[\mathcal U| \operatorname{do}(a_x)] = \sum_{o_i \in \mathcal O} \mathcal U(o_i) \cdot \mathbb P(o_i | \operatorname{do}(a_x))$~$$

Under this rule, we will calculate that we can't affect the probability of having toxoplasmosis by petting the cat, since our choice to pet the cat is causally downstream of whether we have toxoplasmosis.


Proposed technical refinements of CDT

The semantics of the $~$\operatorname{do}()$~$ operation, or causal counterfactuals generally, imply that in Newcomblike problems the first pass of a CDT expected utility calculation may return quantitatively wrong utilities, or even a qualitatively bad option, since the CDT agent will not yet have updated background beliefs based on observing its own decision.

In Newcomb's Problem, the mischievous Omega places two boxes before you, a transparent Box A containing \$1,000, and an opaque Box B. Omega then departs. You can take one box or both boxes. If Omega predicted that you would take only Box B, then Omega has already put \$1,000,000 into Box B. If Omega predicted you would two-box, Box B already contains nothing.

Suppose that in the general population, the base rate of taking only Box B is 2/3. Then at the first moment of making the decision to two-box, a CDT agent will believe that Box B has a 2/3 probability of being full.

Besides this being an inaccurate expectation of future wealth, in a slightly different version of Newcomb's Problem, it leads to potential losses. Suppose you must press one of four buttons $~$W, X, Y, Z$~$ to determine (a) whether to one-box or two-box, and (b) whether to pay an extra \$900 fee to make the money (if any) be tax-free. If your marginal tax rate is otherwise 50%, then the payoff chart in after-tax income might look like this:

$$~$\begin{array}{r|c|c} & \text{One-boxing predicted} & \text{Two-boxing predicted} \\ \hline \text{W: Take both boxes, no fee:} & \$500,500 & \$500 \\ \hline \text{X: Take only Box B, no fee:} & \$500,000 & \$0 \\ \hline \text{Y: Take both boxes, pay fee:} & \$1,000,100 & \$100 \\ \hline \text{Z: Take only Box B, pay fee:} & \$999,100 & -\$900 \end{array}$~$$

A CDT-agent that has not yet updated on observing its own choice, thinking that it has the 2/3 prior chance of Box B being full, will press the button Y.
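The arithmetic behind this choice can be reproduced directly from the payoff table. The following is a sketch under one added assumption, namely that Omega predicts perfectly, so that an agent who ends up two-boxing always faces an empty Box B. The names `PAYOFFS`, `expected_payoffs` and `best_button` are illustrative.

```python
from fractions import Fraction

# After-tax payoffs from the table: (Box B full, Box B empty).
PAYOFFS = {
    "W": (500_500, 500),      # both boxes, no fee
    "X": (500_000, 0),        # only Box B, no fee
    "Y": (1_000_100, 100),    # both boxes, pay fee
    "Z": (999_100, -900),     # only Box B, pay fee
}

def expected_payoffs(p_full):
    """Expected after-tax income per button, holding P(Box B full) fixed."""
    return {button: p_full * full + (1 - p_full) * empty
            for button, (full, empty) in PAYOFFS.items()}

def best_button(p_full):
    """Button with the highest expected payoff for a given P(Box B full)."""
    payoffs = expected_payoffs(p_full)
    return max(payoffs, key=payoffs.get)

# First pass: the agent still uses the 2/3 population base rate.
first_pass = best_button(Fraction(2, 3))

# Assumption: Omega is a perfect predictor, so a two-boxer finds Box B empty.
after_update = best_button(Fraction(0))

# Dollars lost by committing to the first-pass button when Box B is empty.
loss = PAYOFFS[after_update][1] - PAYOFFS[first_pass][1]
print(first_pass, after_update, loss)   # Y W 400
```

With the 2/3 prior, button Y has the highest expected payoff, but once Box B is known to be empty, W would have paid \$500 against Y's \$100, so the un-updated calculation costs the agent \$400 under this assumption.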

An obvious amendment is to have CDT observe its first impulse, update its background beliefs if required, recalculate expected utilities, possibly change the option selected, and possibly update again, and continue until arriving at a stable state. This closely resembles the [tickle_defense] in that the CDT agent notices the 'tickle' of an impulse to choose a particular option, and tries updating on that tickle.

A potential problem with this first amendment is that it can go into infinite loops.

In Death in Damascus, a man of Damascus sees Death, and Death looks surprised, then remarks that he has an appointment with the man tomorrow. The man immediately purchases a fast horse and rides to Aleppo, where the next day he is killed by falling roof tiles.

The premise of Death in Damascus is that Death, who like Omega is an excellent predictor of human behavior, has already informed you that whichever choice you end up taking was the one that led to the appointed place of your death. If you decide to stay in Damascus, then observing this, you should expect staying in Damascus to be fatal and Aleppo to be less dangerous. If you observe yourself choosing to ride to Aleppo, you should expect that Aleppo kills you while Damascus would be quite safe. Faced with this dilemma, a causal decision theory that repeatedly updates on the 'tickles' of its observed decision-impulses will go into an infinite loop.
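The loop can be made concrete with a toy version of the update-until-stable procedure. This is a sketch under simplifying assumptions: Death is treated as a perfect predictor, the agent's only belief is the probability that Death waits in Damascus, and the names `best_city` and `deliberate` are hypothetical.

```python
def best_city(p_death_in_damascus):
    """Pick the city with the higher survival probability under current beliefs."""
    # Staying survives with probability 1 - p; riding to Aleppo survives with p.
    return "Aleppo" if p_death_in_damascus > 0.5 else "Damascus"

def deliberate(prior, max_rounds=10):
    """Choose, update on the observed impulse, and repeat until beliefs stop changing."""
    p = prior
    history = []
    for _ in range(max_rounds):
        choice = best_city(p)
        history.append(choice)
        # Assumption: Death predicted this very choice and waits in the chosen city.
        updated = 1.0 if choice == "Damascus" else 0.0
        if updated == p:
            return history, True    # beliefs and choice agree: stable
        p = updated
    return history, False           # gave up without reaching a stable state

history, stable = deliberate(prior=0.9, max_rounds=6)
print(history, stable)
```

Each update makes the other city look safe, so the sequence of impulses alternates between Aleppo and Damascus until the round limit is reached, and `stable` stays `False`.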

An obvious second amendment is to allow a CDT agent to use mixed strategies, for example to 'choose' to stay in Damascus or go to Aleppo with 0.5 : 0.5 probability. This permits stability in the Death in Damascus case and also some degree of self-observational updating.

However, as Yudkowsky has observed, this twice-amended version of CDT is still subject to predictable losses. At the moment of making the 'mixed' decision to stay in Damascus or go to Aleppo with 0.5 : 0.5 probability, the agent reasons as if it has a 50% chance of surviving (by the semantics of the $~$\operatorname{do}()$~$ operation, the counterfactual for the agent's action cannot, inside that calculation, be correlated with any background variables). So if there was a further-compounded decision which included e.g. a chance to purchase for \$1 a ticket that pays out \$10 if the agent survives, the agent will buy that ticket (and then try to sell it back immediately afterwards). Similarly, once the CDT agent has started on its way to Aleppo (if that was the result of the randomized decision), nothing prohibits it from suddenly realizing that Aleppo is certainly fatal and Damascus is safe, and trying to turn back. In this sense, the stability and internal consistency of CDT agents might still be regarded as an unsolved problem.
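The ticket example comes down to one line of arithmetic evaluated under two different beliefs. The sketch below assumes, for illustration only, that Death predicts the outcome of the randomized choice perfectly, so that the survival probability after updating is zero. The function name `ticket_expected_profit` is hypothetical.

```python
def ticket_expected_profit(p_survive, price=1.0, payout=10.0):
    """Expected profit in dollars from buying the survival ticket."""
    return p_survive * payout - price

# Survival probability used inside the do()-calculation for the mixed strategy.
belief_at_decision = 0.5
# Assumption: after updating on the realized choice, survival is ruled out.
belief_after_update = 0.0

profit_at_decision = ticket_expected_profit(belief_at_decision)
profit_after_update = ticket_expected_profit(belief_after_update)
print(profit_at_decision, profit_after_update)   # 4.0 -1.0
```

At decision time the ticket looks like a \$4 expected gain, and immediately afterwards it looks like a \$1 loss, which is why the agent would buy the ticket and then try to sell it back.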

