Currently, economics holds as standard the following positions:

- The expected return on your time from voting in elections is very low, unless your vote *directly causes* many other people to vote. Most elections are not settled by one vote, and your one vote is very unlikely to change that.
- In the Ultimatum Game, the experimenter sets up a one-shot, two-player scenario for dividing \$10. One player, the Proposer, proposes how to split the \$10 with the other player, the Responder. If the Responder accepts the proposed split, it goes through. Otherwise both players get nothing. A *rational* Proposer should offer \$1 to the other player and keep the remaining \$9; a *rational* Responder should accept this bargain. (This assumes a one-time deal, perhaps anonymous so that there are no reputation effects, etcetera.)
- You and your enemy share a terrible secret. Your enemy sets up an automatic mechanism so that unless your Bitcoin address pays 10 Bitcoins to a certain other address, the secret will be published, greatly harming both you and your enemy. In this case, paying your enemy the 10 Bitcoins has higher expected utility than not paying, and that is what a rational agent will do.

All of these currently standard replies (about *coordination problems,* *bargaining theory,* and *game theory*) are derived from a decision theory that's dominant in modern analytic philosophy, namely *causal decision theory* (CDT). Roughly, causal decision theory says, "Decide based on the direct physical consequences of your personal, local act." Most economics textbooks may not go into this, but it's where the standard analysis comes from.

Suppose on the other hand there were a decision theory which gave results something like the following:

- In an election, your decision algorithm is likely to be sufficiently similar to at least *some* other voters that your decisions about whether to vote will be logically correlated (running sufficiently similar computational algorithms in different places should yield similar results). If you think you're part of a correlated cohort with similar goals that might with some probability swing the election, and the value of this possibility is high enough to outweigh your cohort's collective costs of voting, you should vote.
- You are suddenly introduced to the Ultimatum Game, but only after taking a course on an interesting new decision theory with your fellow experimental subjects. You rather suspect that if you offer \$4 to the other subject and try to keep \$6 for yourself, the other subject will Accept with 83% probability so that your expected personal gain is \$4.98. If you offer \$3 and try to keep \$7, you suspect they'll Accept with 71% probability so that your expected gain is \$4.97. Are they being irrational by sometimes passing up a gain of \$3?
- You are a Distributed Autonomous Organization with known source code. As anyone can see from inspecting your code, you will never give in to blackmail. This is not a special case added to your code, but arises from a simple and general way that you evaluate your decisions. Furthermore, your creators assert that this code is simply *rational,* rather than 'usefully irrational' or 'socially rational' etcetera.
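The Proposer's arithmetic in the second example can be checked directly. The sketch below also computes a break-even acceptance rate, `5 / (10 - offer)`, at which an uneven split would earn the Proposer exactly the \$5 of an even split. Reading the guessed rates of 83% and 71% as "just under break-even" is our interpretation of the example, not something it states, and the function names are hypothetical.

```python
from fractions import Fraction

TOTAL = 10  # dollars to be split in the Ultimatum Game
FAIR = 5    # the Proposer's share under an even split

# The Proposer's guessed acceptance probabilities, taken from the example.
GUESSED_ACCEPT = {4: Fraction(83, 100), 3: Fraction(71, 100)}


def expected_gain(offer: int, p_accept: Fraction) -> Fraction:
    """Proposer's expected take: keep (TOTAL - offer) if the Responder accepts."""
    return (TOTAL - offer) * p_accept


def break_even_accept(offer: int) -> Fraction:
    """Acceptance rate at which offering `offer` pays the Proposer exactly FAIR."""
    return Fraction(FAIR, TOTAL - offer)


if __name__ == "__main__":
    for offer, p in sorted(GUESSED_ACCEPT.items(), reverse=True):
        gain = expected_gain(offer, p)
        print(f"offer ${offer}: expected gain ${float(gain):.2f}, "
              f"break-even acceptance {float(break_even_accept(offer)):.4f}")
```

Both guessed rates sit just below break-even, so neither uneven offer earns the Proposer as much in expectation as offering \$5.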

Prescriptions like these can arise from within a new family of decision theories that are collectively termed 'logical decision theories' (LDT), in contrast to causal decision theory. The key concept in LDT is that if we're running a decision algorithm $~$\mathcal Q,$~$ a rational $~$\mathcal Q$~$ asks "What logical output of this currently running algorithm $~$\mathcal Q$~$ yields the best outcome?" In some cases, where other people are similar to us or other people are thinking about us, this can yield different answers from causal decision theory's "What physical act yields the best outcome?"

This is not just an informal principle. At least one of the current formalizations of logical decision theory, [modal_agents modal agents] in [proof_based_dt proof-based decision theory], will compile and run as code. Using proof-based decision theory, we can easily simulate computational agents reasoning about other computational agents which are currently reasoning about them, without running into any infinite loops. We can run through this surprisingly easy setup to show that, e.g., it can yield mutual cooperation on the Prisoner's Dilemma between different selfish agents that [common_knowledge know] each others' source code. (See below.)
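A toy sketch of the idea follows; it is not the proof-based implementation linked above. It uses a technique from provability logic for agents whose choices depend only on what is provable about each other: evaluate the agents on a chain of "worlds" 0, 1, 2, ..., where "it is provable that φ" holds at world n iff φ held at every earlier world, and read off the moves once they stop changing. FairBot (cooperate iff it is provable that the opponent cooperates back), DefectBot and CooperateBot are names from the modal agents literature; the Python encoding and the fixed count of ten worlds are our assumptions.

```python
from typing import Callable, List, Tuple

C, D = "C", "D"

# A toy modal agent: given the opponent's moves (against this agent) at all
# earlier worlds, return this agent's move at the current world.
Agent = Callable[[List[str]], str]


def provable(opponent_history: List[str], move: str) -> bool:
    """'It is provable that the opponent plays `move`' at world n: true iff
    the opponent played `move` at every world m < n (vacuously true at 0)."""
    return all(m == move for m in opponent_history)


def fair_bot(opp: List[str]) -> str:
    # Cooperate iff it is provable that the opponent cooperates with me.
    return C if provable(opp, C) else D


def defect_bot(opp: List[str]) -> str:
    return D


def cooperate_bot(opp: List[str]) -> str:
    return C


def play(a: Agent, b: Agent, worlds: int = 10) -> Tuple[str, str]:
    """Evaluate both agents world by world. For these shallow agents the moves
    stop changing after world 1, so the last world's moves are reported."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    for _ in range(worlds):
        move_a, move_b = a(hist_b), b(hist_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
    return hist_a[-1], hist_b[-1]


if __name__ == "__main__":
    print("FairBot vs FairBot:     ", play(fair_bot, fair_bot))
    print("FairBot vs DefectBot:   ", play(fair_bot, defect_bot))
    print("FairBot vs CooperateBot:", play(fair_bot, cooperate_bot))
```

In this toy run FairBot cooperates with itself and with CooperateBot, and defects against DefectBot rather than being exploited, without either agent ever simulating the other into an infinite regress.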

Logical decision theory is motivated for other reasons as well. Arguendo according to LDT, the current causal decision theory suffers from severe technical problems along the lines of "We can make this decision theory go into infinite loops" or "This decision theory can [dt_negative_voi calculate] a negative [information_value value of information]". Arguendo, fixing these problems leads us into a new, foundational, formal principle of rational choice.

In most of everyday life, the prescriptions of logical decision theory will be extremely similar to those of classical causal decision theory. (The point of a new decision theory is not to produce interestingly weird policies, after all!) In many cases, your knowledge of what algorithms other people are running is probabilistic at best. Other people also have only probabilistic guesses about what algorithm you are running. In practice, this means that people cannot cooperate on the Prisoner's Dilemma nearly as easily as two computational agents that know each other's source code. Even so, arguendo according to LDT, we should take these probabilistic beliefs quantitatively into account when evaluating the expected utility of our decisions.

In some special cases, logical correlations may make a bigger difference than usual.

Causal decision theory says that if an election over a million people is decided by three votes, then no individual vote had any effect (unless that decision to vote directly caused at least one other person to vote); if the election is decided by one vote, then every one of the winning voters should regard themselves as being responsible for the outcome. Suppose we value the result of the election at a billion dollars. According to CDT, if the election is settled by three votes, all voters should regard themselves as having gained zero utility by voting. If the election is settled by one vote, then half a million people should all regard themselves as having each individually generated a billion dollars of utility.

On the conventional analysis, this view of elections is just one of those facts economists know that are very surprising to non-economists, like trade between countries being good for both trade partners. Counterintuitive, yes, but so are a lot of things in economics.

LDT, however, suggests that it may make *formal* as well as intuitive sense to reason, "I and 50,000 people like me all voted a certain way for the same reason; this election was swung by 3% of the vote; therefore we should regard ourselves as having collectively generated a billion dollars of value by collectively paying all of our costs of voting."

We can similarly view the concept of 'reputation' in terms of people making public that they act according to a particular algorithm. Arguendo according to LDT, we can view properties like "not giving in to blackmail" or "refusing an offer of \$4 with 17% probability in the Ultimatum Game" in terms of acquiring a reputation for *acting rationally* under a systematic decision theory, rather than people trying to acquire a reputation for *useful irrationality* with the 'useful irrationality' being a varied assortment of special cases.

This introduction to logical decision theory for economists begins by introducing the extremely rough general concept of LDT and why anyone should care (done). The next section then introduces the roots of the causal decision theory that yielded the old prescriptions, the reason why CDT was originally adopted, and the formal contrast between CDT and logical decision theory. The third section gives a somewhat longer summary of how LDT applies to some standard economic scenarios. The fourth section gives a brief overview of the defense for LDT as the central principle of rational choice and not just some kind of useful irrationality. At the end are suggested entry points into further online reading.

For a history of who invented what in LDT, see [ldt_history here]. Much of the theory was originally developed over correspondence and workshop meetings between a small group of mostly computer-science researchers, with publications lagging, but for citable papers see [ldt_citations].

%

%%%comment: Logical decision theorists sometimes use the metaphor of computational agents that have definite or probabilistic knowledge about other agents' source code. If you simulated another agent and saw for certain that it would reject any offer less than \$5 in the Ultimatum Game, would you still offer it only \$1?%note: (What if your simulation showed that the other agent rejected any offer less than \$9?)%knows-requisite([dt_prisonersdilemma]): Logical decision theory can be formalized at least as much as competing decision theories. For example, there is now running code for simulating in a general way how multiple agents with knowledge of each other's code, reasoning about each other's actions, can arrive at an equilibrium. Two similar (but not identical!) agents of this kind can end up cooperating in the Prisoner's Dilemma, assuming they have [common_knowledge common knowledge] of each other's code. Again, you can actually try this out in simulation! %%%In the real world, taking into account iterated interactions and reputational effects and subjective uncertainties, dilemmas are no longer as simple as two agents playing the Ultimatum Game with knowledge of each other's source code. Still, the simplified games are often held up as an archetypal example or a base case. Arguendo according to LDT, these base cases are being wrongly analyzed by standard causal decision theory.

%%%

And yes, LDT seems to strongly suggest that, even in the real world, if you're part of a sufficiently large cohort of people all voting the same way for similar reasons, you should (all) vote in the election. %%knows-requisite([dt_prisonersdilemma]):(For much the same reason LDT says you ought to Cooperate, if you're playing the Prisoner's Dilemma against a recent cloned copy of yourself.)%%

%%%%

# Different principles of expected utility

Almost everyone in present and historical debate on decision theory has agreed that 'rational' agents maximize expected utility *conditional on* their decisions. The central question of decision theory turns out to be, "How exactly do we condition on a possible decision?"

## Evidential versus counterfactual conditioning


Most textbooks show the expected utility formula as:

$$~$\mathbb E[\mathcal U|a_x] = \sum_{o_i \in \mathcal O} \mathcal U(o_i) \cdot \mathbb P(o_i|a_x)$~$$

where

- $~$\mathbb E[\mathcal U|a_x]$~$ is our average expectation of utility, if action $~$a_x$~$ is chosen;
- $~$\mathcal O$~$ is the set of possible outcomes;
- $~$\mathcal U$~$ is our utility function, mapping outcomes onto real numbers;
- $~$\mathbb P(o_i|a_x)$~$ is the conditional probability of outcome $~$o_i$~$ if $~$a_x$~$ is chosen.

Technically speaking, this formula is almost universally agreed to be wrong.

The problem is the use of standard evidential conditioning in $~$\mathbb P(o_i|a_x).$~$ On this formula we are behaving as if we're asking, "What would be my revised probability for $~$\mathbb P(o_i),$~$ if I was *told the news* or *observed the evidence* that my action had been $~$a_x$~$?"

Causal decision theory says we should instead use the *counterfactual conditional* $~$\ \mathbb P(a_x \ \square \! \! \rightarrow o_i).$~$

The difference between evidential and counterfactual conditioning is standardly contrasted by these two sentences:

- If Lee Harvey Oswald didn't shoot John F. Kennedy, somebody else did.
- If Lee Harvey Oswald hadn't shot John F. Kennedy, somebody else would have.

In the first sentence, we're being told as news that Oswald didn't shoot Kennedy, and updating our beliefs accordingly to match the world we already saw. In the second sentence, we're imagining how a counterfactual world would have played out if Oswald had acted differently.

If $~$K$~$ denotes the proposition that somebody else shot Kennedy and $~$O$~$ denotes the proposition that Oswald shot him, then the first sentence and second sentence are respectively talking about:

- $~$\mathbb P(K| \neg O)$~$
- $~$\mathbb P(\neg O \ \square \!\! \rightarrow K)$~$

Calculating expected utility using evidential conditioning is widely agreed to lead to an irrational policy of 'managing the news'. For example, suppose that toxoplasmosis, a parasitic infection carried by cats, can cause toxoplasmosis-infected humans to become fonder of cats.%note: "Toxoplasmosis makes humans like cats" was formerly thought to actually be true. More recently, this result may have failed to replicate, alas.%

You are now faced with a cute cat that has been checked by a veterinarian who says this cat definitely does *not* have toxoplasmosis. If you decide to pet the cat, an impartial observer watching you will conclude that you are 10% more likely to have toxoplasmosis, which can be a fairly detrimental infection. If you don't pet the cat, you'll miss out on the hedonic enjoyment of petting it. Do you pet the cat?

Most decision theorists agree that in this case you should pet the cat. Either you already have toxoplasmosis or you don't. Petting the cat can't *cause* you to acquire toxoplasmosis. You'd just be missing out on the pleasant sensation of cat-petting.

Afterwards you may update your beliefs based on observing your own decision, and realize that you had toxoplasmosis all along. But when you're considering the consequences of actions, you should reason that *if counterfactually* you had not petted the cat, you *still* would have had toxoplasmosis *and* missed out on petting the cat. (Just as, if Oswald *hadn't* shot Kennedy, nobody else would have.)

The decision theory which claims that we should condition on our actions via the standard conditional probability formula, as if we were being told our choices as news or Bayesian-updating on our actions as observations, is termed [evidential_decision_theory evidential decision theory]. Evidential decision theory answers the central question "How do I condition on my choices?" by replying "Condition on your choices as if observing them as evidence" or "Take the action that you would consider best if you heard it as news."
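To see 'managing the news' in numbers, here is a minimal sketch using assumed values: a 10% prior on toxoplasmosis, \$100 of utility for not having it and \$0 for having it, +\$1 for petting, and hypothetical petting propensities P(pet | toxoplasmosis) = 0.9 and P(pet | no toxoplasmosis) = 0.4, chosen so that being seen to pet raises the probability of toxoplasmosis from 10% to 20%. Evidential decision theory weighs each act by the Bayesian posterior it would produce if heard as news.

```python
from fractions import Fraction as F

P_TOXO = F(1, 10)  # prior probability of toxoplasmosis (assumed)
# Hypothetical P(pet | toxo) and P(pet | no toxo).
P_PET_GIVEN_TOXO = {True: F(9, 10), False: F(4, 10)}


def utility(toxo: bool, pet: bool) -> int:
    return (0 if toxo else 100) + (1 if pet else 0)


def p_toxo_given_act(pet: bool) -> F:
    """Bayesian posterior P(toxo | observed act): the evidential conditional."""
    def joint(toxo: bool) -> F:
        prior = P_TOXO if toxo else 1 - P_TOXO
        p_act = P_PET_GIVEN_TOXO[toxo] if pet else 1 - P_PET_GIVEN_TOXO[toxo]
        return prior * p_act
    return joint(True) / (joint(True) + joint(False))


def edt_expected_utility(pet: bool) -> F:
    p = p_toxo_given_act(pet)
    return p * utility(True, pet) + (1 - p) * utility(False, pet)


if __name__ == "__main__":
    for pet in (True, False):
        label = "pet" if pet else "don't pet"
        print(label, float(edt_expected_utility(pet)))
```

With these numbers the evidential calculation rates refraining (about \$98.18) above petting (\$81), even though refraining cannot change whether you are infected.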

(For a more severe criticism of evidential decision theory showing how more clever agents can [money_pump pump money] out of evidential decision agents, see the [termite_dilemma Termite Dilemma].)

## Causal decision theory

Causal decision theory, the current academic standard, says that the expected utility formula should be written:

$$~$\mathbb E[\mathcal U|a_x] = \sum_{o_i \in \mathcal O} \mathcal U(o_i) \cdot \mathbb P(a_x \ \square \!\! \rightarrow o_i)$~$$

This leads into the question of how we compute $~$\mathbb P(a_x \ \square \!\! \rightarrow o_i),$~$ since it's not a standard conditional probability.

In the philosophical literature, it's often assumed that we intuitively know what the counterfactual results must be. (E.g., we're just taking for granted that you somehow know that if Oswald hadn't shot Kennedy, nobody else would have; this is intuitively obvious.) This is formalized by having a conditional distribution $~$\mathbb P(\bullet \ || \ \bullet)$~$ which is treated as heaven-sent and includes the results of all counterfactual conditionals.

People working in Artificial Intelligence will probably find this unsatisfactory, and will want to refer to the theory of [causal_model causal models] developed by Judea Pearl et al. The theory of causal models formally states how to perform counterfactual surgery on graphical models of causal processes.


Formally, we have a directed acyclic graph such as:


- $~$X_1$~$ -> {$~$X_2$~$, $~$X_3$~$} -> $~$X_4$~$ -> $~$X_5$~$

One of Judea Pearl's examples of such a causal graph is:


- SEASON -> {RAINING, SPRINKLER} -> {SIDEWALK} -> {SLIPPERY}

This says, e.g.:

- That the current SEASON affects the probability that it's RAINING, and separately affects the probability of the SPRINKLER turning on. (But RAINING and SPRINKLER don't affect each other; if we know the current SEASON, we don't need to know whether it's RAINING to figure out the probability the SPRINKLER is on.)
- RAINING and SPRINKLER can both cause the SIDEWALK to become wet. (So if we did observe that the sidewalk was wet, then even already knowing the SEASON, we would estimate a different probability that it was RAINING depending on whether the SPRINKLER was on. The SPRINKLER being on would 'explain away' the SIDEWALK's observed wetness without any need to postulate RAIN.)
- Whether the SIDEWALK is wet is the sole determining factor for whether the SIDEWALK is SLIPPERY. (So if we *know* whether the SIDEWALK is wet, we learn nothing more about the probability that the path is SLIPPERY by being told that the SEASON is summer. But if we didn't already know whether the SIDEWALK was wet, whether the SEASON was summer or fall might be very relevant for guessing whether the path was SLIPPERY!)

A causal model goes beyond the graph by including specific probability functions $~$\mathbb P(X_i | \mathbf{pa}_i)$~$ for how to calculate the probability of each node $~$X_i$~$ taking on the value $~$x_i$~$ given the values $~$\mathbf {pa}_i$~$ of $~$X_i$~$'s immediate parents. It is implicitly assumed that the causal model [ factorizes], so that the probability of any value assignment $~$\mathbf x$~$ to the whole graph can be calculated using the product:

$$~$\mathbb P(\mathbf x) = \prod_i \mathbb P(x_i | \mathbf{pa}_i)$~$$

Then, rather straightforwardly, the counterfactual conditional $~$\mathbb P(\mathbf x | \operatorname{do}(X_j=x_j))$~$ is calculated via:

$$~$\mathbb P(\mathbf x | \operatorname{do}(X_j=x_j)) = \prod_{i \neq j} \mathbb P(x_i | \mathbf{pa}_i)$~$$

(We assume that $~$\mathbf x$~$ has $~$x_j$~$ equaling the $~$\operatorname{do}$~$-specified value of $~$X_j$~$; otherwise its conditioned probability is defined to be $~$0$~$.)
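The truncated factorization can be sketched by brute-force enumeration over the sprinkler network. All probability values below are made up for illustration, and SEASON is simplified to two values (rainy or dry). The sketch contrasts *observing* the sprinkler on with *setting* it on via $~$\operatorname{do}()$~$.

```python
from itertools import product
from typing import Dict, Optional

Assignment = Dict[str, bool]

# SEASON=True means the rainy season (a two-valued simplification);
# SIDEWALK=True means the sidewalk is wet. Order does not affect the product.
VARS = ["SEASON", "RAINING", "SPRINKLER", "SIDEWALK", "SLIPPERY"]


def factor(var: str, x: Assignment) -> float:
    """P(x_var | parents of var): one hypothetical factor of the causal model."""
    if var == "SEASON":
        p_true = 0.5
    elif var == "RAINING":
        p_true = 0.7 if x["SEASON"] else 0.1
    elif var == "SPRINKLER":
        p_true = 0.1 if x["SEASON"] else 0.6
    elif var == "SIDEWALK":
        p_true = {(True, True): 0.99, (True, False): 0.9,
                  (False, True): 0.9, (False, False): 0.01}[(x["RAINING"], x["SPRINKLER"])]
    else:  # SLIPPERY depends only on SIDEWALK
        p_true = 0.8 if x["SIDEWALK"] else 0.05
    return p_true if x[var] else 1.0 - p_true


def joint(x: Assignment, do: Optional[Assignment] = None) -> float:
    """Product of the factors. Under do(), each intervened node's factor is
    dropped and assignments that disagree with the intervention get 0."""
    do = do or {}
    if any(x[v] != val for v, val in do.items()):
        return 0.0
    prob = 1.0
    for var in VARS:
        if var not in do:
            prob *= factor(var, x)
    return prob


def query(target: str, evidence: Optional[Assignment] = None,
          do: Optional[Assignment] = None) -> float:
    """P(target=True | evidence, do(...)) by brute-force enumeration."""
    evidence = evidence or {}
    num = den = 0.0
    for values in product([True, False], repeat=len(VARS)):
        x = dict(zip(VARS, values))
        if any(x[v] != val for v, val in evidence.items()):
            continue
        p = joint(x, do)
        den += p
        if x[target]:
            num += p
    return num / den


if __name__ == "__main__":
    print("P(RAINING | SPRINKLER on)     =", round(query("RAINING", {"SPRINKLER": True}), 4))
    print("P(RAINING | do(SPRINKLER on)) =", round(query("RAINING", do={"SPRINKLER": True}), 4))
```

With these made-up numbers, *seeing* the sprinkler on lowers the probability of rain to about 0.19 (a running sprinkler is evidence of the dry season), while *turning it on* leaves rain at its prior of 0.4, because RAINING is not downstream of SPRINKLER.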

This formula implies, as one might intuitively expect, that conditioning on $~$\operatorname{do}(X_j=x_j)$~$ can only affect the probabilities of variables $~$X_i$~$ that are "downstream" of $~$X_j$~$ in the directed acyclic graph that is the backbone of the causal model, in much the same way that (ordinarily) we think our choices today affect how much money we have tomorrow, but not how much money we had yesterday.

Then expected utility should be calculated as:

$$~$\mathbb E[\mathcal U| \operatorname{do}(a_x)] = \sum_{o_i \in \mathcal O} \mathcal U(o_i) \cdot \mathbb P(o_i | \operatorname{do}(a_x))$~$$

Under this rule, we won't calculate that we can affect the probability of having toxoplasmosis by petting the cat, since our choice to pet the cat is causally downstream of whether we have toxoplasmosis.
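Continuing with assumed numbers (a 10% prior on toxoplasmosis, \$100 for not having it, \$0 for having it, +\$1 for petting), the $~$\operatorname{do}()$~$ version is short. This sketch assumes a two-node model, TOXO → PET, with utility depending on both; the truncated factorization drops the factor for PET, so TOXO keeps its prior whatever we choose.

```python
from fractions import Fraction as F

P_TOXO = F(1, 10)  # prior probability of toxoplasmosis (assumed)


def utility(toxo: bool, pet: bool) -> int:
    return (0 if toxo else 100) + (1 if pet else 0)


def cdt_expected_utility(pet: bool) -> F:
    """E[U | do(pet)]: the factor P(pet | toxo) is deleted, so TOXO keeps its
    prior; any propensity to pet never enters the calculation."""
    return P_TOXO * utility(True, pet) + (1 - P_TOXO) * utility(False, pet)


if __name__ == "__main__":
    for pet in (True, False):
        label = "pet" if pet else "don't pet"
        print(label, cdt_expected_utility(pet))
```

This yields \$91 for petting and \$90 for not petting, so the causal calculation pets the cat.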


%%comment: One class of problems is hinted at by how the above expected utility formula, in the course of making its calculation, gives the wrong expected utility for the action actually implemented! Suppose the [1rm prior probability] of having toxoplasmosis is 10%, and the posterior probability after being seen to pet the cat is 20%. Suppose that *not* having toxoplasmosis has \$100 utility; that having toxoplasmosis has \$0 utility; and that, given the amount you enjoy petting cats, petting the cat adds \$1 of utility to your outcome. Then the above formula for deciding whether to pet the cat suggests that petting leads to an expected utility of \$91, and not petting leads to an expected utility of \$90. This tells us to pet the cat, which is the correct decision, but it also tells us to expect \$91 of expected utility after petting the cat, when we actually receive \$81 in expectation. It seems like the intuitively "correct" answer is that we should calculate \$81 of utility for petting the cat and \$80 utility for not petting it.

You might initially be tempted to solve this problem by doing the calculation in phases:

- Phase 1: Calculate the decision based on prior beliefs.
- Phase 2: Update our beliefs based on having observed our first-order decision.
- Phase 3: Recalculate the expected utilities based on the posterior beliefs, possibly picking a new action.

...and then wait for this algorithm to settle into a consistent state. But besides lacking the computational efficiency of computing our decision in one swoop, it's entirely possible for an agent like this to [5qn go into an infinite loop]. %%

## Newcomblike problems

Although causal decision theory became widely accepted, there were also widespread suspicions that causal decision theory might not be optimal, or might be missing some key element of rationality.

The academic debate on this subject revolved mainly around *Newcomblike problems,* a broad class of dilemmas which turned out to include the Prisoner's Dilemma; commons problems and coordination problems (like voting in elections); blackmail and other dilemmas of negotiation; plus other problems of interest.

Roughly, we could describe Newcomblike problems as those where somebody similar to you, or trying to predict you, exists in the environment. In this case your decision can *correlate* with events outside you, without your action *physically causing* those events.

The original Newcomb's Problem was rather artificial, but is worth recounting for historical reasons:

An alien named Omega presents you with two boxes, a transparent Box A containing \$1,000, and an opaque Box B. Omega then flies away, leaving you with the choice of whether to take only Box B ('one-box') or to take both Box A and Box B ('two-box'). Omega has put \$1,000,000 in Box B if and only if Omega predicted that you would take only Box B; otherwise Box B is empty.

Omega has already departed, so Box B is already empty or already full.

Omega is an excellent predictor of human behavior and has never been observed to be mistaken. E.g., we can suppose Omega has run this experiment 73 times previously and predicted correctly each time.

Do you take both boxes, or only Box B?

- Argument 1: People who take only Box B tend to walk away rich. People who two-box tend to walk away poor. It is better to be rich than poor.
- Argument 2: Omega has already made its prediction. Box B is already empty or already full. It would be irrational to leave behind Box A for no reason. It's true that Omega has chosen to reward people with irrational dispositions in this setup, but Box B is now *already empty*, and irrationally leaving Box A behind would just counterfactually result in your getting \$0 instead of \$1,000.

This setup went on to generate an incredible amount of debate. Newcomb's Problem is conventionally seen as an example that splits the verdict of evidential decision theory ("Taking only Box B is good news! Do that.") versus causal decision theory ("Taking both boxes does not *cause* Box B to be empty, it just adds \$1,000 to the reward") in a way that initially seems more favorable to evidential decision agents (who walk away rich).

The setup in Newcomb's Problem may seem contrived, supporting the charge that Omega is merely rewarding people born with irrational dispositions. But consider the following variant, Parfit's Hitchhiker:

You are lost in the desert, your water bottle almost exhausted, when somebody drives up in a lorry. This driver is (a) entirely selfish, and (b) very good at detecting lies. (Maybe the driver went through Paul Ekman's training for reading facial microexpressions.)

The driver says that they will drive you into town, but only if you promise to give them \$1,000 on arrival.

If you value your life at \$1,000,000 (pay \$1,000 to avoid 0.1% risks of death) then this problem is nearly isomorphic to Gary Drescher's *transparent Newcomb's Problem,* in which Box B is transparent, and Omega has put a visible \$1,000,000 into Box B iff Omega predicts that you one-box when seeing a full Box B. This makes Parfit's Hitchhiker a *Newcomblike problem,* but one in which, one observes, the driver's behavior seems economically sensible, and not at all contrived as in the case of Newcomb's Omega.

Parfit's Hitchhiker also bears a strong resemblance to some real-life dilemmas of central banks, e.g., threatening not to bail out too-big-to-fail institutions. The central bank would benefit *today* from people believing that "the central bank will not bail out irresponsible institutions". But the markets are extremely good at predicting future events, and markets know that later, the central bank's directors will be faced with a scenario where it seems overwhelmingly more convenient to bail out the big bank, and the benefits of people's earlier beliefs are now firmly in the past.

Similarly, on reaching the city in Parfit's Hitchhiker, you might be tempted to reason that the car has already driven you there, so that if you *now* decide selfishly, you are better off by \$1,000 by refusing to pay, since your decision can't alter the past. Likewise in the transparent Newcomb's Problem: Box B already seems visibly full, so the money is right there and it can't vanish if you take both boxes, right? But if you are the sort of agent that reasons like this, Box B is already empty. Parfit's driver asks you a few hard questions and then drives off and leaves you to die in the desert. The markets, which can behave like extremely good predictors, may go ahead and call your bluff about moral hazard.

Both causal decision agents and evidential decision agents will two-box on the transparent version of Newcomb's Problem, or be left to die in the desert on Parfit's Hitchhiker. A causal agent who sees a full Box B reasons, "I cannot cause Box B to become empty by leaving behind Box A." Even an evidential agent reasons, "It wouldn't be good news about anything in particular to leave behind Box A; I already *know* Box B is full."

%%comment: Of course these simple scenarios are not exactly representative of what happens in the real world with the central bank and interest rates. In the real world there are reputational effects and iterated games; if you bail out a bank today, this has a penalty in the form of people expecting you to bail out more banks later (albeit by that time you may have retired and left somebody else to run the central bank). But this doesn't rule out Newcomblike channels as a *component* of the problem; people may have some idea of what kind of algorithm you're running and try to mentally simulate what your algorithm does. There's also no rule saying that efficient markets *can't* read your expression well enough for what you secretly expect you'll do later %note: 'What you expect you'll do later' is produced by your current self's mental simulation of your future self, which is why the output of your future self's algorithm appears at two separate points in the overall process.% to have some effect on what people think of you now. %note: From the perspective of a causal decision agent, we should just regard our facial expressions as causal actions now that affect what other people believe, and control our facial expressions, darn it! Of course, a logical decision theorist replies that if you could in fact control your facial expressions, and if market actors could mentally simulate you well enough to predict that you would try to control your facial expressions, the market actors would no longer regard those facial expressions as evidence.% In this sense, it may matter a non-zero amount whether people think that the *rational* course of action in the simplified Parfit's Hitchhiker dilemma is to pay \$1,000 even after already reaching town, or if it is *rational* to die in the desert. %%

## Logical decision theory

A logical decision agent cheerfully promises to pay the \$1,000 to Parfit's driver, and then actually does so. They also cheerfully leave behind Box A in the transparent Newcomb's Problem. An LDT agent is choosing the best output for their algorithm, and reasoning, "If my algorithm had output 'don't pay' / 'take both boxes', then this would have implied my dying in the desert / Box B being empty."

More generally, a logical agent ought to calculate expected utility as follows:

$$~$\mathsf Q(s) = \big ( \underset{\pi_x \in \Pi}{\operatorname{argmax}} \ \sum_{o_i \in \mathcal O} \mathcal U(o_i) \cdot \mathbb P(\ulcorner \mathsf Q = \pi_x \urcorner \triangleright o_i) \big ) (s)$~$$

Where:

- $~$\mathsf Q$~$ is the agent's current decision algorithm - that is, the whole calculation presently running.
- $~$s$~$ is the agent's sense data.
- $~$\pi_x \in \Pi$~$ is the output of $~$\mathsf Q$~$, a *policy* that maps sense data to actions.
- $~$\ulcorner \mathsf Q = \pi_x \urcorner$~$ is the proposition that, as a logical fact, the output of algorithm $~$\mathsf Q$~$ is $~$\pi_x.$~$
- $~$\mathbb P(X \triangleright o_i)$~$ is the probability of $~$o_i$~$ *conditioned* on the logical fact $~$X.$~$
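As a toy illustration of choosing a whole policy rather than a single act, here is the transparent Newcomb's Problem under the simplifying assumption of a perfect predictor. Because this toy world is fully specified, supposing "$~$\mathsf Q$~$ outputs policy $~$\pi_x$~$" reduces to running the world with that policy plugged in; the hard part of logical conditioning is exactly what this sketch sidesteps. The function names are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator

BOX_A, BOX_B = 1_000, 1_000_000
OBSERVATIONS = ("full", "empty")   # what the agent sees in the transparent Box B
ACTS = ("one-box", "two-box")

Policy = Dict[str, str]            # maps an observation to an act


def predictor_fills_box(policy: Policy) -> bool:
    # Assumed perfect predictor: Box B is filled iff the policy one-boxes
    # when it sees a full Box B.
    return policy["full"] == "one-box"


def payoff(policy: Policy) -> int:
    full = predictor_fills_box(policy)
    act = policy["full" if full else "empty"]
    return (BOX_B if full else 0) + (BOX_A if act == "two-box" else 0)


def all_policies() -> Iterator[Policy]:
    for acts in product(ACTS, repeat=len(OBSERVATIONS)):
        yield dict(zip(OBSERVATIONS, acts))


def best_policy() -> Policy:
    # argmax over whole policies, as in the formula above.
    return max(all_policies(), key=payoff)


def act_by_act(seen: str) -> str:
    # CDT-style choice after seeing Box B: its contents are held fixed.
    contents = BOX_B if seen == "full" else 0
    return max(ACTS, key=lambda a: contents + (BOX_A if a == "two-box" else 0))


if __name__ == "__main__":
    print("policy search:", best_policy(), payoff(best_policy()))
    cdt = {seen: act_by_act(seen) for seen in OBSERVATIONS}
    print("act by act:   ", cdt, payoff(cdt))
```

The policy search one-boxes on seeing a full box and collects \$1,000,000; the act-by-act comparison two-boxes whatever it sees, so the predictor leaves Box B empty and it collects \$1,000. Parfit's Hitchhiker has the same shape, with "pay on arrival" in place of "one-box on seeing a full box".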

Technically, with regard to the logical conditioning operator $~$X \triangleright Y$~$, logical decision theorists know two special-case ways to try to calculate it:

- [proof_based_dt As a premise introduced into a standard proof algorithm], working out further logical implications.
- As a Pearl-style $~$\operatorname {do}()$~$ on a standard causal model that [timeless_dt includes some nodes intended to denote unknown logical propositions.]

Logical decision theorists are still trying to figure out a good candidate for a *general* formalization of $~$X \triangleright Y,$~$ but meanwhile:

- "Treating $~$X$~$ as a new premise in a proof system and deriving its implications" works to formalize [modal_agents], and lets us formally simulate agents with common knowledge of each other's code negotiating on the Prisoner's Dilemma and other game-theoretic setups.
- "$~$\operatorname{do}()$~$ on causal models that include some logical propositions" suffices to formalize all the dilemmas, thought experiments, and economic scenarios (at least as far as causal decision theory could formalize them).
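Here is a sketch of the second style on the original (opaque) Newcomb's Problem, assuming a hand-built model in which a node ALG, standing for the unknown output of the agent's decision algorithm, is a parent of both the agent's act and Omega's prediction. The accuracy and prior values are hypothetical. Surgery on the act node alone leaves the prediction at its prior; surgery on ALG moves the act and the prediction together.

```python
BOX_A, BOX_B = 1_000, 1_000_000
ACTS = ("one-box", "two-box")
ACCURACY = 0.99       # hypothetical P(Omega's prediction matches ALG)
PRIOR_ONE_BOX = 0.5   # hypothetical prior that ALG outputs "one-box"


def p_full_given_alg(alg: str) -> float:
    """PREDICTION copies the logical node ALG up to error; Box B is filled
    iff the prediction is "one-box"."""
    return ACCURACY if alg == "one-box" else 1 - ACCURACY


def payoff(act: str, full: bool) -> int:
    return (BOX_B if full else 0) + (BOX_A if act == "two-box" else 0)


def expected(act: str, p_full: float) -> float:
    return p_full * payoff(act, True) + (1 - p_full) * payoff(act, False)


def eu_do_act(act: str) -> float:
    """Surgery on the ACT node only: ALG, and so Box B, keep their prior."""
    p_full = (PRIOR_ONE_BOX * p_full_given_alg("one-box")
              + (1 - PRIOR_ONE_BOX) * p_full_given_alg("two-box"))
    return expected(act, p_full)


def eu_do_alg(alg: str) -> float:
    """Surgery on the logical node ALG: it sets the act and shifts the prediction."""
    return expected(alg, p_full_given_alg(alg))


if __name__ == "__main__":
    print("do(ACT) chooses:", max(ACTS, key=eu_do_act))
    print("do(ALG) chooses:", max(ACTS, key=eu_do_alg))
```

Intervening on the act always adds \$1,000 for two-boxing, while intervening on the algorithm's output compares roughly \$990,000 against \$11,000 and one-boxes.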

We know these two formalization styles aren't complete because:

- The proof formalism we use for modal agents has [ weird edge cases] indicating that it only correctly formalizes the intuitive notion of logical conditioning some of the time.
- We don't have a general algorithm for *building* causal models that include logical facts, and it's not clear that this representation can model any setup more complicated than "run this exact algorithm in two different places".

# Some LDT behaviors in economically suggestive scenarios

(For the question of whether all of these are *rational* behaviors, or merely *usefully irrational* behaviors that happen to make agents richer, see the final section "LDT as the principle of rational choice" below.)

## Voting

LDT agents vote in elections if they believe they are part of a sufficiently large cohort of people voting for similar reasons that their cohort has a non-negligible probability of swinging the election (such that the expected value of possibly swinging this election outweighs the cost of everyone in their cohort voting).
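The decision rule in the previous paragraph is a single inequality. This sketch borrows the billion-dollar valuation and the 50,000-person cohort from the earlier example; the per-voter cost and the swing probabilities are hypothetical placeholders, and estimating the swing probability is the hard empirical part that the sketch leaves out.

```python
def cohort_net_value(p_swing: float, value: float,
                     cohort_size: int, cost_per_voter: float) -> float:
    """Expected value of the whole cohort voting, minus the cohort's total cost.

    p_swing: probability that the cohort's votes change the outcome.
    value: value placed on the preferred outcome.
    """
    return p_swing * value - cohort_size * cost_per_voter


def should_vote(p_swing: float, value: float,
                cohort_size: int, cost_per_voter: float) -> bool:
    return cohort_net_value(p_swing, value, cohort_size, cost_per_voter) > 0


if __name__ == "__main__":
    # Hypothetical: $20 cost per voter, 1% vs 0.05% chance the cohort swings it.
    print(should_vote(0.01, 1e9, 50_000, 20))
    print(should_vote(0.0005, 1e9, 50_000, 20))
```

With `cohort_size=1` and `p_swing` set to the chance that your single vote is pivotal, the same inequality is the standard individual calculation; the difference lies only in whose costs and whose probability of swinging the outcome are counted together.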

Voting is one of the most plausible candidates for something like a pure LDT analysis going through. With so many voters participating in elections, there may plausibly be a large number of agents out there whom you should consider as being logically correlated with you.

When Douglas Hofstadter was proposing 'superrationality' (rationality taking into account that different 'rational' reasoners in similar situations will probably arrive at the same 'rational' answer), he illustrated the point by sending a letter to twenty friends inviting them to take part in a multi-player version of the Prisoner's Dilemma. Each player who Cooperated would receive \$3 from each other Cooperator and \$0 from each Defector; each player who Defected would receive \$5 from each Cooperator and \$1 from each other Defector. So, for example, each player would receive \$19 if everyone Defected, and \$57 if everyone Cooperated. The contest was sponsored by *Scientific American,* and Hofstadter asked all players to play as though they were entirely selfish, without communicating with each other. Hofstadter also included a paragraph saying:

> Of course, your hope is to be the unique defector, thus really cleaning up: with 19 C-ers, you’ll get \$95 and they’ll each get 18 times \$3, namely \$54. But why am I doing the multiplication or any of this figuring for you? You’re very bright. So are all of you! All about equally bright, I’d say, in fact.

Hofstadter recounts the first 7 replies, which ran 5 Ds to 2 Cs, and then says:

> So far, I’ve mentioned five D’s and two C’s. Suppose you had been me, and you’d gotten roughly a third of the calls, and they were 5-2 in favor of defection. Would you dare to extrapolate these statistics to roughly 14-6? How in the world can seven individuals’ choices have anything to do with thirteen other individuals’ choices? As Sidney Nagel said, certainly one choice can’t influence another (unless you believe in some kind of telepathic transmission, a possibility we shall discount here). So what justification might there be for extrapolating these results?
>
> Clearly, any such justification would rely on the idea that people are “like” each other in some sense. It would rely on the idea that in complex and tricky decisions like this, people will resort to a cluster of reasons, images, prejudices, and vague notions, some of which will tend to push them one way, others the other way, but whose overall impact will be to push a certain percentage of people toward one alternative, and another percentage of people toward the other. In advance, you can’t hope to predict what those percentages will be, but given a sample of people in the situation, you can hope that their decisions will be “typical”. Thus the notion that early returns running 5-2 in favor of defection can be extrapolated to a final result of 14-6 (or so) would be based on assuming that the seven people are acting “typically” for people confronted with these conflicting mental pressures.

In the end, 14 respondents defected and 6 cooperated, [law_of_small_numbers exactly as would be expected from extrapolating the first 7 responses]. If we imagine that it was entirely fair to extrapolate the results this precisely, then the *average* respondent among the first 7 would need to imagine roughly 2 of the next 13 respondents being correlated with them (13 later respondents spread over 7 early ones is about 1.9 apiece). Your move being correlated with 2 others would not, in this particular setup, be enough to swing the balance toward Cooperating instead of Defecting. Even moving in unison with 4 others wouldn't be enough to swing the result. %note: If the starting fraction is 11D-4C, then 5 more Cs move the payoff for the Cs to \$24 apiece. 5 more Ds send the payoff for the Ds to \$35 apiece (\$5 from each of the 4 Cs plus \$1 from each of the 15 other Ds).%
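
Hofstadter's payoff rules are easy to get off by one, because no player is paid by themselves. Below is a small calculator for them. It is a sketch, and `payoff` and `cohort_payoffs` are hypothetical helper names. It reproduces the figures quoted above (\$19, \$57, \$95, \$54), and it lets you check the footnote's cohort example.

```python
def payoff(my_move, other_cooperators, other_defectors):
    """One player's winnings in Hofstadter's game, given how many *other*
    players cooperated and defected (nobody gets paid by themselves)."""
    if my_move == "C":
        return 3 * other_cooperators                      # $3 per other C, $0 per D
    return 5 * other_cooperators + 1 * other_defectors    # $5 per C, $1 per other D


def cohort_payoffs(base_defectors, base_cooperators, cohort_size):
    """Winnings of each member of a cohort that moves in unison, added to a
    fixed background of other players. Returns (if all C, if all D)."""
    if_cooperate = payoff("C", base_cooperators + cohort_size - 1, base_defectors)
    if_defect = payoff("D", base_cooperators, base_defectors + cohort_size - 1)
    return if_cooperate, if_defect


print(payoff("D", 0, 19), payoff("C", 19, 0))   # 19 57: all defect / all cooperate
print(payoff("D", 19, 0), payoff("C", 18, 1))   # 95 54: one unique defector
print(cohort_payoffs(11, 4, 5))                 # (24, 35): 11D-4C plus a cohort of 5
```

With the other 15 players fixed at 11D-4C, a cohort of 5 still does better by defecting together (\$35 each) than by cooperating together (\$24 each).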

But Hofstadter's clever choice of a *multi-player* Prisoner's Dilemma for his experiment illustrates the general idea that when we deal with larger populations, people might not be *pairwise* logically correlated, but the *average* person probably belongs to a large cohort. That is, the average person will have logical doppelgangers constituting a significant fraction of the population. This in turn seems like a reasonable idea to apply to voting in elections.

## Oneshot Prisoner's Dilemma

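%%!knows-requisite([dt_prisonersdilemma]): In the Prisoner's Dilemma, two players each choose Cooperate or Defect. Whatever the other player does, each player gets more by Defecting; but both players get more from mutual Cooperation than from mutual Defection. %%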

Much of the early work in LDT revolved around computational agents playing the Prisoner's Dilemma with [common_knowledge common knowledge] of each other's source code.

Suppose, for instance, that you are a computational agent $~$\mathsf {A}$~$ playing the Prisoner's Dilemma and you are told that the code of the other agent $~$\mathsf {Fairbot}$~$ is as follows:

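```
def Fairbot(A):
    # is_provable stands for "has a proof in Fairbot's own proof system"
    # (e.g. first-order arithmetic); it is a provability check, not an
    # ordinary function that always halts.
    if is_provable("A(Fairbot) == Cooperate"):
        return Cooperate
    else:
        return Defect
```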

In other words, $~$\mathsf {Fairbot}$~$ tries to prove that $~$\mathsf {A}$~$ (you) cooperates with it. If $~$\mathsf {Fairbot}$~$ proves that $~$\mathsf A,$~$ playing with $~$\mathsf {Fairbot},$~$ cooperates, then $~$\mathsf {Fairbot}$~$ cooperates; otherwise $~$\mathsf {Fairbot}$~$ defects.

Would you as $~$\mathsf A$~$, in this case, defect on the Prisoner's Dilemma? We can suppose even that $~$\mathsf {Fairbot}$~$ has already run and has already made its move one way or another, so that your move cannot possibly have any causal effect on $~$\mathsf {Fairbot}.$~$ With some slight edits to the setup, we can even suppose you are told $~$\mathsf{Fairbot}$~$'s move before making your own. The prescription of CDT and EDT alike in this case is thus to Defect, which an LDT agent reasoning informally would regard as unwise.

$~$\mathsf{Fairbot}$~$ has some attractive properties, but it does not play optimally on the PD. $~$\mathsf{Fairbot}$~$ does cooperate with another $~$\mathsf {Fairbot}$~$, even if the code of the other agent is not exactly identical (e.g. they can use different proof systems). %note: (If you want to know why Agent A trying to prove things about Agent B, who is simultaneously trying to prove things about Agent A (trying to prove things about Agent B…), doesn't just collapse into an infinite recursion, the answer is "[modal_agents Because] of Löb's theorem." Sorry, this one takes a detailed explanation.)% If $~$\mathsf {Fairbot}$~$ is reasoning in a [logic_soundness sound] system such as first-order arithmetic, then $~$\mathsf {Fairbot}$~$ is also inexploitable; it never Cooperates when the opponent Defects. However, $~$\mathsf {Fairbot}$~$ also cooperates with $~$\mathsf {CooperateBot},$~$ the agent which simply always returns 'Cooperate'. By the premises of the Prisoner's Dilemma, we ought to at least bother to Defect against a rock with the word "Cooperate" written on it. Since $~$\mathsf {Fairbot}$~$ fails to exploit $~$\mathsf {CooperateBot},$~$ $~$\mathsf {Fairbot}$~$'s play is not optimal.
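
These three behaviors can be checked with a toy evaluator. The sketch assumes the technique used for modal agents: evaluate each agent on a finite chain of Kripke worlds 0, 1, 2, ..., where "it is provable that φ" holds at world n exactly when φ holds at every world below n. At world 0, every such statement is therefore vacuously true. It also assumes, following the modal-agents literature, that the move the agents settle on at the top of a long enough chain is the move they actually make. The functions are hypothetical Python stand-ins. They only cover agents whose move depends on whether the opponent's move against them is provable, so $~$\mathsf{PrudentBot}$~$ is left out.

```python
# Toy evaluator for simple proof-based agents on a linear Kripke chain.
# An agent is a function from the opponent's moves at all *earlier* worlds
# to its own move at the current world.

def fairbot(opponent_history):
    # "Cooperate iff it is provable that the opponent cooperates with me":
    # Box(opponent plays C) holds at world n iff the opponent played C at
    # every world m < n (vacuously true at world 0).
    return "C" if all(move == "C" for move in opponent_history) else "D"


def cooperatebot(opponent_history):
    return "C"  # a rock with "Cooperate" written on it


def defectbot(opponent_history):
    return "D"


def play(agent_a, agent_b, depth=10):
    """Evaluate both agents at worlds 0..depth-1 and return the moves at the
    top world; depth=10 is an arbitrary choice, ample for these agents."""
    history_a, history_b = [], []
    for _ in range(depth):
        move_a, move_b = agent_a(history_b), agent_b(history_a)
        history_a.append(move_a)
        history_b.append(move_b)
    return history_a[-1], history_b[-1]


print(play(fairbot, fairbot))       # ('C', 'C')
print(play(fairbot, cooperatebot))  # ('C', 'C'): Fairbot fails to exploit it
print(play(fairbot, defectbot))     # ('D', 'D'): Fairbot is not exploited
```

Against DefectBot, Fairbot "cooperates" only at world 0, where provability is vacuous. It defects from world 1 onward, which is the settled answer.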

The milestone LDT paper "Robust Cooperation in the Prisoner's Dilemma: Program Equilibrium Through Provability Logic" exhibited a proof (and running code!) that there existed a simple agent $~$\mathsf{PrudentBot}$~$ which:

- Mutually cooperated with $~$\mathsf{Fairbot}$~$
- Mutually cooperated with another $~$\mathsf{PrudentBot}$~$
- Defected against $~$\mathsf{DefectBot}$~$
- Defected against (hence exploited) $~$\mathsf{CooperateBot}$~$
- Was itself provably unexploitable.

The formula for $~$\mathsf{PrudentBot}$~$ was later generalized to more natural and generic agents.

Real-life versions of the Prisoner's Dilemma do not match the setup of two computational agents with common knowledge of each other's source code. In real life, there are reputation effects, people who care about other people, and a far more tenuous grasp on the other person's reasoning processes. The Prisoner's Dilemma with probabilistic beliefs has yet to be formally analyzed in LDT.

Nonetheless, it seems fair to say that if you are an LDT agent:

- You should not Defect against somebody who you are pretty sure will reason very similarly to you.
- You should not Defect against a fair-minded agent that you think is pretty good at predicting you.
- You should not Defect against an agent that is pretty good at predicting you, that you are pretty good at predicting, who you predict has decided to Cooperate iff it predicts you Cooperate (in order to incentivize you to do likewise, given your own predictive abilities).

As the relative benefits of Cooperation in a PD-like scenario increase, it seems increasingly plausible that some LDT-like reason for cooperation will end up going through. E.g. if the payoff matrix is (1,1) vs (0, 101) vs (101, 0) vs (100, 100), even a small probability that the two of you are reasoning similarly, or that one of you is a fair-minded good predictor, etcetera, might be enough to carry the handshake.
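
To see why a small probability can suffice, take a deliberately crude model. This is an assumption, not an analysis from the LDT literature. With probability p the other player reasons like you and makes the same move you make; otherwise they simply Defect. Cooperating then has expected value p·cc + (1 − p)·cd, and Defecting always yields dd, so Cooperate wins once p exceeds (dd − cd)/(cc − cd). The function name is hypothetical.

```python
def cooperation_threshold(cc, cd, dd):
    """Minimum probability p that the other player is a 'logical twin' (makes
    whatever move I make) for Cooperate to beat Defect, assuming a non-twin
    simply Defects. Arguments are MY payoffs for (my move, their move):
    cc = (C, C), cd = (C, D), dd = (D, D). Requires cc > cd.
    """
    # EU(C) = p * cc + (1 - p) * cd
    # EU(D) = p * dd + (1 - p) * dd = dd   (a twin defects too; a non-twin defects)
    # Cooperate iff p * (cc - cd) > dd - cd
    return (dd - cd) / (cc - cd)


print(cooperation_threshold(cc=100, cd=0, dd=1))  # 0.01, payoffs from the text
print(cooperation_threshold(cc=2, cd=0, dd=1))    # 0.5, a standard PD (2,2)/(0,3)/(1,1)
```

The (D, C) payoff never enters, because in this model nobody Cooperates while you Defect. With the (100, 100) matrix a 1% chance of correlation is already enough; with ordinary PD payoffs it takes better than even odds.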

If you further accept the argument below that LDT is a better candidate than CDT for the principle of rational choice, then it is also fair to say that economists should stop going around proclaiming that rational agents defect in the Prisoner's Dilemma. LDT is not a Pollyanna solution that makes Cooperation right for almost everyone almost all the time. Two LDT agents might end up Defecting against each other, e.g. because they don't know each other to be LDT agents, or don't know the other knows, or don't have enough trust in their ability to predict the other accurately, etcetera. But "rational agents in general just can't do anything about the equilibrium where they defect against each other in the oneshot Prisoner's Dilemma" is much too harsh; e.g. it seems readily solvable if two rational agents have common knowledge that they are both rational.

%%knows-requisite([dt_gametheory]):

## Game theory

The first-ever principled derivation of Nash-equilibrium play in two agents doing pure expected utility maximization and modeling each other, *without* each agent starting from the assumption that the other agent is already looking for Nash equilibria, was done by logical decision theorists using a common-knowledge-of-code setup and reflective oracles to ensure mutual probabilistic simulatability.
%%

## Ultimatum Game

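Recall the Ultimatum Game: the Proposer proposes how to split \$10 with the Responder. If the Responder accepts, the split goes through; otherwise both players get nothing. On the standard CDT analysis, a rational Responder accepts even a \$1 offer, since \$1 is better than nothing, and so a rational Proposer offers \$1 and keeps \$9.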

Formally, LDT has not yet proven a unique solution to the Ultimatum Game. But once we go past the general paradigm of CDT, an LDT Responder need not accept an offer of \$1.

This is true even if the Proposer is a CDT agent, so long as the Proposer has some idea that the agent on the other end is or *might possibly* be an LDT agent with sufficient probability. From the perspective of a CDT agent, an LDT agent is just a weird sort of environmental object which 'irrationally' rejects \$1 offers, so if the CDT agent knows this, it will try to appease the LDT agent with higher offers.

In fact, if the LDT agent knows that the CDT agent knows for certain that it is dealing with an LDT agent, the LDT agent will reject any offer of less than \$9. This is very predictable from the CDT agent's standpoint, so the only split the CDT agent finds it profitable to propose is one where it keeps \$1 and offers \$9. Oh, what can you do with an agent that's so irrational (except give it almost all of the gains from trade)?

The case of two LDT agents has no formal solution that has been proven from scratch. Informally, Eliezer Yudkowsky has suggested (in unpublished discussion) that an LDT equilibrium on the Ultimatum Game might appear as follows:

Suppose Agent A thinks a *fair* solution (where 'fair' is a term of art to be considered in more detail soon) is \$5 apiece for Proposer and Responder (e.g. because \$5 is the Shapley value).

However, Agent A is not certain that Agent B considers the Shapley value to be a 'fair' division of the gains.

Then:

- Agent A doesn't want to blanket-reject all offers below \$5, because in this case two agents with an even slightly different estimate of what is 'fair' (\$5 vs. \$4.99) will receive nothing.
- Agent A doesn't want to blanket-accept any offers below \$5, because then another LDT agent who predicts this will automatically offer Agent A an amount below \$5.

This suggests that Agent A should definitely accept any offer at or above what Agent A thinks is the 'fair' amount of \$5, and *probabilistically* accept or reject any offer below \$5 such that the value to the other agent slopes downward as that agent proposes lower splits. E.g., Agent A might accept an offer $~$q$~$ beneath \$5 with probability $~$p$~$:

$$~$p = \big ( \dfrac{\$5}{\$10 - q} \big ) ^ {1.01}$~$$

So, for example, an offer of \$4 would be accepted with 83% probability, for an expected gain to the Proposer of \$4.98 and an expected gain to the Responder of \$3.32. From the Responder's perspective, the important feature of this response function is not that the Responder gets the same amount as the Proposer. What matters is that the Proposer has no incentive to give the Responder less than the 'fair' amount, and that the Proposer's expected returns keep diminishing as the offer to the Responder drops further.
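
The response function can be tabulated directly. This is a minimal sketch. The formula is the one given above, written as ((total − fair) / (total − q))^1.01 so that it reduces to (\$5 / (\$10 − q))^1.01 with the text's numbers. The parameter names and that generalization are assumptions for illustration.

```python
def accept_probability(offer, fair=5.0, total=10.0, exponent=1.01):
    """Responder's acceptance rule: always accept an offer at or above the
    'fair' share; accept a lower offer q with probability
    ((total - fair) / (total - q)) ** exponent, which is ($5 / ($10 - q))**1.01
    with the text's numbers (the parameterization is an assumption)."""
    if offer >= fair:
        return 1.0
    return ((total - fair) / (total - offer)) ** exponent


def expected_gains(offer, total=10.0):
    """(Proposer's, Responder's) expected winnings for a given offer."""
    p = accept_probability(offer, total=total)
    return p * (total - offer), p * offer


for offer in (5, 4, 3, 2, 1):
    p = accept_probability(offer)
    proposer, responder = expected_gains(offer)
    print(f"offer ${offer}: p={p:.3f}  proposer ${proposer:.2f}  responder ${responder:.2f}")
```

The Proposer's expected take falls steadily as the offer drops below \$5, from \$5.00 to roughly \$4.97 at a \$1 offer, so underbidding never pays. With p left unrounded (about 0.832), a \$4 offer gives the Proposer about \$4.99. The \$4.98 figure comes from rounding p to 83% before multiplying.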

Although the notion of 'fair' still seems arbitrary in terms of fundamentals (that is, we can't yet derive a 'fair' function starting from scratch, and this may well be impossible), this kind of response gives it a very Schelling-Point-like status:

- You cannot do better by estimating that your 'fair' share is higher than what others think is your fair share.
- You cannot do better by estimating that your 'fair' share is lower than what others think is your fair share.
- Other agents have no incentive to estimate that your 'fair' share is less than what you think is your fair share.

Note that this solution is not on the Pareto boundary. In fact, it was partially inspired by an earlier proof by Stuart Armstrong that it was impossible to develop a bargaining solution with certain properties that did lie on the Pareto boundary.

We are not professional economists, but this strikes us as a potentially important observation for economics. The Ultimatum Game stands in for the problem of *dividing gains from trade* in non-liquid markets. Suppose I am the only person in town selling a used car, whose use-value to me is \$5000, and you are the only person in town trying to buy a used car, whose use-value to you is \$8000. Then in principle the trade could take place at any price between \$5001 and \$7999. At \$5001, I have gained only \$1 from the trade and you have captured \$2999 of the value. If I am the sort of person who refuses a \$1 offer in the Ultimatum Game, I may refuse your offer of \$5300 as well, in which case both of us are worse off to some degree. This incentivizes you to offer more than \$5300, and so on. So far as we know, the suggestion that there can be a continuous stable equilibrium around a 'fair' Schelling-Point price, in a one-shot market, is original to LDT-inspired reasoning.

## Iterated Prisoner's Dilemma with unknown horizon

On the standard (CDT) analysis, the iterated PD with a known number N of iterations just reduces to N mutual defections, by induction. Two CDT agents both defect on the last round unconditionally on previous behavior, both agents know this, therefore both agents defect on the second-to-last round, etcetera.

As observed by somebody, [todo: somebody look up this citation] the iterated PD with *unknown* horizon is really an Ultimatum Game! If one agent can 'precommit' to a visible strategy, or announce a strategy and stick to it, or just have their strategy deducible from their playing pattern, then they can try to take almost-all the gains from cooperating and leave the other agent with a gain barely above that of mutual defection. The other agent then has the choice of 'accepting' this by playing along, or of 'rejecting' by just defecting.

Suppose, for example, that I am an LDT agent, you are a CDT agent, and these two facts are common knowledge between us. The payoff matrix from my perspective is (3, 0) > (2, 2) > (1, 1) > (0, 3). Predictably to you, I reason to the following policy:

"If you cooperate on every round, I will cooperate on 2 out of 3 rounds. If I see you defect even once, I will always defect thereafter."

If you *accept* this division of the gains, you get \$2 + \$2 + \$0 = \$4 utility per 3 rounds. If you *reject* it, you get \$1 + \$1 + \$1 = \$3 utility per 3 rounds. Thus a CDT agent accepts the proposed division, and I get \$7 utility per 3 rounds.

Suppose conversely that we are both LDT agents. For some reason you honestly think it is 'fair' for you to get \$7 per 3 rounds, while I think it is 'fair' for you to get \$6 per 3 rounds. Then I might (predictably) accept your proposal with e.g. 74% probability and otherwise always defect. Your expected gain from this proposal is \$7 * 74% + \$3 * 26% = \$5.96 per 3 rounds, slightly less than the \$6 you would get from the split I consider fair, and my own expected gain falls to \$3.74.
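
The per-3-round bookkeeping in these two examples can be checked mechanically. The sketch below assumes the per-round payoffs (2, 2), (0, 3), (3, 0), (1, 1) implied by the stated ordering. It reproduces the \$7/\$4 versus \$3/\$3 comparison and the \$5.96/\$3.74 expectation. `hypothetical_accept_prob` is one possible way to arrive at a figure near 74%: apply the Ultimatum Game formula to gains above mutual defection. That rule is an illustrative assumption, not necessarily how the example was constructed.

```python
# Per-round payoffs as (mine, yours), from the ordering in the text:
# (3, 0) > (2, 2) > (1, 1) > (0, 3).
ROUND_PAYOFF = {
    ("C", "C"): (2, 2),
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("D", "D"): (1, 1),
}


def totals(my_moves, your_moves):
    """Sum each player's payoff over a block of rounds."""
    pairs = [ROUND_PAYOFF[(m, y)] for m, y in zip(my_moves, your_moves)]
    return sum(p[0] for p in pairs), sum(p[1] for p in pairs)


def expected_split(p_accept, asked, conceded, disagreement):
    """Expected per-block totals (proposer, responder) if the responder plays
    along with probability p_accept and otherwise defects forever."""
    proposer = p_accept * asked + (1 - p_accept) * disagreement
    responder = p_accept * conceded + (1 - p_accept) * disagreement
    return proposer, responder


def hypothetical_accept_prob(fair, asked, disagreement, exponent=1.01):
    """One possible rule (an assumption): the Ultimatum Game formula applied
    to gains above the mutual-defection total."""
    return ((fair - disagreement) / (asked - disagreement)) ** exponent


accepted = totals("CCD", "CCC")   # my 2-of-3 policy vs. your full cooperation
rejected = totals("DDD", "DDD")   # mutual defection
print(accepted, rejected)         # (7, 4) (3, 3)
print(expected_split(0.74, asked=7, conceded=4, disagreement=3))
p = hypothetical_accept_prob(fair=6, asked=7, disagreement=3)
print(round(p, 3), expected_split(p, asked=7, conceded=4, disagreement=3))
```

Under that assumed rule p comes out near 0.748. The proposer's expectation then stays just under the \$6 it would get from the split the other agent considers fair, so asking for more does not pay.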

## Blackmail

In the Nuclear Prisoner's Dilemma, both players have a third option, Nuke. If either player presses Nuke, both players get -\$100.

Suppose you are a CDT agent moving first in the Nuclear Prisoner's Dilemma (Player 1). I am an LDT agent who will causally learn your move - that is, the button you press *causes* me to learn what move you have made - and then I move second.

When you simulate my reasoning, you will find that Player 2's algorithm predictably selects a policy of "Defect if Player 1 Cooperates, otherwise Nuke." Well, you'd better Cooperate, since Player 2 is being so irrational!

Besides being another case of LDT agents 'irrationally' taking away all of a CDT agent's lunch money, this also highlights an open problem in logical decision theory, namely demonstrating a *no-blackmail equilibrium*.

The LDT agent, Player 2, arrived at its policy via knowing that Player 1 was simulating it, so that Player 2's iteration over possible sense-act mappings (policies) hit upon the fact that the map {C->D, D->N, N->N} produced a C output from Player 1.
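
Player 2's iteration over sense-act mappings can be sketched by brute force over all 27 maps from Player 1's move to Player 2's response. This is a toy model under stated assumptions. The non-nuclear payoffs reuse the (3, 0) > (2, 2) > (1, 1) > (0, 3) ordering from the iterated example. Nuke gives −100 to both, as in the text. The CDT Player 1 is assumed to see Player 2's policy and pick its causally best move. All tied policies are reported rather than broken arbitrarily.

```python
from itertools import product

MOVES = ("C", "D", "N")  # Cooperate, Defect, Nuke

# Assumed non-nuclear payoffs as (Player 1, Player 2).
PD = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
      ("D", "C"): (3, 0), ("D", "D"): (1, 1)}


def payoff(m1, m2):
    """Payoffs for Player 1's move m1 and Player 2's move m2."""
    if "N" in (m1, m2):
        return (-100, -100)  # if anyone presses Nuke, both get -100
    return PD[(m1, m2)]


def cdt_reply(policy):
    """CDT Player 1 sees Player 2's policy and picks its causally best move."""
    return max(MOVES, key=lambda m1: payoff(m1, policy[m1])[0])


def best_policies():
    """Player 2 scores every map from Player 1's move to its own response."""
    policies = [dict(zip(MOVES, resp)) for resp in product(MOVES, repeat=3)]

    def p2_value(pol):
        m1 = cdt_reply(pol)
        return payoff(m1, pol[m1])[1]

    top = max(p2_value(pol) for pol in policies)
    return top, [pol for pol in policies if p2_value(pol) == top]


top, winners = best_policies()
print(top)  # 3
for pol in winners:
    print(pol, "-> Player 1 plays", cdt_reply(pol))
```

Three policies tie: C->D and D->N, with any response at all to Nuke (Player 1 never presses it). The policy {C->D, D->N, N->N} from the text is one of them, and against each of them the CDT player Cooperates.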

Now suppose Player 1 is an even more clever sort of agent, and Player 2 finds when simulating Player 1 that if Player 2 chooses any *policy* other than unconditional cooperation, Player 1's output seems to be Nuke! This Clever Player has, of course, done some equivalent of peering cleverly at Player 2's abstract workings, and then handing its output over to a function which simulates Player 2's policy choice and outputs Nuke if Player 2's simulated policy choice is anything other than unconditional cooperation.

From the standpoint of the general family of LDT decision theories, it seems clear that Player 2 should refuse to be blackmailed (or do so with sufficient probability that the Clever Player has no expected gain from pulling this shenanigan). But the LDT formula *as currently written* will iterate through formulas, find that unconditional cooperation seems to be the best policy, and output unconditional surrender.
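
The surrender can be reproduced in a toy setting. As a sketch under assumptions: Player 2's non-nuclear payoffs follow the PD ordering from the iterated example (3 for defecting on a cooperator, 2 for mutual cooperation, 1 for mutual defection, 0 for being defected on), and Nuke gives −100. `clever_player1` is a hypothetical stand-in for the Clever Player. A policy search that takes each simulated response of Player 1 as a fixed fact ends up choosing unconditional cooperation.

```python
from itertools import product

MOVES = ("C", "D", "N")  # Cooperate, Defect, Nuke
UNCONDITIONAL_C = {"C": "C", "D": "C", "N": "C"}


def payoff_p2(m1, m2):
    """Player 2's payoff (assumed PD numbers; Nuke gives -100)."""
    if "N" in (m1, m2):
        return -100
    return {("C", "C"): 2, ("C", "D"): 3, ("D", "C"): 0, ("D", "D"): 1}[(m1, m2)]


def clever_player1(policy):
    """Hypothetical Clever Player: Nuke unless Player 2's simulated policy is
    unconditional cooperation, in which case Defect and pocket the gains."""
    return "D" if policy == UNCONDITIONAL_C else "N"


def naive_policy_search():
    """Pick the best-scoring policy, treating Player 1's simulated response
    to each candidate policy as given (no blackmail resistance)."""
    policies = [dict(zip(MOVES, r)) for r in product(MOVES, repeat=3)]

    def value(pol):
        m1 = clever_player1(pol)
        return payoff_p2(m1, pol[m1])

    return max(policies, key=value)


print(naive_policy_search())  # {'C': 'C', 'D': 'C', 'N': 'C'}
```

Unconditional cooperation earns Player 2 only 0 here. Every other policy scores −100 in the simulation, so the naive search surrenders. As the text notes, getting a search like this to refuse the threat instead is still an open problem.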

In this sense, the LDT expected utility formula as written in the previous section seems not to fully capture the principle of prudent reasoning. LDT as currently written will resist blackmail by simple agents who take a simulated 'No' for an answer, and will seize on the opportunity to blackmail CDT agents and other naive souls who give a simulated 'Yes'. But as written, a still more clever agent can blackmail the *formal* versions of LDT agents that have been written down so far.

Going by the general spirit of LDT, it seems like the final result *ought* to be some sort of [no_blackmail no-blackmail equilibrium] where a prudent agent engages in a logical handshake on positive-sum trades, and ignores mutually-destructive extortion threats (or opts for mutual destruction with sufficient probability to eliminate all expected gains to the extorter, beyond the extorter's 'fair' gains).

But deriving this no-blackmail equilibrium (in a general way, within a system otherwise appealing) remains an open problem in the field.

# LDT as the principle of rational choice

The general family of logical decision theories claims to embody *the principle of rational decision*--or at least, embody it a lot better than causal decision theory and better than any currently known alternative. Even if there are open problems like a good general formulation of "logical counterfactuals" / "logical conditioning" and proving a no-blackmail equilibrium, the claim is that *some sort of LDT* being the principle of rational decision should now appear far more reasonable than that CDT is the principle of rational decision.

## Argument from fairness of Newcomblike problems

The first and foremost motivation behind LDT is that LDT agents systematically end up rich. The current literature contains rich veins of discourse about "one-boxers" on Newcomb's Problem asking two-boxers "Why aincha rich, if you're so rational?" and various retorts along the lines of "It's not my fault Omega decided to punish people who'd act rationally before I even got here; it doesn't change what my rational choice is now."

That retort may sound less persuasive if we're thinking about Parfit's Hitchhiker instead of Newcomb's Problem. The driver is not making a weird arbitrary choice to punish 'rational' agents; the driver is just acting with undisputed economic rationality on their own part. It makes no (selfish) sense to rescue someone you don't predict will pay up afterwards. Similarly with the cases above where LDT agents are carefully extracting all of a CDT agent's lunch money after simulating the CDT agent's acquiescence (an easily predictable acquiescence, if the CDT agent is known to be CDT-'rational'); these LDT agents hardly seem deranged or arbitrary in an intuitive sense.

If in Newcomb's Problem Omega read the agent's source code and decided to reward only agents with an *algorithm* that output 'one box' *by picking the first choice in alphabetical order,* punishing all agents that behaved in exactly the same way due to a different internal computation, then this would indeed be a rigged contest. But in Newcomb's Problem, Omega only cares about the behavior, and not the kind of algorithm that produced it; and an agent can indeed take on whatever kind of behavior it likes; so, according to LDT, there's no point in saying that Omega is being unfair. You can make the logical output of your currently running algorithm be [ldt_freewill whatever you want], so there's no point in picking a logical output that leaves you to die in the desert.

## Argument from freedom of counterfactuals

Within analytic philosophy, the case for causal decision theory rests primarily on the intuition that one-boxing on Newcomb's problem cannot *cause* Box B to be full. Or that on the transparent Newcomb's Problem, with Box B transparently full (or empty), it cannot be reasonable to imagine that by leaving behind Box A and its \$1,000 you can cause things to be different.

Is it not in some sense *true,* after Parfit's driver has conveyed the LDT agent out of the desert to the city, that in the counterfactual world where the LDT agent does not at that time choose to pay, they remain in the city? In this sense, must not an LDT agent be deluded about some question of fact, or act as if it is so deluded?

The LDT agent responds:

- There aren't any actual "worlds where I didn't pay" floating out there. The only *real* world is the one where my decision algorithm had the output it actually had. To imagine other worlds is an act of imagination, computing a description of a certain world that doesn't exist; there's no corresponding world floating out there, so my description of that impossibility can't be true or false under a correspondence theory of truth. "Counterfactuals were made for humanity, not humanity for counterfactuals". That being the case, I can decide to condition on the actions my algorithm doesn't take in whatever way produces the greatest wealth. I am free to say "In the nonexistent world where I don't pay now, I already died in the desert".
- I don't one-box in Newcomb's Problem *because* I think it physically causes Box B to be full. I one-box in Newcomb's Problem because I have computed this as the optimal output in an entirely different way. It begs the question to assume that a rational agent must make its decision by carrying out some particular ritual of cognition about which things physically cause which other things; and that's the premise behind the notion that I am "acting as if" I irrationally believe that my choice physically causes Box B to be full.

To this reply, the LDT agent adds that it *is* desirable for the action-conditionals we compute to match the one real world. That is, if it's a fact that your decision algorithm outputs action $~$a_x,$~$ then your imagination of "The world where I do $~$a_x$~$" inside that term of the expected utility formula should match reality. [ This is a condition that CDT violates!] (In a way that can be used to pump money out of CDT agents.)

## Argument from coherence

LDT agents never need to resort to [ldt_no_precommitment precommitments] (since LDT agents never wish their future selves would act differently from the LDT algorithm). LDT agents always calculate [ltd_positive_voi a positive value of information] (whereas on the transparent Newcomb's Problem, an evidential decision agent might beg you *not* to render Box B transparent, since then it will be empty).

From the standpoint of LDT, CDT agents are undergoing [ preference reversals] and being subject to [ money pumps] in a way that the economic literature usually treats as prima facie indicators of economic irrationality.

E.g., a CDT agent will, given the chance to set up before Omega's arrival, pay a \$10 fee to have a precommitment assistant stand around with a gun threatening to shoot them if they take both boxes in Newcomb's Problem. Naturally, given the chance by surprise, the same agent would *later* (after Omega's departure) pay \$10 to make the gun-toter go away. %note: Even believing that Omega has accurately predicted this, and that Box B is therefore empty!%

From the standpoint of an LDT agent, this is no more excusable than an agent exhibiting non-Bayesian behavior on the [allais_paradox Allais Paradox], paying \$1 to throw a switch and then paying another \$1 to throw it back. A rational agent would not exhibit such incoherence and drive itself around in circles.

Of course, in real life, other people may not be certain of what algorithm we are running; so there will remain an in-practice use for publicly visible precommitments. But even from a CDT standpoint it is worth asking questions like "To the extent people *can* figure out what algorithm we're using, what should that algorithm be?" Or "If we could publicly pay someone to force us to use a particular algorithm, what should that algorithm be?" If Distributed Autonomous Organizations ever happen, or smart-but-not-superintelligent machine agents, there will be economic actors that can have publicly visible source code.

Similarly, in real life there are reputational effects that serve as local incentives to behave today the way we want people to expect us to behave later. But it remains worth asking, "Is there a *general* algorithm that I want to have a reputation for using?"

The modern economics literature so takes for granted 'precommitments' and 'usefully irrational' behavior at the bargaining table, that it may sound odd to claim that [ dynamic consistency] and [ reflective consistency] are desirable properties for the principle of rational choice to have. Why, who would think in the first place that rational agents would want their future selves to behave rationally? There are just two entirely different subjects of study, the study of 'What is rational behavior', and the study of 'What rationalists wish they behaved like' or rationality as modified by precommitments, 'useful irrationality', 'social rationality', 'superrationality', etcetera.

But it would nonetheless be a remarkable fact if the study of 'the dispositions rational agents wish they had' pointed strongly in the direction of a simpler general algorithm, with philosophically appealing properties, that happened to be much more self-consistent in various ways, whose corresponding agents stand around saying things like "Why on Earth would a rational agent wistfully wish that their future selves would be irrational?" or "What do you mean *useful irrationality?* If a choice pattern is useful, it's not irrational!" %note: See also the Star Trek episode where Spock talks about losing a chess game to an opponent who played 'illogically'.%

A truly pure causal decision agent, with no other thoughts but CDT, will wave off all that argument with a sigh; you can't alter what Fairbot has already played in the Prisoner's Dilemma and that's that. But if we actual humans let ourselves blank our minds of our previous thoughts and try to return to an intuitive, pretheoretic standpoint, we might suspect from looking over this situation that we have made a mistake about what to adopt as our explicit theory of rationality.

# LDT in economics going forward

If the LDT position is anywhere near valid, then putting the outputs of the CDT algorithm into economics textbooks as the 'rational' choice was just an honest mistake, and academic economics ought to chew over this situation for a year or two and then go back and announce that voting can be rational after all.

Even if we don't accept as certain that 'some sort of theory in the logical decision theory family' is liable to end up looking like the ultimate principle of rational choice after the dust settles, the CDT analyses of scenarios like the Iterated Prisoner's Dilemma and the Ultimatum Game have been cast into *severe* doubt as final statements, and ought not to be repeated further as solid judgments about economic rationality.

Human behavior on the Ultimatum Game suggests that, descriptively, the economic agents of the modern world might behave *more* like LDT agents than CDT agents (although actual humans clearly don't implement either algorithm).

LDT may allow analyzing numerous promise-keeping and honorableness behaviors in terms of a simple ideal, rather than these being treatable only as ad-hoc precommitment patches that complicate the simple CDT ideal.

The systematic study of bargaining equilibria with notions of 'fairness'--which only make sense in light either of LDT, or of someone trying to end up with a reputation for being an LDT-like agent--might end up being relevant to real-life bargaining.

# Further reading

You've now seen an overview of many key aspects of logical decision theories. From here, you can go on to read about:

- How proof-based agents work, and why we can simulate them in polynomial time.
- [ How to derive game theory from expected utility maximization and agents reasoning about each other.]
- [ Open questions in the modal agents formalism.]
- How LDT actually handles Newcomblike problems.
- Updateless decision theory; how to not go into infinite loops.
- Timeless decision theory; how and why to factor logical uncertainty into causal models.
- [ A bestiary of Newcomblike problems and how to represent and reason about them using LDT.] (requires b, c)
- Negotiations and 'fairness', in the Ultimatum Game and elsewhere.
- The problem of the no-blackmail equilibrium.
- [ The philosophical debate about Newcomblike problems and principles of rationality.]
- Why LDT is more reflectively and dynamically coherent than CDT.
- [ Controlling something does not require changing it - why it makes sense to talk about controlling the output of a logical algorithm.]
- [ldt_history A brief history of LDT and who invented what.] (requires a, b, c, d, f)
- [ Why LDT matters to issues of sufficiently advanced machine intelligence.] (requires b, d, e, f)
- [ldt_citations].

