Parfit's Hitchhiker

https://arbital.com/p/parfits_hitchhiker

by Eliezer Yudkowsky Aug 5 2016 updated Aug 5 2016

You are dying in the desert. A truck-driver who is very good at reading faces finds you, and offers to drive you into the city if you promise to pay $1,000 on arrival. You are a selfish rationalist.


You are stranded in the desert, running out of water, and soon to die. Someone in a motor vehicle drives up to you. The driver of the motor vehicle is a selfish ideally game-theoretical agent, and what's more, so are you. Furthermore, the driver is Paul Ekman who has spent his whole life studying facial microexpressions and is extremely good at reading people's honesty by looking at their faces.

The driver says, "Well, as an ideal selfish rational agent, I'll convey you into town if it's in my own interest to do so. I don't want to bother dragging you to Small Claims Court if you don't pay up. So I'll just ask you this question: Can you honestly say that you'll give me \$1,000 from an ATM after we reach town?"

On some decision theories, an ideal selfish rational agent will realize that once it reaches town, it will have no further incentive to pay the driver. Agents of this type therefore cannot honestly answer "Yes"; if they say "Yes" anyway, they are lying, whereupon the driver says "You're lying" and drives off, leaving them to die.

Would you survive? %note: Okay, fine, you'd just keep your promise because of being honest. But would you still survive even if you were an ideal selfish agent running whatever algorithm you consider to correspond to the ideal [principle_rational_choice principle of rational choice]?%

Analysis

Parfit's Hitchhiker is noteworthy in that, unlike the alien philosopher-troll Omega running strange experiments, Parfit's driver acts for understandable reasons.

The Newcomblike aspect of the problem arises from the way that your algorithm's output, once inside the city, determines both:

• Whether you actually pay the driver \$1,000 once you are in the city.

• The driver's prediction, made in the desert, of whether you will pay, and therefore whether you reach the city at all.

We may assume that Parfit's driver also asks you questions like "Have you really thought through what you'll do?" and "Are you trying to think one thing now, knowing that you'll probably think something else in the city?" and watches your facial expression on those answers as well.

Note that quantitative changes in your probability of survival may be worth pursuing, even if you don't think it's certain that Paul Ekman could read your facial expressions correctly. Indeed, even a driver who is only fairly good at reading faces makes this an important Newcomblike problem, so long as you value the resulting shift in your survival probability at more than \$1,000.
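
As a minimal illustration of this point (the accuracy figures and the model of the driver below are assumptions made for the sake of the example, and the \$1,000,000 valuation of your life is borrowed from the Transparent Newcomb comparison below): with a driver who reads your true disposition with probability p, an agent disposed to pay survives with probability p, while an agent disposed to refuse survives only when the driver misreads it.

```python
# Toy model: value of being disposed to pay, under an imperfect face-reading driver.
# Assumed numbers: life worth $1,000,000, fare $1,000, detection accuracy p.
# These figures are illustrative assumptions, not part of Parfit's original setup.

LIFE_VALUE = 1_000_000
FARE = 1_000

def expected_value(disposed_to_pay: bool, p: float) -> float:
    """Expected dollar value of each disposition, given detection accuracy p.

    If you are genuinely disposed to pay, the driver reads you correctly with
    probability p and drives you to the city, where you pay the fare.
    If you are not, he only drives you when he misreads you, with probability
    1 - p, and you survive without paying.
    """
    if disposed_to_pay:
        return p * (LIFE_VALUE - FARE)
    return (1 - p) * LIFE_VALUE

for p in (0.6, 0.9, 0.99):
    print(p, expected_value(True, p), expected_value(False, p))
# Even at p = 0.6, being disposed to pay is worth ~$599,400 in expectation,
# versus ~$400,000 for being disposed to refuse.
```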

Parfit's Hitchhiker is structurally similar to the Transparent Newcomb's Problem, if you value your life at \$1,000,000.

Responses

Causal decision theory

Dies in the desert. A CDT agent knows that its future self will reason, "Now that I'm in the city, nothing I do can physically affect whether I was already rescued from the desert," and will therefore refuse to pay. Knowing this, the agent in the desert cannot honestly answer that it will pay in the future.

Evidential decision theory

Dies in the desert. An EDT agent knows that its future self will reason, "Since I can already see that I'm in the city, my paying \$1,000 wouldn't provide me with any further good news about my being in the city," and will therefore refuse to pay.
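
A rough sketch of the in-city computation that defeats both of the agents above (the utilities here are illustrative assumptions, with survival treated as already settled once the city has been observed):

```python
# Sketch of the in-city choice as a CDT or EDT agent would score it,
# after updating on the observation "I am already in the city".
# Utilities are illustrative assumptions: survival is treated as a sunk benefit.

def in_city_utility(action: str) -> float:
    """Utility of each action as evaluated from inside the city.

    For CDT: paying cannot cause the desert rescue that already happened.
    For EDT: paying is no further evidence of survival once survival is observed.
    Either way the survival term is the same for both actions and drops out.
    """
    return -1_000.0 if action == "pay" else 0.0

best = max(["pay", "refuse"], key=in_city_utility)
print(best)  # -> "refuse"; hence no honest "Yes" is available back in the desert.
```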

Logical decision theory

Survives.

• A [timeless_dt timeless decision agent], even without the updateless feature, will reason, "If-counterfactually my algorithm for what to do in the city had the logical output 'refuse to pay', then in that counterfactual case I would have died in the desert". The TDT agent will therefore evaluate the expected utility of refusing to pay as very low.

• An updateless decision agent computes that the optimal policy maps the sense data "I can see that I'm already in the city" to the action "Pay the driver \$1,000" and this computation does not change after the agent sees that it is in the city.
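
A minimal sketch of that policy-level computation, under the simplifying assumptions of a perfectly accurate driver and a life valued at \$1,000,000 (the figure used in the Transparent Newcomb comparison above). This is only an illustration of the comparison being made, not an implementation of any particular logical decision theory:

```python
# Sketch of updateless policy selection for Parfit's Hitchhiker.
# Assumed numbers: life worth $1,000,000, fare $1,000, a perfectly
# accurate face-reading driver. Illustrative only.

LIFE_VALUE = 1_000_000
FARE = 1_000

# A policy maps the sense data "I am in the city" to an action.
POLICIES = {
    "pay on arrival": "pay",
    "refuse on arrival": "refuse",
}

def ex_ante_utility(action_in_city: str) -> float:
    """Expected utility of a policy, scored from the desert.

    The driver predicts the policy's output: a paying policy gets driven
    to the city (and then pays the fare); a refusing policy is left to die.
    """
    if action_in_city == "pay":
        return LIFE_VALUE - FARE
    return 0.0

best_policy = max(POLICIES, key=lambda name: ex_ante_utility(POLICIES[name]))
print(best_policy)  # -> "pay on arrival"; the agent then executes it in the city.
```

The point of the sketch is only that the comparison is made between whole policies, scored from the desert, rather than between actions scored after updating on already being in the city.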