True Prisoner's Dilemma

https://arbital.com/p/true_prisoners_dilemma

by Eliezer Yudkowsky Aug 1 2016 updated Oct 16 2016

A scenario that reproduces the ideal payoff matrix of the Prisoner's Dilemma for human beings who care about their public reputation and about each other.


[summary: Many people object to the original Prisoner's Dilemma on the grounds that real human beings care about their future reputations, care about other people, feel an explicit desire to act honorably, and so on. A "True Prisoner's Dilemma" is meant to reproduce the ideal payoff matrix of the Prisoner's Dilemma in a way that answers some or all of these objections. Examples include two charities competing for a donation, two disagreeing doctors ordering medicine, and humans bargaining with a paperclip maximizer.]

A "True Prisoner's Dilemma" is a scenario intended to reproduce the ideal payoff matrix of the Prisoner's Dilemma for realistic human beings who care about, e.g., their public reputation and the welfare of other human beings. (For example, in the traditional presentation of the Prisoner's Dilemma as a question of whether a prisoner under arrest should testify against their fellow prisoner, somebody might reply, "I care about my confederate" or "I wouldn't want to get a reputation among my fellow criminals for turning traitor.")

Examples

The two charities

Two charities are in private talks with a potential donor, Mr. BadRich, who is considering donating roughly \$10M to each of them.

Mr. BadRich made his money selling collateralized debt obligations, that he knew were going to implode, to pension funds. Mr. BadRich will spend all of the money not given to charity on speedboats. There is no moral level on which we prefer Mr. BadRich to end up with more money at the end of this dilemma. We don't feel strongly that we are obliged to share with Mr. BadRich every possible piece of info that might lead to him donating less.

The values and/or beliefs of the chief executives of both charities are such as to make each executive honestly consider that money given to them is much more valuable than money given to the other charity. For example, the two charities could be the Against Malaria Foundation and Effective Animal Altruists.

Sometime this week, Mr. BadRich is separately interviewing the chief executives of the EAA and AMF. It seems quite likely that nobody else will ever hear what is said in these confidential discussions.

Both the CEO of the EAA and the CEO of the AMF know some damaging facts about the other organization--say, a piece of mismanagement by an employee who was since fired, or a lawsuit that was settled out of court. If one CEO describes what they know about the other organization, Mr. BadRich will donate less to that organization, and some of the money freed up will go to theirs. If both CEOs describe what they know of the other organization, Mr. BadRich will donate less to both.

Mr. BadRich behaves randomly to some extent, and wasn't going to donate exactly equal amounts to both organizations in any case, so the CEOs can't figure out for sure what happened just by looking at the outcomes.

At the point where this dilemma occurs, both CEOs know that the other CEO faces, or will face, the same choice; but neither CEO has previously made any promise to the other not to discuss this true information that they know about the other organization with the other organization's potential donors.

The expected outcomes:

- If both CEOs stay silent, Mr. BadRich donates roughly \$10M to each charity.
- If one CEO reveals what they know and the other stays silent, Mr. BadRich donates less to the silent CEO's charity, and some of the freed-up money goes to the revealer's charity.
- If both CEOs reveal what they know, Mr. BadRich donates less to both charities, and more of his money goes to speedboats.

The disagreeing doctors

You are one of two doctors dealing with a malaria epidemic in your village. At least, you think it's a malaria epidemic. The other doctor thinks it's an outbreak of bird flu. In your opinion, the other doctor is a stubborn fool and you do not think that Aumann's Agreement Theorem calls for you to update on their opinion. The other doctor has no idea what Aumann's Agreement Theorem is, doesn't want to hear about it, and thinks that all this talk of probability theory is silly math stuff; which, from your viewpoint, tends to confirm your own decision there.

Each of you is in contact with one of two medical suppliers, both of whom, for insurance reasons, can only sell to one doctor but not the other. As it so happens, the supplier who can sell to you charges \$200 per unit of malaria medication and \$100 per unit of bird-flu medication. The other supplier charges \$100 per unit of malaria medication and \$200 per unit of bird-flu medication. (This being totally realistic for a US-style medical system.)

Each of you has \$50,000 to spend on medicine with your supplier. You have no way to verify what order was actually made, except the other's word. (The supplier only communicates via email, and it would be easy to fake an email showing a false invoice.)

Once the medicine actually arrives, it will rapidly become clear who was right about the real cause of the epidemic. Both of you fully expect that, if you Defect and then the other doctor proves to be wrong, they will embrace you and thank you for doing the right thing despite their own blindness.

Do you order malaria medication from your supplier, or bird-flu medication?
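The arithmetic of this dilemma can be sketched in a few lines, assuming (as the scenario implies) that each doctor's payoff is the number of units of medication for the disease they believe in, and that "Defect" means ordering your own preferred medication at your supplier's higher price:

```python
# Sketch of the doctors' dilemma. Assumption: each doctor values only units
# of medication for the disease *they* believe in.
BUDGET = 50_000

# Your supplier's prices; the other doctor's supplier is mirrored.
YOUR_PRICES = {"malaria": 200, "bird_flu": 100}
OTHER_PRICES = {"malaria": 100, "bird_flu": 200}

def units(budget, price_per_unit):
    return budget // price_per_unit

# "Cooperate" = order the drug that is cheap for your supplier (the one the
# other doctor believes in); "Defect" = order the drug you believe in.
you_defect = units(BUDGET, YOUR_PRICES["malaria"])        # 250 malaria units
other_cooperates = units(BUDGET, OTHER_PRICES["malaria"])  # 500 malaria units

# From your (malaria-believing) viewpoint, only malaria medication counts:
both_cooperate = other_cooperates                 # 500 units
you_defect_alone = other_cooperates + you_defect  # 750 units
both_defect = you_defect                          # 250 units

assert you_defect_alone > both_cooperate > both_defect  # 750 > 500 > 250
```

By symmetry, the bird-flu-believing doctor faces the same 750 > 500 > 250 ordering over bird-flu units, which is exactly the Prisoner's Dilemma structure.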

Humans and paperclip maximizer

(Not quite as pseudo-realistic, but included for historical reasons as it was the first scenario advertised as a "True Prisoner's Dilemma".)

The human species is bargaining with a Paperclip maximizer. Five billion human beings (not the whole human species, but a significant part of it) are progressing through a fatal disease that can only be cured by Substance S.

Substance S can also be used to make paperclips.

You and the paperclip maximizer must cooperate in order to produce Substance S, but you can both steal some of the Substance S from the production line at the cost of reducing the total amount produced.

The payoff matrix is as follows:
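The particular numbers matter less than their ordering; a minimal sketch with illustrative (assumed) payoffs that satisfy the standard Prisoner's Dilemma conditions:

```python
# Illustrative (assumed) payoffs as (billions of lives saved, paperclips made).
# Keys are (human move, paperclipper move); C = cooperate, D = defect.
payoffs = {
    ("C", "C"): (2, 2),
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("D", "D"): (1, 1),
}

def lives(h, p):
    return payoffs[(h, p)][0]

def clips(h, p):
    return payoffs[(h, p)][1]

# The defining Prisoner's Dilemma ordering, for each player:
# (D, C) > (C, C) > (D, D) > (C, D) from that player's own viewpoint.
assert lives("D", "C") > lives("C", "C") > lives("D", "D") > lives("C", "D")
assert clips("C", "D") > clips("C", "C") > clips("D", "D") > clips("D", "C")
```

Any payoffs with this ordinal structure produce the same dilemma: each side does better by defecting no matter what the other does, yet mutual cooperation beats mutual defection for both.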

The paperclip maximizer has no sense of honor or reputation built into its utility function, which is only over the number of paperclips in the universe. It has no subjective experiences and will not feel betrayed if you betray it; it will simply end up with fewer paperclips. In fact, shortly after this, the Paperclip maximizer will disassemble itself to make paperclips and leave its remaining work to automatic machinery; it will never even find out how many paperclips were actually produced from Substance S.

This version of the dilemma was intended to make sure that the reader really, truly preferred their payoff from (Defect, Cooperate) to their payoff from (Cooperate, Cooperate), even taking into account any ideals they had about the Prisoner's Dilemma. But the setup retains the question of what a 'rational' agent does in this situation; the fact that the paperclip maximizer is also rational; and that both agents prefer the (Cooperate, Cooperate) outcome to the (Defect, Defect) outcome.

Dinner date and unspoken choices

Two people, each of whom is a member of an appropriate gender for the other, are on a one-time dinner date. One of them is visiting from out-of-town, and is unlikely to return to that particular city.