Prisoner's dilemma

# Prisoner's dilemma

Discussion

Encyclopedia
The prisoner’s dilemma is a canonical example of a game, analyzed in game theory
Game theory
Game theory is a mathematical method for analyzing calculated circumstances, such as in games, where a person’s success is based upon the choices of others...

that shows why two individuals might not cooperate, even if it appears that it is in their best interest to do so. It was originally framed by Merrill Flood and Melvin Dresher
Melvin Dresher
Melvin Dresher was a Polish-born American mathematician, notable for developing, with Merrill Flood, the game theoretical model of cooperation and conflict known as the Prisoner's Dilemma while at RAND in 1950 .Dresher came to the United States in 1923...

working at RAND
RAND
RAND Corporation is a nonprofit global policy think tank first formed to offer research and analysis to the United States armed forces by Douglas Aircraft Company. It is currently financed by the U.S. government and private endowment, corporations including the healthcare industry, universities...

in 1950. Albert W. Tucker
Albert W. Tucker
Albert William Tucker was a Canadian-born American mathematician who made important contributions in topology, game theory, and non-linear programming....

formalized the game with prison sentence payoffs and gave it the "prisoner's dilemma" name (Poundstone, 1992). A classic example of the prisoner's dilemma (PD) is presented as follows:
Two men are arrested, but the police do not possess enough information for a conviction. Following the separation of the two men, the police offer both a similar deal- if one testifies against his partner (defects / betrays), and the other remains silent (cooperates / assists), the betrayer goes free and the cooperator receives the full one-year sentence. If both remain silent, both are sentenced to only one month in jail for a minor charge. If each 'rats out' the other, each receives a three-month sentence. Each prisoner must choose either to betray or remain silent; the decision of each is kept quiet. What should they do?

If it is supposed here that each player is only concerned with lessening his time in jail, the game becomes a non-zero sum game where the two players may either assist or betray the other. In the game, the sole worry of the prisoners seems to be increasing his own reward. The interesting symmetry of this problem is that the logical decision leads both to betray the other, even though their individual ‘prize’ would be greater if they cooperated.

In the regular version of this game, collaboration is dominated by betraying, and as a result, the only possible outcome of the game is for both prisoners to betray the other. Regardless of what the other prisoner chooses, one will always gain a greater payoff by betraying the other. Because betraying is always more beneficial than cooperating, all objective prisoners would seemingly betray the other.

In the extended form game, the game is played over and over, and consequently, both prisoners continuously have an opportunity to penalize the other for the previous decision. If the number of times the game will be played is known, the finite aspect of the game means that by backward induction, the two prisoners will betray each other repeatedly.

In casual usage, the label "prisoner's dilemma" may be applied to situations not strictly matching the formal criteria of the classic or iterative games, for instance, those in which two entities could gain important benefits from cooperating or suffer from the failure to do so, but find it merely difficult or expensive, not necessarily impossible, to coordinate their activities to achieve cooperation.

## Strategy for the classic prisoners' dilemma

The normal game is shown below:
Prisoner B stays silent (cooperates) Prisoner B confesses (defects)
Prisoner A stays silent (cooperates) Each serves 1 month Prisoner A: 1 year
Prisoner B: goes free
Prisoner A confesses (defects) Prisoner A: goes free
Prisoner B: 1 year
Each serves 3 months

Here, regardless of what the other decides, each prisoner gets a higher pay-off
by betraying the other. For example, Prisoner A can, with close certainty, state that no
matter what prisoner B chooses, prisoner A is better off 'ratting him out' (defecting) than staying silent (cooperating).
As a result, solely for his own benefit, prisoner A should logically betray him. On the other hand, if prisoner B, acts the same way, then they both have acted the same way, and both receive a
lower reward than if both were to stay quiet. Seemingly logical decisions result in both
players being worse off than if each chose to lessen the sentence of his accomplice at the cost of spending more time in jail himself.

Although they are not permitted to communicate, if the prisoners trust each other, they can both rationally choose to remain silent, lessening the penalty for both of them.

## Generalized form

We can expose the framework of the traditional Prisoners’ Dilemma by removing its original prisoner setting, presented as the following:
There are two players and an impartial third party. Each player holds two cards, one with the word ‘collaborate’, and the other with ‘hinder’. Each player gives one card to the third person, thereby getting rid of the possibility of the player’s knowing the other’s decision in advance. At the end of the turn, payments are given based on the cards played.

Based on the rules of a typical understanding of the prisoner’s dilemma, if the two players are represented by colors, red and blue, and the choices made are assigned point values it becomes clear that if the red player plays betrayal and the blue player assists the other, red gets the T prize of 5 points while blue doesn't get payoff at all. If both cooperate they get the R payoff of 3 points each, while if they both betray they get the P payoff of 1 point. The payoffs are shown below.
Cooperate Defect 3, 3 0, 5 5, 0 1, 1

In simple terms, the matrix looks like this:

{| class="wikitable"
|
!scope="col" style="color: #900"|Cooperate
!scope="col" style="color: #900"|Defect
|-
!scope="row" style="color: #009"|Cooperate
|

The prisoner’s dilemma is a canonical example of a game, analyzed in game theory
Game theory
Game theory is a mathematical method for analyzing calculated circumstances, such as in games, where a person’s success is based upon the choices of others...

that shows why two individuals might not cooperate, even if it appears that it is in their best interest to do so. It was originally framed by Merrill Flood and Melvin Dresher
Melvin Dresher
Melvin Dresher was a Polish-born American mathematician, notable for developing, with Merrill Flood, the game theoretical model of cooperation and conflict known as the Prisoner's Dilemma while at RAND in 1950 .Dresher came to the United States in 1923...

working at RAND
RAND
RAND Corporation is a nonprofit global policy think tank first formed to offer research and analysis to the United States armed forces by Douglas Aircraft Company. It is currently financed by the U.S. government and private endowment, corporations including the healthcare industry, universities...

in 1950. Albert W. Tucker
Albert W. Tucker
Albert William Tucker was a Canadian-born American mathematician who made important contributions in topology, game theory, and non-linear programming....

formalized the game with prison sentence payoffs and gave it the "prisoner's dilemma" name (Poundstone, 1992). A classic example of the prisoner's dilemma (PD) is presented as follows:
Two men are arrested, but the police do not possess enough information for a conviction. Following the separation of the two men, the police offer both a similar deal- if one testifies against his partner (defects / betrays), and the other remains silent (cooperates / assists), the betrayer goes free and the cooperator receives the full one-year sentence. If both remain silent, both are sentenced to only one month in jail for a minor charge. If each 'rats out' the other, each receives a three-month sentence. Each prisoner must choose either to betray or remain silent; the decision of each is kept quiet. What should they do?

If it is supposed here that each player is only concerned with lessening his time in jail, the game becomes a non-zero sum game where the two players may either assist or betray the other. In the game, the sole worry of the prisoners seems to be increasing his own reward. The interesting symmetry of this problem is that the logical decision leads both to betray the other, even though their individual ‘prize’ would be greater if they cooperated.

In the regular version of this game, collaboration is dominated by betraying, and as a result, the only possible outcome of the game is for both prisoners to betray the other. Regardless of what the other prisoner chooses, one will always gain a greater payoff by betraying the other. Because betraying is always more beneficial than cooperating, all objective prisoners would seemingly betray the other.

In the extended form game, the game is played over and over, and consequently, both prisoners continuously have an opportunity to penalize the other for the previous decision. If the number of times the game will be played is known, the finite aspect of the game means that by backward induction, the two prisoners will betray each other repeatedly.

In casual usage, the label "prisoner's dilemma" may be applied to situations not strictly matching the formal criteria of the classic or iterative games, for instance, those in which two entities could gain important benefits from cooperating or suffer from the failure to do so, but find it merely difficult or expensive, not necessarily impossible, to coordinate their activities to achieve cooperation.

## Strategy for the classic prisoners' dilemma

The normal game is shown below:

{| class="wikitable"
|-
! !! Prisoner B stays silent (cooperates) !! Prisoner B confesses (defects)
|-
! Prisoner A stays silent (cooperates)
| Each serves 1 month|| Prisoner A: 1 year
Prisoner B: goes free
|-
! Prisoner A confesses (defects)
| Prisoner A: goes free
Prisoner B: 1 year || Each serves 3 months
|}

Here, regardless of what the other decides, each prisoner gets a higher pay-off
by betraying the other. For example, Prisoner A can, with close certainty, state that no
matter what prisoner B chooses, prisoner A is better off 'ratting him out' (defecting) than staying silent (cooperating).
As a result, solely for his own benefit, prisoner A should logically betray him. On the other hand, if prisoner B, acts the same way, then they both have acted the same way, and both receive a
lower reward than if both were to stay quiet. Seemingly logical decisions result in both
players being worse off than if each chose to lessen the sentence of his accomplice at the cost of spending more time in jail himself.

Although they are not permitted to communicate, if the prisoners trust each other, they can both rationally choose to remain silent, lessening the penalty for both of them.

## Generalized form

We can expose the framework of the traditional Prisoners’ Dilemma by removing its original prisoner setting, presented as the following:
There are two players and an impartial third party. Each player holds two cards, one with the word ‘collaborate’, and the other with ‘hinder’. Each player gives one card to the third person, thereby getting rid of the possibility of the player’s knowing the other’s decision in advance. At the end of the turn, payments are given based on the cards played.

Based on the rules of a typical understanding of the prisoner’s dilemma, if the two players are represented by colors, red and blue, and the choices made are assigned point values it becomes clear that if the red player plays betrayal and the blue player assists the other, red gets the T prize of 5 points while blue doesn't get payoff at all. If both cooperate they get the R payoff of 3 points each, while if they both betray they get the P payoff of 1 point. The payoffs are shown below.

{| class="wikitable"
|+ Example PD payoff matrix
|
!scope="col" style="color: #900"|Cooperate
!scope="col" style="color: #900"|Defect
|-
!scope="row" style="color: #009"|Cooperate
|3, 3
|0, 5
|-
!scope="row" style="color: #009"|Defect
|5, 0
|1, 1
|}

In simple terms, the matrix looks like this:

{| class="wikitable"
|
!scope="col" style="color: #900"|Cooperate
!scope="col" style="color: #900"|Defect
|-
!scope="row" style="color: #009"|Cooperate
|

The prisoner’s dilemma is a canonical example of a game, analyzed in game theory
Game theory
Game theory is a mathematical method for analyzing calculated circumstances, such as in games, where a person’s success is based upon the choices of others...

that shows why two individuals might not cooperate, even if it appears that it is in their best interest to do so. It was originally framed by Merrill Flood and Melvin Dresher
Melvin Dresher
Melvin Dresher was a Polish-born American mathematician, notable for developing, with Merrill Flood, the game theoretical model of cooperation and conflict known as the Prisoner's Dilemma while at RAND in 1950 .Dresher came to the United States in 1923...

working at RAND
RAND
RAND Corporation is a nonprofit global policy think tank first formed to offer research and analysis to the United States armed forces by Douglas Aircraft Company. It is currently financed by the U.S. government and private endowment, corporations including the healthcare industry, universities...

in 1950. Albert W. Tucker
Albert W. Tucker
Albert William Tucker was a Canadian-born American mathematician who made important contributions in topology, game theory, and non-linear programming....

formalized the game with prison sentence payoffs and gave it the "prisoner's dilemma" name (Poundstone, 1992). A classic example of the prisoner's dilemma (PD) is presented as follows:
Two men are arrested, but the police do not possess enough information for a conviction. Following the separation of the two men, the police offer both a similar deal- if one testifies against his partner (defects / betrays), and the other remains silent (cooperates / assists), the betrayer goes free and the cooperator receives the full one-year sentence. If both remain silent, both are sentenced to only one month in jail for a minor charge. If each 'rats out' the other, each receives a three-month sentence. Each prisoner must choose either to betray or remain silent; the decision of each is kept quiet. What should they do?

If it is supposed here that each player is only concerned with lessening his time in jail, the game becomes a non-zero sum game where the two players may either assist or betray the other. In the game, the sole worry of the prisoners seems to be increasing his own reward. The interesting symmetry of this problem is that the logical decision leads both to betray the other, even though their individual ‘prize’ would be greater if they cooperated.

In the regular version of this game, collaboration is dominated by betraying, and as a result, the only possible outcome of the game is for both prisoners to betray the other. Regardless of what the other prisoner chooses, one will always gain a greater payoff by betraying the other. Because betraying is always more beneficial than cooperating, all objective prisoners would seemingly betray the other.

In the extended form game, the game is played over and over, and consequently, both prisoners continuously have an opportunity to penalize the other for the previous decision. If the number of times the game will be played is known, the finite aspect of the game means that by backward induction, the two prisoners will betray each other repeatedly.

In casual usage, the label "prisoner's dilemma" may be applied to situations not strictly matching the formal criteria of the classic or iterative games, for instance, those in which two entities could gain important benefits from cooperating or suffer from the failure to do so, but find it merely difficult or expensive, not necessarily impossible, to coordinate their activities to achieve cooperation.

## Strategy for the classic prisoners' dilemma

The normal game is shown below:

{| class="wikitable"
|-
! !! Prisoner B stays silent (cooperates) !! Prisoner B confesses (defects)
|-
! Prisoner A stays silent (cooperates)
| Each serves 1 month|| Prisoner A: 1 year
Prisoner B: goes free
|-
! Prisoner A confesses (defects)
| Prisoner A: goes free
Prisoner B: 1 year || Each serves 3 months
|}

Here, regardless of what the other decides, each prisoner gets a higher pay-off
by betraying the other. For example, Prisoner A can, with close certainty, state that no
matter what prisoner B chooses, prisoner A is better off 'ratting him out' (defecting) than staying silent (cooperating).
As a result, solely for his own benefit, prisoner A should logically betray him. On the other hand, if prisoner B, acts the same way, then they both have acted the same way, and both receive a
lower reward than if both were to stay quiet. Seemingly logical decisions result in both
players being worse off than if each chose to lessen the sentence of his accomplice at the cost of spending more time in jail himself.

Although they are not permitted to communicate, if the prisoners trust each other, they can both rationally choose to remain silent, lessening the penalty for both of them.

## Generalized form

We can expose the framework of the traditional Prisoners’ Dilemma by removing its original prisoner setting, presented as the following:
There are two players and an impartial third party. Each player holds two cards, one with the word ‘collaborate’, and the other with ‘hinder’. Each player gives one card to the third person, thereby getting rid of the possibility of the player’s knowing the other’s decision in advance. At the end of the turn, payments are given based on the cards played.

Based on the rules of a typical understanding of the prisoner’s dilemma, if the two players are represented by colors, red and blue, and the choices made are assigned point values it becomes clear that if the red player plays betrayal and the blue player assists the other, red gets the T prize of 5 points while blue doesn't get payoff at all. If both cooperate they get the R payoff of 3 points each, while if they both betray they get the P payoff of 1 point. The payoffs are shown below.

{| class="wikitable"
|+ Example PD payoff matrix
|
!scope="col" style="color: #900"|Cooperate
!scope="col" style="color: #900"|Defect
|-
!scope="row" style="color: #009"|Cooperate
|3, 3
|0, 5
|-
!scope="row" style="color: #009"|Defect
|5, 0
|1, 1
|}

In simple terms, the matrix looks like this:

{| class="wikitable"
|
!scope="col" style="color: #900"|Cooperate
!scope="col" style="color: #900"|Defect
|-
!scope="row" style="color: #009"|Cooperate
|
win-win
|lose more-win more

|-
!scope="row" style="color: #009"|Defect
|
win more-lose more

|
lose-lose

|}

It is then possible to make general the point values:

{| class="wikitable"
|+ Canonical PD payoff matrix
|
!scope="col" style="color: #900"|Cooperate
!scope="col" style="color: #900"|Defect
|-
!scope="row" style="color: #009"|Cooperate
|R, R
|S, T
|-
!scope="row" style="color: #009"|Defect
|T, S
|P, P
|}

Where T means the desire to betray, R for the Repayment for total unity, P for the Punishment for total betrayal and S for No reward. To be a prisoner's dilemma, the following must be true:

T > R > P > S

The above form guarantees that the balanced outcome is betrayal, but that collaboration
rules the sense of middle-play. In addition to the above condition, if the game is repeated more than once, the following should be included:

2 R > T + S

If the above is not true, togetherness is not always necessary, as the players are, in actuality, better off by having each player alternate between Cooperatation and Betrayal.

These rules were established by cognitive scientist Douglas Hofstadter
Douglas Richard Hofstadter is an American academic whose research focuses on consciousness, analogy-making, artistic creation, literary translation, and discovery in mathematics and physics...

and form the formal canonical description of a typical game of prisoner's dilemma.

A simple special case occurs when the advantage of defection over cooperation is independent of what the co-player does and cost of the co-player's defection is independent of one's own action, i.e. T+S = P+R.

## The iterated prisoners' dilemma

If two players play prisoners' dilemma more than once in succession and they remember previous actions of their opponent and change their strategy accordingly, the game is called iterated prisoners' dilemma.

The iterated prisoners' dilemma game is fundamental to certain theories of human cooperation and trust. On the assumption that the game can model transactions between two people requiring trust, cooperative behaviour in populations may be modeled by a multi-player, iterated, version of the game. It has, consequently, fascinated many scholars over the years. In 1975, Grofman and Pool estimated the count of scholarly articles devoted to it at over 2,000. The iterated prisoners' dilemma has also been referred to as the "Peace-War game
Peace war game
An iterated game originally played in academic groups and by computer simulation for years to study possible strategies of cooperation and aggression. As peace makers became richer over time it became clear that making war had greater costs than initially anticipated...

".

If the game is played exactly N times and both players know this, then it is always game theoretically optimal to defect in all rounds. The only possible Nash equilibrium
Nash equilibrium
In game theory, Nash equilibrium is a solution concept of a game involving two or more players, in which each player is assumed to know the equilibrium strategies of the other players, and no player has anything to gain by changing only his own strategy unilaterally...

is to always defect. The proof is inductive
Mathematical induction
Mathematical induction is a method of mathematical proof typically used to establish that a given statement is true of all natural numbers...

: one might as well defect on the last turn, since the opponent will not have a chance to punish the player. Therefore, both will defect on the last turn. Thus, the player might as well defect on the second-to-last turn, since the opponent will defect on the last no matter what is done, and so on. The same applies if the game length is unknown but has a known upper limit.

Unlike the standard prisoners' dilemma, in the iterated prisoners' dilemma the defection strategy is counter-intuitive and fails badly to predict the behavior of human players. Within standard economic theory, though, this is the only correct answer. The superrational strategy in the iterated prisoners' dilemma with fixed N is to cooperate against a superrational opponent, and in the limit of large N, experimental results on strategies agree with the superrational version, not the game-theoretic rational one.

For cooperation to emerge between game theoretic rational players, the total number of rounds N must be random, or at least unknown to the players. In this case always defect may no longer be a strictly dominant strategy, only a Nash equilibrium. Amongst results shown by Robert Aumann
Robert Aumann
Robert John Aumann is an Israeli-American mathematician and a member of the United States National Academy of Sciences. He is a professor at the Center for the Study of Rationality in the Hebrew University of Jerusalem in Israel...

in a 1959 paper, rational players repeatedly interacting for indefinitely long games can sustain the cooperative outcome.

### Strategy for the classic prisoners' dilemma

Interest in the iterated prisoners' dilemma (IPD) was kindled by Robert Axelrod
Robert Axelrod
Robert M. Axelrod is an American political scientist. He is Professor of Political Science and Public Policy at the University of Michigan where he has been since 1974. He is best known for his interdisciplinary work on the evolution of cooperation, which has been cited in numerous articles...

in his book The Evolution of Cooperation
The Evolution of Cooperation
The evolution of cooperation can refer to:* the study of how cooperation can emerge and persist as elucidated by application of game theory,* a 1981 paper by political scientist Robert Axelrod and evolutionary biologist W. D...

(1984). In it he reports on a tournament he organized of the N step prisoners' dilemma (with N fixed) in which participants have to choose their mutual strategy again and again, and have memory of their previous encounters. Axelrod invited academic colleagues all over the world to devise computer strategies to compete in an IPD tournament. The programs that were entered varied widely in algorithmic complexity, initial hostility, capacity for forgiveness, and so forth.

Axelrod discovered that when these encounters were repeated over a long period of time with many players, each with different strategies, greedy strategies tended to do very poorly in the long run while more altruistic
Altruism
Altruism is a concern for the welfare of others. It is a traditional virtue in many cultures, and a core aspect of various religious traditions, though the concept of 'others' toward whom concern should be directed can vary among cultures and religions. Altruism is the opposite of...

strategies did better, as judged purely by self-interest. He used this to show a possible mechanism for the evolution of altruistic behaviour from mechanisms that are initially purely selfish, by natural selection
Natural selection
Natural selection is the nonrandom process by which biologic traits become either more or less common in a population as a function of differential reproduction of their bearers. It is a key mechanism of evolution....

.

The best deterministic
Deterministic algorithm
In computer science, a deterministic algorithm is an algorithm which, in informal terms, behaves predictably. Given a particular input, it will always produce the same output, and the underlying machine will always pass through the same sequence of states...

strategy was found to be tit for tat
Tit for tat
Tit for tat is an English saying meaning "equivalent retaliation". It is also a highly effective strategy in game theory for the iterated prisoner's dilemma. It was first introduced by Anatol Rapoport in Robert Axelrod's two tournaments, held around 1980. An agent using this strategy will initially...

, which Anatol Rapoport
Anatol Rapoport
Anatol Rapoport was a Russian-born American Jewish mathematical psychologist. He contributed to general systems theory, mathematical biology and to the mathematical modeling of social interaction and stochastic models of contagion.-Biography:...

developed and entered into the tournament. It was the simplest of any program entered, containing only four lines of BASIC, and won the contest. The strategy is simply to cooperate on the first iteration of the game; after that, the player does what his or her opponent did on the previous move. Depending on the situation, a slightly better strategy can be "tit for tat with forgiveness." When the opponent defects, on the next move, the player sometimes cooperates anyway, with a small probability (around 1–5%). This allows for occasional recovery from getting trapped in a cycle of defections. The exact probability depends on the line-up of opponents.

By analysing the top-scoring strategies, Axelrod stated several conditions necessary for a strategy to be successful.

Nice: The most important condition is that the strategy must be "nice", that is, it will not defect before its opponent does (this is sometimes referred to as an "optimistic" algorithm). Almost all of the top-scoring strategies were nice; therefore a purely selfish strategy will not "cheat" on its opponent, for purely self-interested reasons first.
Retaliating: However, Axelrod contended, the successful strategy must not be a blind optimist. It must sometimes retaliate. An example of a non-retaliating strategy is Always Cooperate. This is a very bad choice, as "nasty" strategies will ruthlessly exploit such players.
Forgiving: Successful strategies must also be forgiving. Though players will retaliate, they will once again fall back to cooperating if the opponent does not continue to defect. This stops long runs of revenge and counter-revenge, maximizing points.
Non-envious: The last quality is being non-envious, that is not striving to score more than the opponent (note that a "nice" strategy can never score more than the opponent).

The optimal (points-maximizing) strategy for the one-time PD game is simply defection; as explained above, this is true whatever the composition of opponents may be. However, in the iterated-PD game the optimal strategy depends upon the strategies of likely opponents, and how they will react to defections and cooperations. For example, consider a population where everyone defects every time, except for a single individual following the tit for tat strategy. That individual is at a slight disadvantage because of the loss on the first turn. In such a population, the optimal strategy for that individual is to defect every time. In a population with a certain percentage of always-defectors and the rest being tit for tat players, the optimal strategy for an individual depends on the percentage, and on the length of the game.

A strategy called Pavlov (an example of Win-Stay, Lose-Switch
Win-Stay, Lose-Switch
In psychology, game theory, statistics, and machine learning, Win-Stay, Lose-Switch is a learning strategy used to model learning in decision situations. It was first invented as an improvement over randomization in bandit problems...

) cooperates at the first iteration and whenever the player and co-player did the same thing at the previous iteration; Pavlov defects when the player and co-player did different things at the previous iteration. For a certain range of parameters, Pavlov beats all other strategies by giving preferential treatment to co-players which resemble Pavlov.

Deriving the optimal strategy is generally done in two ways:
1. Bayesian Nash Equilibrium: If the statistical distribution of opposing strategies can be determined (e.g. 50% tit for tat, 50% always cooperate) an optimal counter-strategy can be derived analytically.
2. Monte Carlo
Monte Carlo method
Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to compute their results. Monte Carlo methods are often used in computer simulations of physical and mathematical systems...

simulations of populations have been made, where individuals with low scores die off, and those with high scores reproduce (a genetic algorithm
Genetic algorithm
A genetic algorithm is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems...

for finding an optimal strategy). The mix of algorithms in the final population generally depends on the mix in the initial population. The introduction of mutation (random variation during reproduction) lessens the dependency on the initial population; empirical experiments with such systems tend to produce tit for tat players (see for instance Chess 1988), but there is no analytic proof that this will always occur.

Although tit for tat is considered to be the most robust basic strategy, a team from Southampton University in England (led by Professor Nicholas Jennings http://www.ecs.soton.ac.uk/~nrj and consisting of Rajdeep Dash, Sarvapali Ramchurn, Alex Rogers, Perukrishnen Vytelingum) introduced a new strategy at the 20th-anniversary iterated prisoners' dilemma competition, which proved to be more successful than tit for tat. This strategy relied on cooperation between programs to achieve the highest number of points for a single program. The University submitted 60 programs to the competition, which were designed to recognize each other through a series of five to ten moves at the start. Once this recognition was made, one program would always cooperate and the other would always defect, assuring the maximum number of points for the defector. If the program realized that it was playing a non-Southampton player, it would continuously defect in an attempt to minimize the score of the competing program. As a result, this strategy ended up taking the top three positions in the competition, as well as a number of positions towards the bottom.

This strategy takes advantage of the fact that multiple entries were allowed in this particular competition, and that the performance of a team was measured by that of the highest-scoring player (meaning that the use of self-sacrificing players was a form of minmaxing). In a competition where one has control of only a single player, tit for tat is certainly a better strategy. Because of this new rule, this competition also has little theoretical significance when analysing single agent strategies as compared to Axelrod's seminal tournament. However, it provided the framework for analysing how to achieve cooperative strategies in multi-agent frameworks, especially in the presence of noise. In fact, long before this new-rules tournament was played, Richard Dawkins
Richard Dawkins
Clinton Richard Dawkins, FRS, FRSL , known as Richard Dawkins, is a British ethologist, evolutionary biologist and author...

in his book The Selfish Gene
The Selfish Gene
The Selfish Gene is a book on evolution by Richard Dawkins, published in 1976. It builds upon the principal theory of George C. Williams's first book Adaptation and Natural Selection. Dawkins coined the term "selfish gene" as a way of expressing the gene-centred view of evolution as opposed to the...

pointed out the possibility of such strategies winning if multiple entries were allowed, but remarked that most probably Axelrod would not have allowed them if they had been submitted. It also relies on circumventing rules about the prisoners' dilemma in that there is no communication allowed between the two players. When the Southampton programs engage in an opening "ten move dance" to recognize one another, this only reinforces just how valuable communication can be in shifting the balance of the game.

### Continuous iterated prisoners' dilemma

Most work on the iterated prisoners' dilemma has focused on the discrete case, in which players either cooperate or defect, because this model is relatively simple to analyze. However, some researchers have looked at models of the continuous iterated prisoners' dilemma, in which players are able to make a variable contribution to the other player. Le and Boyd found that in such situations, cooperation is much harder to evolve than in the discrete iterated prisoners' dilemma. The basic intuition for this result is straightforward: in a continuous prisoners' dilemma, if a population starts off in a non-cooperative equilibrium, players who are only marginally more cooperative than non-cooperators get little benefit from assorting with one another. By contrast, in a discrete prisoners' dilemma, tit for tat cooperators get a big payoff boost from assorting with one another in a non-cooperative equilibrium, relative to non-cooperators. Since nature arguably offers more opportunities for variable cooperation rather than a strict dichotomy of cooperation or defection, the continuous prisoners' dilemma may help explain why real-life examples of tit for tat-like cooperation are extremely rare in nature (ex. Hammerstein) even though tit for tat seems robust in theoretical models.

### Morality

While it is sometimes thought that morality
Morality
Morality is the differentiation among intentions, decisions, and actions between those that are good and bad . A moral code is a system of morality and a moral is any one practice or teaching within a moral code...

must involve the constraint of self-interest, David Gauthier
David Gauthier
David Gauthier is a Canadian-American philosopher best known for his neo-Hobbesian social contract theory of morality, as laid out in his book Morals by Agreement.-Biography:...

famously argues that co-operating in the prisoners' dilemma on moral principles is consistent with self-interest and the axioms of game theory. In his opinion, it is most prudent to give up straight-forward maximizing and instead adopt a disposition of constrained maximization, according to which one resolves to cooperate in the belief that the opponent will respond with the same choice, while in the classical PD it is explicitly stipulated that the response of the opponent does not depend on the player's choice. This form of contractarianism claims that good moral thinking is just an elevated and subtly strategic version of basic means-end reasoning.

Douglas Richard Hofstadter is an American academic whose research focuses on consciousness, analogy-making, artistic creation, literary translation, and discovery in mathematics and physics...

expresses a strong personal belief that the mathematical symmetry is reinforced by a moral symmetry, along the lines of the Kant
KANT
KANT is a computer algebra system for mathematicians interested in algebraic number theory, performing sophisticated computations in algebraic number fields, in global function fields, and in local fields. KASH is the associated command line interface...

ian categorical imperative
Categorical imperative
The Categorical Imperative is the central philosophical concept in the moral philosophy of Immanuel Kant, as well as modern deontological ethics...

: defecting in the hope that the other player cooperates is morally indefensible. If players treat each other as they would treat themselves, then they will cooperate.

## Real-life examples

These particular examples, involving prisoners and bag switching and so forth, may seem contrived, but there are in fact many examples in human interaction as well as interactions in nature that have the same payoff matrix. The prisoner's dilemma is therefore of interest to the social sciences such as economics
Economics
Economics is the social science that analyzes the production, distribution, and consumption of goods and services. The term economics comes from the Ancient Greek from + , hence "rules of the house"...

, politics
Politics
Politics is a process by which groups of people make collective decisions. The term is generally applied to the art or science of running governmental or state affairs, including behavior within civil governments, but also applies to institutions, fields, and special interest groups such as the...

and sociology
Sociology
Sociology is the study of society. It is a social science—a term with which it is sometimes synonymous—which uses various methods of empirical investigation and critical analysis to develop a body of knowledge about human social activity...

, as well as to the biological sciences such as ethology
Ethology
Ethology is the scientific study of animal behavior, and a sub-topic of zoology....

and evolutionary biology. Many natural processes have been abstracted into models in which living beings are engaged in endless games of prisoner's dilemma. This wide applicability of the PD gives the game its substantial importance.

### In politics

In political science
Political science
Political Science is a social science discipline concerned with the study of the state, government and politics. Aristotle defined it as the study of the state. It deals extensively with the theory and practice of politics, and the analysis of political systems and political behavior...

, for instance, the PD scenario is often used to illustrate the problem of two states engaged in an arms race
Arms race
The term arms race, in its original usage, describes a competition between two or more parties for the best armed forces. Each party competes to produce larger numbers of weapons, greater armies, or superior military technology in a technological escalation...

. Both will reason that they have two options, either to increase military expenditure or to make an agreement to reduce weapons. Either state will benefit from military expansion regardless of what the other state does; therefore, they both incline towards military expansion. The paradox
Similar to Circular reasoning, A paradox is a seemingly true statement or group of statements that lead to a contradiction or a situation which seems to defy logic or intuition...

is that both states are acting rational
Rationality
In philosophy, rationality is the exercise of reason. It is the manner in which people derive conclusions when considering things deliberately. It also refers to the conformity of one's beliefs with one's reasons for belief, or with one's actions with one's reasons for action...

ly, but producing an apparently irrational result. This could be considered a corollary
Corollary
A corollary is a statement that follows readily from a previous statement.In mathematics a corollary typically follows a theorem. The use of the term corollary, rather than proposition or theorem, is intrinsically subjective...

to deterrence theory
Deterrence theory
Deterrence theory gained increased prominence as a military strategy during the Cold War with regard to the use of nuclear weapons, and features prominently in current United States foreign policy regarding the development of nuclear technology in North Korea and Iran. Deterrence theory however was...

.

### In environmental studies

In environmental studies
Environmental studies
Environmental studies is the academic field which systematically studies human interaction with the environment. It is a broad interdisciplinary field of study that includes the natural environment, built environment, and the sets of relationships between them...

, the PD is evident in crises such as global climate change
Climate change
Climate change is a significant and lasting change in the statistical distribution of weather patterns over periods ranging from decades to millions of years. It may be a change in average weather conditions or the distribution of events around that average...

. All countries will benefit from a stable climate, but any single country is often hesitant to curb
Carbon dioxide
Carbon dioxide is a naturally occurring chemical compound composed of two oxygen atoms covalently bonded to a single carbon atom...

emissions. The immediate benefit to an individual country to maintain current behavior is perceived to be greater than the eventual benefit to all countries if behavior was changed, therefore explaining the current impasse concerning climate change.

### In psychology

Historically, addiction has been defined as physical and psychological dependence on psychoactive substances which cross the blood-brain barrier once ingested, temporarily altering the chemical milieu of the brain.Addiction can also be viewed as a continued involvement with a substance or activity...

research/behavioral economics, George Ainslie
George Ainslie (psychologist)
George W. Ainslie is an American psychiatrist, psychologist and behavioral economist.Unusually for a psychiatrist, Ainslie undertook experimental animal research in operant conditioning, under the guidance of Howard Rachlin...

points out that addiction can be cast as an intertemporal PD problem between the present and future selves of the addict. In this case, defecting means relapsing, and it is easy to see that not defecting both today and in the future is by far the best outcome, and that defecting both today and in the future is the worst outcome. The case where one abstains today but relapses in the future is clearly a bad outcome—in some sense the discipline and self-sacrifice involved in abstaining today have been "wasted" because the future relapse means that the addict is right back where he started and will have to start over (which is quite demoralizing, and makes starting over more difficult). The final case, where one engages in the addictive behavior today while abstaining "tomorrow" will be familiar to anyone who has struggled with an addiction. The problem here is that (as in other PDs) there is an obvious benefit to defecting "today", but tomorrow one will face the same PD, and the same obvious benefit will be present then, ultimately leading to an endless string of defections.

### In economics

Stag hunt
In game theory, the stag hunt is a game which describes a conflict between safety and social cooperation. Other names for it or its variants include "assurance game", "coordination game", and "trust dilemma". Jean-Jacques Rousseau described a situation in which two individuals go out on a hunt. ...

. The outcome is similar, though, in that both firms would be better off were they to advertise less than in the equilibrium. Sometimes cooperative behaviors do emerge in business situations. For instance, cigarette manufacturers endorsed the creation of laws banning cigarette advertising, understanding that this would reduce costs and increase profits across the industry. This analysis is likely to be pertinent in many other business situations involving advertising.

Without enforceable agreements, members of a cartel
Cartel
A cartel is a formal agreement among competing firms. It is a formal organization of producers and manufacturers that agree to fix prices, marketing, and production. Cartels usually occur in an oligopolistic industry, where there is a small number of sellers and usually involve homogeneous products...

are also involved in a (multi-player) prisoners' dilemma. 'Cooperating' typically means keeping prices at a pre-agreed minimum level. 'Defecting' means selling under this minimum level, instantly stealing business (and profits) from other cartel members. Anti-trust authorities want potential cartel members to mutually defect, ensuring the lowest possible prices for consumers
Consumer
Consumer is a broad label for any individuals or households that use goods generated within the economy. The concept of a consumer occurs in different contexts, so that the usage and significance of the term may vary.-Economics and marketing:...

.

### In law

The theoretical conclusion of PD is one reason why, in many countries, plea bargain
Plea bargain
A plea bargain is an agreement in a criminal case whereby the prosecutor offers the defendant the opportunity to plead guilty, usually to a lesser charge or to the original criminal charge with a recommendation of a lighter than the maximum sentence.A plea bargain allows criminal defendants to...

ing is forbidden. Often, precisely the PD scenario applies: it is in the interest of both suspects to confess and testify against the other prisoner/suspect, even if each is innocent of the alleged crime.

### Multiplayer dilemmas

Many real-life dilemmas involve multiple players. Although metaphorical, Hardin's
Garrett Hardin
Garrett James Hardin was an American ecologist who warned of the dangers of overpopulation and whose concept of the tragedy of the commons brought attention to "the damage that innocent actions by individuals can inflict on the environment"...

tragedy of the commons
Tragedy of the commons
The tragedy of the commons is a dilemma arising from the situation in which multiple individuals, acting independently and rationally consulting their own self-interest, will ultimately deplete a shared limited resource, even when it is clear that it is not in anyone's long-term interest for this...

may be viewed as an example of a multi-player generalization of the PD: Each villager makes a choice for personal gain or restraint. The collective reward for unanimous (or even frequent) defection is very low payoffs (representing the destruction of the "commons"). The commons are not always exploited: William Poundstone
William Poundstone
William Poundstone is an American author, columnist, and skeptic. He has written a number of books including the Big Secrets series and a biography of Carl Sagan...

, in a book about the prisoner's dilemma (see References below), describes a situation in New Zealand where newspaper boxes are left unlocked. It is possible for people to take a paper without paying (defecting) but very few do, feeling that if they do not pay then neither will others, destroying the system. Subsequent research by Elinor Ostrom
Elinor Ostrom
Elinor Ostrom is an American political economist. She was awarded the 2009 Nobel Memorial Prize in Economic Sciences, which she shared with Oliver E. Williamson, for "her analysis of economic governance, especially the commons." She was the first, and to date, the only woman to win the prize in...

, winner of the 2009 Nobel Prize in Economics, proved that the tragedy of the commons
Tragedy of the commons
The tragedy of the commons is a dilemma arising from the situation in which multiple individuals, acting independently and rationally consulting their own self-interest, will ultimately deplete a shared limited resource, even when it is clear that it is not in anyone's long-term interest for this...

is oversimplified, with the negative outcome influenced by outside influences. Without complicating pressures, groups communicate and manage the commons among themselves for their mutual benefit, enforcing social norms to preserve the resource and achieve the maximun good for the group, an example of effecting the best case outcome for PD.

### Closed-bag exchange

Douglas Richard Hofstadter is an American academic whose research focuses on consciousness, analogy-making, artistic creation, literary translation, and discovery in mathematics and physics...

once suggested that people often find problems such as the PD problem easier to understand when it is illustrated in the form of a simple game, or trade-off. One of several examples he used was "closed bag exchange":
Two people meet and exchange closed bags, with the understanding that one of them contains money, and the other contains a purchase. Either player can choose to honor the deal by putting into his or her bag what he or she agreed, or he or she can defect by handing over an empty bag.

In this game, defection is always the best course, implying that rational agents will never play. However, in this case both players cooperating and both players defecting actually give the same result, assuming there are no gains from trade
Gains from trade in economics refers to net benefits to agents from allowing an increase in voluntary trading with each other. In technical terms, it is the increase of consumer surplus plus producer surplus from lower tariffs or otherwise liberalizing trade...

, so chances of mutual cooperation, even in repeated games, are few.

### Friend or Foe?

Friend or Foe?
Friend or Foe?
Friend or Foe? is an American game show based on knowledge and trust which aired on Game Show Network. Three teams of two strangers attempted to persuade their partner into sharing their accumulated winnings rather than stealing it for themselves....

is a game show that aired from 2002 to 2005 on the Game Show Network
Game Show Network
The Game Show Network is an American cable television and direct broadcast satellite channel dedicated to game shows and casino game shows. The channel was launched on December 1, 1994. Its current slogan is "The World Needs More Winners"...

in the United States
United States
The United States of America is a federal constitutional republic comprising fifty states and a federal district...

. It is an example of the prisoner's dilemma game tested by real people, but in an artificial setting. On the game show, three pairs of people compete. As each pair is eliminated, it plays a game similar to the prisoner's dilemma to determine how the winnings are split. If they both cooperate (Friend), they share the winnings 50–50. If one cooperates and the other defects (Foe), the defector gets all the winnings and the cooperator gets nothing. If both defect, both leave with nothing. Notice that the payoff matrix is slightly different from the standard one given above, as the payouts for the "both defect" and the "cooperate while the opponent defects" cases are identical. This makes the "both defect" case a weak equilibrium, compared with being a strict equilibrium in the standard prisoner's dilemma. If you know your opponent is going to vote Foe, then your choice does not affect your winnings. In a certain sense, Friend or Foe has a payoff model between prisoner's dilemma and the game of Chicken
Chicken (game)
The game of chicken, also known as the hawk-dove or snowdrift game, is an influential model of conflict for two players in game theory...

.

The payoff matrix is
{| class="wikitable"
|
!scope="col" style="color: #900"|Cooperate
!scope="col" style="color: #900"|Defect
|-
!scope="row" style="color: #009"|Cooperate
|1, 1
|0, 2
|-
!scope="row" style="color: #009"|Defect
|2, 0
|0, 0
|}

This payoff matrix was later used on the British
United Kingdom
The United Kingdom of Great Britain and Northern IrelandIn the United Kingdom and Dependencies, other languages have been officially recognised as legitimate autochthonous languages under the European Charter for Regional or Minority Languages...

television
Television
Television is a telecommunication medium for transmitting and receiving moving images that can be monochrome or colored, with accompanying sound...

programmes Shafted
Shafted
Shafted was a British quiz show on ITV, presented by Robert Kilroy-Silk, based on game theory.- Format :The quiz begins with six players. In the first round each must declare how much money they would like. This is important as a lot of money is needed to bet on questions during the show...

and Golden Balls
Golden Balls
Golden Balls was a British daytime game show on the ITV network, presented by Jasper Carrott. It was filmed at the BBC Television Centre. From 25 February 2008 to 13 February 2009, the show was sponsored by ITV Bingo ; and from 2 November to 18 December 2009, the show was sponsored by Carpet Right...

. The latter show has been analyzed by a team of economists. See: Split or Steal? Cooperative Behavior When the Stakes are Large.

It was also used earlier in the UK Channel 4
Channel 4
Channel 4 is a British public-service television broadcaster which began working on 2 November 1982. Although largely commercially self-funded, it is ultimately publicly owned; originally a subsidiary of the Independent Broadcasting Authority , the station is now owned and operated by the Channel...

gameshow Trust Me, hosted by Nick Bateman
Nick Bateman
Nicholas "Nasty Nick" Bateman is a British media personality and a former contestant on the first series of the British version of Big Brother.-Big Brother:...

, in 2000.

• Centipede game
Centipede game
In game theory, the centipede game, first introduced by Rosenthal , is an extensive form game in which two players take turns choosing either to take a slightly larger share of a slowly increasing pot, or to pass the pot to the other player...

• Christmas truce
Christmas truce
Christmas truce was a series of widespread unofficial ceasefires that took place along the Western Front around Christmas of 1914, during the First World War...

• Cooperation
Cooperation
Cooperation or co-operation is the process of working or acting together. In its simplest form it involves things working in harmony, side by side, while in its more complicated forms, it can involve something as complex as the inner workings of a human being or even the social patterns of a...

• Diner's dilemma
Diner's dilemma
In game theory, the Unscrupulous diner's dilemma is an n-player prisoner's dilemma. The situation imagined is that several individuals go out to eat, and prior to ordering they agree to split the check equally between all of them. Each individual must now choose whether to order the expensive or...

• Ethical dilemma
Ethical dilemma
An Ethical dilemma is a complex situation that will often involve an apparent mental conflict between moral imperatives, in which to obey one would result in transgressing another....

• Evolutionarily stable strategy
Evolutionarily stable strategy
In game theory and behavioural ecology, an evolutionarily stable strategy , which is sometimes also called an evolutionary stable strategy, is a strategy which, if adopted by a population of players, cannot be invaded by any alternative strategy that is initially rare. An ESS is an equilibrium...

• Folk theorem (game theory)
Folk theorem (game theory)
In game theory, folk theorems are a class of theorems which imply that in repeated games, any outcome is a feasible solution concept, if under that outcome the players' minimax conditions are satisfied. The minimax condition states that a player will minimize the maximum possible loss which he...

• Nash equilibrium
Nash equilibrium
In game theory, Nash equilibrium is a solution concept of a game involving two or more players, in which each player is assumed to know the equilibrium strategies of the other players, and no player has anything to gain by changing only his own strategy unilaterally...

• Prisoner's dilemma and cooperation an experimental study
• Public choice theory
Public choice theory
In economics, public choice theory is the use of modern economic tools to study problems that traditionally are in the province of political science...

• Public goods game
Public goods game
The Public goods game is a standard of experimental economics. In the basic game, subjects secretly choose how many of their private tokens to put into the public pot. The tokens in the pot are multiplied by a factor and this "public good" payoff is evenly divided among players...

• Reciprocal altruism
Reciprocal altruism
In evolutionary biology, reciprocal altruism is a behaviour whereby an organism acts in a manner that temporarily reduces its fitness while increasing another organism's fitness, with the expectation that the other organism will act in a similar manner at a later time...

• Rendezvous problem
Rendezvous problem
The rendezvous dilemma can be formulated in this way:If they both choose to wait, of course, they will never meet. If they both choose to walk there are chances that they meet and chances that they do not...

• Simultaneous action selection
Simultaneous action selection
Simultaneous action selection, or SAS, is a game mechanic that occurs when players of a game take action at the same time. An example of a game that uses this type of movement is Diplomacy...

• Social trap
Social trap
Social trap is a term used by psychologists to describe a situation in which a group of people act to obtain short-term individual gains, which in the long run leads to a loss for the group as a whole...

• Tit for tat
Tit for tat
Tit for tat is an English saying meaning "equivalent retaliation". It is also a highly effective strategy in game theory for the iterated prisoner's dilemma. It was first introduced by Anatol Rapoport in Robert Axelrod's two tournaments, held around 1980. An agent using this strategy will initially...

• Tragedy of the commons
Tragedy of the commons
The tragedy of the commons is a dilemma arising from the situation in which multiple individuals, acting independently and rationally consulting their own self-interest, will ultimately deplete a shared limited resource, even when it is clear that it is not in anyone's long-term interest for this...

• Traveler's dilemma
Traveler's dilemma
In game theory, the traveler's dilemma is a type of non-zero-sum game in which two players attempt to maximize their own payoff, without any concern for the other player's payoff....

• War of attrition (game)
War of attrition (game)
In game theory, the war of attrition is a model of aggression in which two contestants compete for a resource of value V by persisting while constantly accumulating costs over the time t that the contest lasts. The model was originally formulated by John Maynard Smith, a mixed evolutionary stable...

• Zero-sum
Zero-sum
In game theory and economic theory, a zero-sum game is a mathematical representation of a situation in which a participant's gain of utility is exactly balanced by the losses of the utility of other participant. If the total gains of the participants are added up, and the total losses are...