Iterated Prisoners' Dilemma

The iterated prisoners' dilemma (IPD) is a well known non-zero-sum game^[1].

There are two possible moves: (i) cooperate (C), and (ii) defect (D).

The two players make their moves simultaneously, neither knowing the other's move in advance.

Payoff matrix:

		player2
		C	D
player1	C	3, 3	0, 5
player1	D	5, 0	1, 1

Consider player1. If player2 cooperates, player1 gains more (5 v. 3) by defecting, and if player2 defects, player1 still gains more (1 v. 0) by defecting. The same considerations apply to player2. So it seems that the best strategy is to defect. Yet, when many players play repeatedly, cooperation wins out!

Classic Strategies

softy:	Always cooperates -- and is easily taken advantage of, 0:5, by a defector.
baddy:	Always defects.
tit for tat:	Initially cooperates then does as was done unto it on the previous move -- an eye for an eye.
mutually assured destruction (MAD):	Initially cooperates, but consistently defects if the opponent ever defects.

If several copies of the above strategies all play each other, without collusion, the clear winner is tit for tat.

(If there are many softies about, it can be worth testing an opponent with an early defect, but this is dangerous in the presence of MADs.)

Evolution

Strategies can also be evolved.

The game is played over a number of generations.

The initial generation is made up of players where each one has a random strategy.

For each generation, the scores are initialised to zero, and then every players plays every other player in an iterated prisoners' dilemma for a certain number of rounds.

During these rounds, each player remembers a certain amount of past history for each opponent, being the last few moves made by it and the opponent, and acts according to a table (its genome) indexed by the sequence of recent past interactions.

At the end of a generation, the players are sorted according to their total earnings. Parents are selected to produce offspring to make up the next generation, the selection being biased towards high earning parents.

A child is produced by single cross-over, with a certain probability of mutation for each position in the genome, say (that is, a genetic algorithm).

Simulation

The initial, random players, are clueless and easily taken advantage of by any player who regularly defects.
But, if a number of players start to cooperate they can beat defectors, and proliferate.
However, in a population full of cooperators, there is no pressure to remain vigilant, so the ability to punish defectors may be lost, allowing defectors (which can arise by chance) to again spread.
Conditions may oscillate.
Just what happens varies from run to run.
Extreme, trapped behaviours are possible, especially for a small population.
A low probability of mutation may mean that the population is slow to "learn" to cooperate.
But a higher probability of mutation may make it impossible for cooperation to become established.
If the remembered history is at least one move, the population is large enough, the number of generations is high enough, the number of rounds per generation is high enough, and the probability of mutation is not too high, players that cooperate with each other and are able to punish defectors almost always come to dominate.
This has been used as a possible explanation for the existence of altruistic behaviour in the natural world.: -- L.A., 9/2012.
Many variations on the IPD have been studied including but not limited to:: Geography: Players can choose to move around in a world, and interact with those they are close to and "like".; Noise: Moves may be changed at random (misunderstood) with a small probability; a successful strategy must be "forgiving".

Reading

[1] Search for [prisoners dilemma] in the [Bib].