|
- The iterated prisoners' dilemma (IPD) is a
well known non-zero-sum game[1].
- There are two possible moves:
(i) cooperate (C), and
(ii) defect (D).
- The two players make their moves simultaneously,
neither knowing the other's move in advance.
- Payoff matrix:
|
player2 |
C |
D |
player1 |
C |
3, 3 |
0, 5 |
D |
5, 0 |
1, 1 |
-
- Consider player1.
If player2 cooperates, player1 gains more (5 v. 3) by defecting, and
if player2 defects, player1 still gains more (1 v. 0) by defecting.
The same considerations apply to player2.
So it seems that the best strategy is to defect.
Yet, when many players play repeatedly, cooperation wins out!
-
- Classic Strategies
-
softy: |
Always cooperates -- and is easily taken advantage of, 0:5, by a defector.
|
baddy: |
Always defects.
|
tit for tat: |
Initially cooperates then does as was done unto it
on the previous move -- an eye for an eye.
|
mutually assured destruction (MAD): |
Initially cooperates, but consistently defects if
the opponent ever defects.
|
-
- If several copies of the above strategies all play each other,
without collusion, the clear winner is tit for tat.
- (If there are many softies about,
it can be worth testing an opponent with an early defect,
but this is dangerous in the presence of MADs.)
-
- Evolution
-
- Strategies can also be evolved.
- The game is played over a number of generations.
- The initial generation is made up of players where each one
has a random strategy.
- For each generation, the scores are initialised to zero,
and then every players plays every other player in an
iterated prisoners' dilemma for a certain number of rounds.
- During these rounds,
each player remembers a certain amount of past history
for each opponent, being the last few moves made by it and the opponent, and
acts according to a table (its genome) indexed
by the sequence of recent past interactions.
- At the end of a generation, the players are sorted according
to their total earnings.
Parents are selected to produce offspring to make up the next generation,
the selection being biased towards high earning parents.
- A child is produced by single cross-over, with
a certain probability of mutation for each position in the genome, say
(that is, a genetic algorithm).
Simulation
- The initial, random players, are clueless and easily
taken advantage of by any player who regularly defects.
- But, if a number of players start to cooperate
they can beat defectors, and proliferate.
- However, in a population full of cooperators,
there is no pressure to remain vigilant,
so the ability to punish defectors may be lost,
allowing defectors (which can arise by chance) to again spread.
- Conditions may oscillate.
-
- Just what happens varies from run to run.
- Extreme, trapped behaviours are possible,
especially for a small population.
- A low probability of mutation may mean that the
population is slow to "learn" to cooperate.
- But a higher probability of mutation
may make it impossible for cooperation to become established.
-
- If
the remembered history is at least one move,
the population is large enough,
the number of generations is high enough,
the number of rounds per generation is high enough, and
the probability of mutation is not too high,
players that cooperate with each other and are able to punish defectors
almost always come to dominate.
- This has been used as a possible explanation
for the existence of altruistic behaviour in the natural world.
-
- Many variations on the IPD have been studied including
but not limited to:
- Geography: Players can choose to move around in a world, and
interact with those they are close to and "like".
- Noise: Moves may be changed at random (misunderstood) with
a small probability; a successful strategy must be "forgiving".
Reading
- [1] Search for [prisoners dilemma] in
the [Bib].
|
|