Preface

Aha, I'm back with another update. I had originally planned to wrap this series up, but I found a few more fun problems, so here is one more post, which also gives me a chance to pay off what I still owed from before. The full code is at https://github.com/zong4/AILearning.

Rock, Paper, Scissors

I don't know what strategy came to mind when you first saw this problem. In any case, reinforcement learning is not really usable here, because a pure game of chance has no optimal strategy to learn. Let me show you the three strategies I wrote instead.

Strategy 1

Play the move that beats whatever the opponent played last.

# Strategy 1: play the winning move against the opponent's last move
if prev_play == 'R':
    return 'P'
elif prev_play == 'P':
    return 'S'
elif prev_play == 'S':
    return 'R'
else:
    return random.choice(['R', 'P', 'S'])

Strategy 2

Play the move that beats the opponent's most frequent move so far.

# Strategy 2: play the winning move against the opponent's most frequent move
if opponent_history.count('R') > opponent_history.count('P') and opponent_history.count('R') > opponent_history.count('S'):
    return 'P'
elif opponent_history.count('P') > opponent_history.count('R') and opponent_history.count('P') > opponent_history.count('S'):
    return 'S'
elif opponent_history.count('S') > opponent_history.count('R') and opponent_history.count('S') > opponent_history.count('P'):
    return 'R'
else:
    return random.choice(['R', 'P', 'S'])

Strategy 3

Use a Markov chain to predict the opponent's most likely next move.

# Strategy 3: predict the opponent's next move with a first-order Markov chain
# and play the move that beats the prediction.
# Assumes opponent_history is a string of past moves (e.g. "RPSR"),
# so substring counts approximate transition counts.
r2r_count = opponent_history.count('RR')
r2p_count = opponent_history.count('RP')
r2s_count = opponent_history.count('RS')

p2p_count = opponent_history.count('PP')
p2r_count = opponent_history.count('PR')
p2s_count = opponent_history.count('PS')

s2s_count = opponent_history.count('SS')
s2r_count = opponent_history.count('SR')
s2p_count = opponent_history.count('SP')

if prev_play == 'R':
    # The most likely follow-up decides the counter move (R -> P, P -> S, S -> R)
    if r2r_count > r2p_count and r2r_count > r2s_count:
        return 'P'
    elif r2p_count > r2r_count and r2p_count > r2s_count:
        return 'S'
    elif r2s_count > r2r_count and r2s_count > r2p_count:
        return 'R'
    else:
        return random.choice(['R', 'P', 'S'])

elif prev_play == 'P':
    if p2p_count > p2r_count and p2p_count > p2s_count:
        return 'S'
    elif p2r_count > p2p_count and p2r_count > p2s_count:
        return 'P'
    elif p2s_count > p2p_count and p2s_count > p2r_count:
        return 'R'
    else:
        return random.choice(['R', 'P', 'S'])

elif prev_play == 'S':
    if s2s_count > s2r_count and s2s_count > s2p_count:
        return 'R'
    elif s2r_count > s2s_count and s2r_count > s2p_count:
        return 'P'
    elif s2p_count > s2s_count and s2p_count > s2r_count:
        return 'S'
    else:
        return random.choice(['R', 'P', 'S'])

else:
    return random.choice(['R', 'P', 'S'])
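
These three snippets are fragments of a larger player function that receives the opponent's previous move and keeps a running history; the post doesn't show that wrapper, so here is a minimal sketch of how they might be wired together (the function name, signature, and history handling are my assumptions, not the original code):

import random

def player(prev_play, opponent_history=[]):
    # The mutable default list keeps the opponent's moves across calls.
    if prev_play:
        opponent_history.append(prev_play)

    # For Strategy 3, join the history into a string such as "RPSR" first,
    # so that substring counts like history_str.count('RP') work:
    # history_str = "".join(opponent_history)

    # Strategy 1 as the example body: beat whatever the opponent played last.
    if prev_play == 'R':
        return 'P'
    if prev_play == 'P':
        return 'S'
    if prev_play == 'S':
        return 'R'
    return random.choice(['R', 'P', 'S'])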

All of the above should be fairly easy to follow, so feel free to just skim it. On to the next part.

Humans Are Essentially Repeaters

I don't know if you've heard the saying that "humans are essentially repeaters"; today we're going to simulate it.

Survival Game

Since we are simulating human society, the survival game needs both cooperation and competition, so the classic Prisoner's Dilemma will do nicely.

# Define the payoff matrix for the Prisoner's Dilemma
PAYOFF_MATRIX = {
    ('cooperate', 'cooperate'): (3, 3),
    ('cooperate', 'defect'): (-1, 3),
    ('defect', 'cooperate'): (3, -1),
    ('defect', 'defect'): (-3, -3)
}

Survival Strategies

For now there are four strategies. The third one is the so-called repeater, i.e. tit-for-tat: it simply repeats whatever the last player it met did. At the end an AI will probably be added to learn a strategy as well.

# Define different strategies
def always_cooperate(history):
    return 'cooperate'

def always_defect(history):
    return 'defect'

def tit_for_tat(history):
    # Repeat whatever the previous opponent did; cooperate on the first move
    if not history:
        return 'cooperate'
    return history[-1][1]

def random_choice(history):
    return random.choice(['cooperate', 'defect'])

# Player class
class Player:
    def __init__(self, strategy, score=10):
        # score can be passed explicitly so that copies (see Optimization 2) inherit it
        self.strategy = strategy
        self.score = score
        self.history = []

    def make_choice(self):
        return self.strategy(self.history)

    def update_score(self, payoff):
        self.score += payoff

    def update_history(self, own_choice, other_choice):
        self.history.append((own_choice, other_choice))

Simulation

Since this is a survival simulation, players whose score drops to 0 or below are of course eliminated.

# Function to run the simulation
def run_simulation(num_players_per_strategy, num_rounds):
    strategies = [always_cooperate, always_defect, tit_for_tat, random_choice]
    players = []

    # Create players with equal distribution of strategies
    for strategy in strategies:
        for _ in range(num_players_per_strategy):
            players.append(Player(strategy))

    for round_num in range(num_rounds):
        print("round_num: " + str(round_num))

        random.shuffle(players)
        for i in range(0, len(players), 2):
            if i + 1 < len(players):
                play_round(players[i], players[i + 1])

        # Remove players with score <= 0
        players = [player for player in players if player.score > 0]

        always_cooperate_count = sum(1 for player in players if player.strategy == always_cooperate)
        always_defect_count = sum(1 for player in players if player.strategy == always_defect)
        tit_for_tat_count = sum(1 for player in players if player.strategy == tit_for_tat)
        random_choice_count = sum(1 for player in players if player.strategy == random_choice)

        print("always_cooperate: " + str(always_cooperate_count))
        print("always_defect: " + str(always_defect_count))
        print("tit_for_tat: " + str(tit_for_tat_count))
        print("random_choice: " + str(random_choice_count))
        print()
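
run_simulation calls a play_round helper that the post doesn't show. Here is a minimal sketch, assuming it simply looks up PAYOFF_MATRIX and updates both players' scores and histories (my reconstruction, not the author's code):

def play_round(player1, player2):
    # Both players choose based on their own history, then scores and
    # histories are updated from the payoff matrix.
    choice1 = player1.make_choice()
    choice2 = player2.make_choice()
    payoff1, payoff2 = PAYOFF_MATRIX[(choice1, choice2)]
    player1.update_score(payoff1)
    player2.update_score(payoff2)
    player1.update_history(choice1, choice2)
    player2.update_history(choice2, choice1)

Judging by the output below (10 players per strategy, round numbers up to 9999), the run was presumably something like run_simulation(10, 10000).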

Optimization 1

Looking at the result, not a single always-cooperate player was eliminated, which suggests the Prisoner's Dilemma payoffs are badly tuned.

round_num: 9999
always_cooperate: 10
always_defect: 6
tit_for_tat: 10
random_choice: 8

We can adjust the matrix as below. With these values, the expected payoff of either move against a uniformly random opponent is 0: a cooperator gets (3 - 3) / 2 = 0 and a defector gets (3 - 3) / 2 = 0.

PAYOFF_MATRIX = {
    ('cooperate', 'cooperate'): (3, 3),
    ('cooperate', 'defect'): (-3, 3),
    ('defect', 'cooperate'): (3, -3),
    ('defect', 'defect'): (-3, -3)
}

Optimization 2

With that change almost everyone died off; only a single repeater was left.

round_num: 9999
always_cooperate: 0
always_defect: 0
tit_for_tat: 1
random_choice: 0

So each time players are eliminated, newborns have to be added; simply copying an existing player will do (asexual reproduction, haha).

# Refill the population whenever it falls below the starting size
if len(players) < len(strategies) * num_players_per_strategy:
    print("Copy players")
    players_copy = players.copy()
    for player in players_copy:
        # Relies on the Player constructor above accepting an initial score
        players.append(Player(player.strategy, player.score))

Optimization 3

I ran it twice and the results still differ quite a bit, which makes sense, since who you play against is completely random.

Round: 99
Always cooperate: 21.21212121212121%
Always defect: 31.818181818181817%
Tit for tat: 15.151515151515152%
Random choice: 31.818181818181817%
Round: 99
Always cooperate: 28.169014084507044%
Always defect: 22.535211267605636%
Tit for tat: 29.577464788732392%
Random choice: 19.718309859154928%

A better setup is to give each player a random starting position on a ring and have them play the Prisoner's Dilemma only with their neighbours. So we shuffle just once at the beginning, and after that each player only plays against the player next to them.

random.shuffle(players)

for round_num in range(num_rounds):
    print("Round: " + str(round_num))

    for i in range(0, len(players)):
        play_round(players[i], players[(i + 1) % len(players)])

Optimization 4

Reading the raw printout like the one below is a pain, so let's add some visualization (a sketch of possible plotting code follows the output).

Round: 99
Always cooperate: 7
Always defect: 2
Tit for tat: 7
Random choice: 5
<function random_choice at 0x1029860c0>
<function always_defect at 0x102987100>
<function always_cooperate at 0x102986f20>
<function tit_for_tat at 0x102986e80>
<function always_cooperate at 0x102986f20>
<function tit_for_tat at 0x102986e80>
<function always_defect at 0x102987100>
<function random_choice at 0x1029860c0>
<function tit_for_tat at 0x102986e80>
<function always_cooperate at 0x102986f20>
<function always_cooperate at 0x102986f20>
<function always_cooperate at 0x102986f20>
<function always_cooperate at 0x102986f20>
<function random_choice at 0x1029860c0>
<function random_choice at 0x1029860c0>
<function tit_for_tat at 0x102986e80>
<function tit_for_tat at 0x102986e80>
<function tit_for_tat at 0x102986e80>
<function random_choice at 0x1029860c0>
<function tit_for_tat at 0x102986e80>
<function always_cooperate at 0x102986f20>
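
The plotting code itself isn't shown in the post. A minimal sketch, assuming matplotlib and the ring of players from Optimization 3 (the colors and the plot_players helper are my own illustration, not the author's code):

import math
import matplotlib.pyplot as plt

# Map each strategy function to a color (the color choice is arbitrary)
STRATEGY_COLORS = {
    always_cooperate: 'green',
    always_defect: 'red',
    tit_for_tat: 'blue',
    random_choice: 'gray',
}

def plot_players(players, round_num):
    # Place the surviving players evenly on a circle and color them by strategy
    angles = [2 * math.pi * i / len(players) for i in range(len(players))]
    xs = [math.cos(a) for a in angles]
    ys = [math.sin(a) for a in angles]
    colors = [STRATEGY_COLORS[p.strategy] for p in players]
    plt.scatter(xs, ys, c=colors)
    plt.title("Round " + str(round_num))
    plt.axis('equal')
    plt.show()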

The resulting plot looks more or less fine, but to be safe, let's remove the random players first and observe again.

After that change everything worked. Judging from the results, the repeaters and the always-cooperate players survive most easily, since always-defect players just grind each other down. Next, let's try adding an AI.

Reinforcement Learning

After thinking about it more carefully, it seems there is no good way for an AI to learn a strategy here; whatever it learned would probably end up about as good as the random player, because there are only two states, and the most it could learn is an unequal probability of choosing each move.