Preface

I had too much fun today and forgot to write, so here is a quick catch-up post.

Today's post is a fun one: getting an AI to generate Sherlock Holmes stories. The full code is at https://github.com/zong4/AILearning.

Code logic

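First we build the Markov chain. With n_gram=2 the text is split into groups of words (here, pairs). We then compute the probability of moving from one group to the next, that is, how often that transition occurs in the text relative to all transitions out of the same group, as shown below.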

```python
def make_markov_model(cleaned_stories, n_gram=2):
    """Map each n-word state to the probabilities of the n-word states that follow it."""
    markov_model = {}
    # Stop early enough that both the current and the next state fit in the list.
    # For n_gram=2 this is the same bound as len(cleaned_stories)-n_gram-1.
    for i in range(len(cleaned_stories) - 2 * n_gram + 1):
        curr_state, next_state = "", ""
        for j in range(n_gram):
            curr_state += cleaned_stories[i+j] + " "
            next_state += cleaned_stories[i+j+n_gram] + " "
        # drop the trailing space
        curr_state = curr_state[:-1]
        next_state = next_state[:-1]
        if curr_state not in markov_model:
            markov_model[curr_state] = {}
            markov_model[curr_state][next_state] = 1
        else:
            if next_state in markov_model[curr_state]:
                markov_model[curr_state][next_state] += 1
            else:
                markov_model[curr_state][next_state] = 1

    # calculating transition probabilities
    for curr_state, transition in markov_model.items():
        total = sum(transition.values())
        for state, count in transition.items():
            markov_model[curr_state][state] = count/total

    return markov_model
```
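
To make the counting step concrete, here is a minimal sketch that applies the same idea to a tiny made-up word list. The helper name `toy_transition_table` and the toy sentence are assumptions for illustration only; this is a compact equivalent of `make_markov_model`, not the code from the repository.

```python
from collections import Counter, defaultdict


def toy_transition_table(words, n_gram=2):
    """Count which n-word group follows each n-word group, then normalize."""
    counts = defaultdict(Counter)
    for i in range(len(words) - 2 * n_gram + 1):
        curr_state = " ".join(words[i:i + n_gram])
        next_state = " ".join(words[i + n_gram:i + 2 * n_gram])
        counts[curr_state][next_state] += 1

    # Turn raw counts into probabilities per current state.
    return {
        state: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for state, followers in counts.items()
    }


# Made-up toy corpus, not taken from the Sherlock Holmes text.
words = "the game is up the game is afoot the game is up".split()
table = toy_transition_table(words)

# "the game" is followed by "is up" twice and by "is afoot" once,
# so "is up" gets 2/3 and "is afoot" gets 1/3.
print(table["the game"])
```

Because the probabilities are normalized per current state, the values in each inner dictionary sum to 1.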

Next we generate a story from the Markov chain. Just as with PageRank earlier, we simply make random transitions.

```python
import random

def generate_story(markov_model, limit=100, start='my god'):
    n = 0
    curr_state = start
    next_state = None
    story = ""
    story += curr_state + " "
    while n < limit:
        # Stop early if the current state has no recorded transitions.
        if curr_state not in markov_model:
            break
        # Pick the next state at random, weighted by its transition probability.
        next_state = random.choices(list(markov_model[curr_state].keys()),
                                    list(markov_model[curr_state].values()))

        curr_state = next_state[0]
        story += curr_state + " "
        n += 1
    return story

# markov_model is the dictionary returned by make_markov_model
for i in range(20):
    print(str(i)+". ", generate_story(markov_model, start="dear holmes", limit=8))

for i in range(20):
    print(str(i)+". ", generate_story(markov_model, start="my dear", limit=8))

for i in range(20):
    print(str(i)+". ", generate_story(markov_model, start="i would", limit=8))

print(generate_story(markov_model, start="the case", limit=100))
```
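
The random transition relies on `random.choices`, which draws items in proportion to the weights it is given. The sketch below checks that behavior on a made-up two-entry table; the helper name `sample_next_state`, the probabilities, and the fixed seed are all assumptions for illustration.

```python
import random
from collections import Counter


def sample_next_state(transitions, rng):
    """Draw one next state, weighted by its transition probability."""
    states = list(transitions.keys())
    weights = list(transitions.values())
    return rng.choices(states, weights=weights, k=1)[0]


# Made-up transition table for a single state (not from the real model).
transitions = {"is up": 0.7, "is afoot": 0.3}

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
draws = Counter(sample_next_state(transitions, rng) for _ in range(10_000))

# Over many draws the observed frequencies should be close to the weights.
for state, count in draws.items():
    print(state, count / 10_000)
```

This is why frequent continuations show up often in the generated stories while rare ones still appear occasionally.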

Results

State transition probabilities

Here are the probabilities of each word pair that can follow "the game". Every word pair has a table like this.

All possible transitions from 'the game' state: 

{'is hardly': 0.02702702702702703, 'worth it': 0.02702702702702703, 'you are': 0.02702702702702703, 'i am': 0.02702702702702703, 'is up': 0.06306306306306306, 'now count': 0.02702702702702703, 'your letter': 0.02702702702702703, 'for all': 0.06306306306306306, 'is afoot': 0.036036036036036036, 'my own': 0.02702702702702703, 'at any': 0.02702702702702703, 'mr holmes': 0.02702702702702703, 'ay whats': 0.02702702702702703, 'my friend': 0.02702702702702703, 'fairly by': 0.02702702702702703, 'is not': 0.02702702702702703, 'was not': 0.02702702702702703, 'is and': 0.036036036036036036, 'was whist': 0.036036036036036036, 'for the': 0.036036036036036036, 'was in': 0.02702702702702703, 'may wander': 0.02702702702702703, 'now a': 0.02702702702702703, 'was up': 0.09009009009009009, 'would have': 0.036036036036036036, 'in their': 0.036036036036036036, 'in that': 0.036036036036036036, 'the lack': 0.036036036036036036, 'was afoot': 0.036036036036036036}
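
A table like this is easier to read when sorted by probability. Here is a small sketch that lists the most likely continuations; the helper name `top_transitions` is hypothetical, and the dictionary is only a four-entry subset of the values printed above, so it does not sum to 1.

```python
def top_transitions(transitions, k=3):
    """Return the k most probable next states as (state, probability) pairs."""
    ranked = sorted(transitions.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# A few entries copied from the 'the game' table above (subset only).
the_game = {
    "is hardly": 0.02702702702702703,
    "is up": 0.06306306306306306,
    "is afoot": 0.036036036036036036,
    "was up": 0.09009009009009009,
}

for state, prob in top_transitions(the_game, k=3):
    print(f"{state}: {prob:.3f}")
# was up: 0.090
# is up: 0.063
# is afoot: 0.036
```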

Story

Here is a story that starts with "the case" and is limited to 100 word pairs (roughly two hundred words).

the case was concerned with an explosive energy which told me that i was at such a sight as that the cipher macdonald sat with his hands in a paroxysm of energy and he stretched his legs as he raced past and her refusal to take her there are few in that chair i can see terror in the coal district in the bare assembly room the men he is armed with a pair precisely but this is certainly a most extraordinary fashion a letter arrived on march his death in the same way i thought of my presence then with a head that was even greater than the obtrusive emotion of the clergyman he sat for some little time no said i only meant sir that sir charles was the elder man first the younger brother and i said i could rely upon my assistance in the enclosed official report it was quite against my wishes twice my boy jack baby dolores and mrs mason to bring news of the outside briarbrae just after sunset well i think we will walk down together to our left or to a small blue book which ascends to such rarefied heights of all the

Here is the same text translated into Chinese:

这个案子与一种爆炸能量有关,它告诉我,我当时正处于这样一个景象中,即密码麦克唐纳双手坐在一阵能量中,当他飞驰而过时,他伸展双腿,而她拒绝带她,那把椅子上的人很少,我能看到恐怖,在煤炭区,在光秃秃的集会室里,他正拿着一对但这无疑是一种最不寻常的方式:一封信在三月到达,他的去世方式与我想到我的存在的方式相同,然后他的脑袋甚至比牧师的突兀情绪还要大,他坐了一小会儿,没有说,我只是说先生,查尔兹爵士首先是年长的弟弟,我说我可以依靠我在封闭官员中的帮助报告说,我的儿子杰克、宝贝、多洛雷斯和梅森夫人两次在日落后带来外面的布赖尔布雷的消息,这完全违背了我的意愿,我想我们会一起走到我们的左边,或者走到一本上升到如此罕见的蓝色小书前。

Afterword

To wrap up, let's look at how text like this differs from what we get when we ask ChatGPT to write.

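  1. The first and most obvious difference is that this text is full of grammatical problems.
  2. It is also hard to follow: it reads like rambling and lacks a central story.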

Put simply, what is missing is an understanding of language, but we will save the details for the final part of the series, on Language.