Explained Simply: How an AI program mastered the ancient game of Go
Views: 2,524
Published: 2019-05-11

This article is about 43,616 characters long; estimated reading time is 145 minutes.

by Aman Agarwal

Explained Simply: How an AI program mastered the ancient game of Go

This is about AlphaGo, Google DeepMind's Go-playing AI that shook the technology world in 2016 by defeating one of the best players in the world, Lee Sedol.

Go is an ancient board game which has so many possible moves at each step that future positions are hard to predict — and therefore it requires strong intuition and abstract thinking to play. Because of this reason, it was believed that only humans could be good at playing Go. Most researchers thought that it would still take decades to build an AI which could think like that. In fact, I’m releasing this essay today because this week (March 8–15) marks the two-year anniversary of the AlphaGo vs Sedol match!

But AlphaGo didn't stop there. 8 months later, it played 60 professional games on a Go website disguised as a player named "Master", and won every single game against dozens of world champions, of course without resting between games.

Naturally this was a HUGE achievement in the field of AI and sparked worldwide discussions about whether we should be excited or worried about artificial intelligence.

Today we are going to take the original research paper published by DeepMind in the Nature journal, and break it down paragraph-by-paragraph using simple English.

After this essay, you’ll know very clearly what AlphaGo is, and how it works. I also hope that after reading this you will not believe all the news headlines made by journalists to scare you about AI, and instead feel excited about it.

Worrying about the growing achievements of AI is like worrying about the growing abilities of Microsoft PowerPoint. Yes, it will get better with time with new features being added to it, but it can't just uncontrollably grow into some kind of Hollywood monster.

You DON’T need to know how to play Go to understand this paper. In fact, I myself have only read the first 3–4 lines in Wikipedia’s opening paragraph about it. Instead, surprisingly, I use some examples from basic Chess to explain the algorithms. You just have to know what a 2-player board game is, in which each player takes turns and there is one winner at the end. Beyond that you don’t need to know any physics or advanced math or anything.

This will make it more approachable for people who only just now started learning about machine learning or neural networks. And especially for those who don’t use English as their first language (which can make it very difficult to read such papers).

If you have NO prior knowledge of AI and neural networks, you can read the "Deep Learning" section of one of my previous essays. After reading that, you'll be able to get through this essay.

If you want to get a shallow understanding of Reinforcement Learning too (optional reading), you can find that in my earlier writing as well.

Here's the original paper if you want to try reading it: "Mastering the game of Go with deep neural networks and tree search" (Silver et al., Nature, 2016).

As for me: Hi, I'm Aman Agarwal, an AI and autonomous robots engineer. I hope that my work will save you a lot of time and effort if you were to study this on your own.

Do you speak Japanese? Someone has kindly written a brief memo about this essay in Japanese.

Let's get started!

Abstract

As you know, the goal of this research was to train an AI program to play Go at the level of world-class professional human players.

To understand this challenge, let me first talk about something similar done for Chess. In the 1990s, IBM came out with the Deep Blue computer, which defeated the great champion Garry Kasparov at Chess in 1997. (He's also a very cool guy, make sure to read more about him later!) How did Deep Blue play?

Well, it used a very brute force method. At each step of the game, it took a look at all the possible legal moves that could be played, and went ahead to explore each and every move to see what would happen. And it would keep exploring move after move for a while, forming a kind of HUGE decision tree of thousands of moves. And then it would come back along that tree, observing which moves seemed most likely to bring a good result. But, what do we mean by “good result”? Well, Deep Blue had many carefully designed chess strategies built into it by expert chess players to help it make better decisions — for example, how to decide whether to protect the king or get advantage somewhere else? They made a specific “evaluation algorithm” for this purpose, to compare how advantageous or disadvantageous different board positions are (IBM hard-coded expert chess strategies into this evaluation function). And finally it chooses a carefully calculated move. On the next turn, it basically goes through the whole thing again.
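
Just to make that concrete, here is a toy sketch (in Python) of the kind of brute-force search described above. This is not IBM's actual code; the `evaluate` function is a stand-in for Deep Blue's hand-coded chess knowledge, and the `state` object with its `legal_moves()`, `play()` and `is_over()` methods is an assumed interface.

```python
def evaluate(state):
    """Stand-in for Deep Blue's hand-crafted evaluation: in the real thing this
    encoded expert chess knowledge (material, king safety, and so on)."""
    return 0.0

def minimax(state, depth, maximizing):
    """Depth-limited brute-force search over every legal move, Deep Blue style.
    `state` is an assumed interface with .is_over(), .legal_moves() and .play(move)."""
    if depth == 0 or state.is_over():
        return evaluate(state), None                # judge the position heuristically
    best_move = None
    if maximizing:
        best_score = float("-inf")
        for move in state.legal_moves():            # look at EVERY possible move...
            score, _ = minimax(state.play(move), depth - 1, False)
            if score > best_score:
                best_score, best_move = score, move
    else:
        best_score = float("inf")
        for move in state.legal_moves():            # ...and every possible reply, recursively
            score, _ = minimax(state.play(move), depth - 1, True)
            if score < best_score:
                best_score, best_move = score, move
    return best_score, best_move
```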

As you can see, this means Deep Blue thought about millions of theoretical positions before playing each move. This was not so impressive in terms of the AI software of Deep Blue, but rather in the hardware — IBM claimed it to be one of the most powerful computers available in the market at that time. It could look at 200 million board positions per second.

Now we come to Go. Just believe me that this game is much more open-ended, and if you tried the Deep Blue strategy on Go, you wouldn’t be able to play well. There would be SO MANY positions to look at at each step that it would simply be impractical for a computer to go through that hell. For example, at the opening move in Chess there are 20 possible moves. In Go the first player has 361 possible moves, and this scope of choices stays wide throughout the game.
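
A quick back-of-envelope calculation shows how brutally that difference compounds. The numbers below just reuse the branching factors mentioned above and assume, for simplicity, that they stay constant for a few moves:

```python
# Rough count of positions reachable after 4 plies (2 moves by each player),
# pretending the branching factor stays at its opening value the whole time.
chess_branching, go_branching, plies = 20, 361, 4

print(chess_branching ** plies)   # 160000
print(go_branching ** plies)      # ~1.7e10, roughly 100,000 times more
```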

This is what they mean by “enormous search space.” Moreover, in Go, it’s not so easy to judge how advantageous or disadvantageous a particular board position is at any specific point in the game — you kinda have to play the whole game for a while before you can determine who is winning. But let’s say you magically had a way to do both of these. And that’s where deep learning comes in!

So in this research, DeepMind used neural networks to do both of these tasks (if you have never read about neural networks yet, see the "Deep Learning" primer I mentioned earlier). They trained a "policy neural network" to decide which are the most sensible moves in a particular board position (so it's like following an intuitive strategy to pick moves from any position). And they trained a "value neural network" to estimate how advantageous a particular board arrangement is for the player (or in other words, how likely you are to win the game from this position). They trained these neural networks first with human game examples (your good old ordinary supervised learning). After this the AI was able to mimic human playing to a certain degree, so it acted like a weak human player. And then to train the networks even further, they made the AI play against itself millions of times (this is the "reinforcement learning" part). With this, the AI got better because it had more practice.

With these two networks alone, DeepMind’s AI was able to play well against state-of-the-art Go playing programs that other researchers had built before. These other programs had used an already popular pre-existing game playing algorithm, called the “Monte Carlo Tree Search” (MCTS). More about this later.

But guess what, we still haven’t talked about the real deal. DeepMind’s AI isn’t just about the policy and value networks. It doesn’t use these two networks as a replacement of the Monte Carlo Tree Search. Instead, it uses the neural networks to make the MCTS algorithm work better… and it got so much better that it reached superhuman levels. THIS improved variation of MCTS is “AlphaGo”, the AI that beat Lee Sedol and went down in AI history as one of the greatest breakthroughs ever. So essentially, AlphaGo is simply an improved implementation of a very ordinary computer science algorithm. Do you understand now why AI in its current form is absolutely nothing to be scared of?

Wow, we’ve spent a lot of time on the Abstract alone.

Alright — to understand the paper from this point on, first we'll talk about a gaming strategy called the Monte Carlo Tree Search algorithm. For now, I'll just explain this algorithm at enough depth to make sense of this essay. But if you want to learn about it in depth, some smart people have also made excellent videos and blog posts on the topic.

The following section is long, but easy to understand (I’ll try my best) and VERY important, so stay with me! The rest of the essay will go much quicker.

Let's talk about the first paragraph of the essay above. Remember what I said about Deep Blue making a huge tree of millions of board positions and moves at each step of the game? You had to do simulations and look at and compare each and every possible move. As I said before, that was a simple and very straightforward approach — if the average software engineer had to design a game-playing AI, and had all the strongest computers of the world, he or she would probably design a similar solution.

But let's think about how humans themselves play chess. Let's say you're at a particular board position in the middle of the game. By game rules, you can do a dozen different things — move this pawn here, move the queen two squares here or three squares there, and so on. But do you really make a list of all the possible moves you can make with all your pieces, and then select one move from this long list? No — you "intuitively" narrow down to a few key moves (let's say you come up with 3 sensible moves) that you think make sense, and then you wonder what will happen in the game if you chose one of these 3 moves. You might spend 15–20 seconds considering each of these 3 moves and their future — and note that during these 15 seconds you don't have to carefully plan out the future of each move; you can just "roll out" a few mental moves guided by your intuition without TOO much careful thought (well, a good player would think farther and more deeply than an average player). This is because you have limited time, and you can't accurately predict what your opponent will do at each step in that lovely future you're cooking up in your brain. So you'll just have to let your gut feeling guide you. I'll refer to this part of the thinking process as "rollout", so take note of it! So after "rolling out" your few sensible moves, you finally say screw it and just play the move you find best.

Then the opponent makes a move. It might be a move you had already well anticipated, which means you are now pretty confident about what you need to do next. You don't have to spend too much time on the rollouts again. OR, it could be that your opponent hits you with a pretty cool move that you had not expected, so you have to be even more careful with your next move. This is how the game carries on, and as it gets closer and closer to the finishing point, it gets easier for you to predict the outcome of your moves — so your rollouts don't take as much time.

The purpose of this long story is to describe what the MCTS algorithm does on a superficial level — it mimics the above thinking process by building a "search tree" of moves and positions every time. Again, for more details you should check out the resources I mentioned earlier. The innovation here is that instead of going through all the possible moves at each position (which Deep Blue did), it instead intelligently selects a small set of sensible moves and explores those instead. To explore them, it "rolls out" the future of each of these moves and compares them based on their imagined outcomes. (Seriously — this is all I think you need to understand this essay.)
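
For the curious, here is a bare-bones sketch of that select, expand, roll out, back up loop in Python. It is a drastically simplified illustration of generic MCTS, not AlphaGo's implementation: the `policy(state)` and `rollout(state)` functions are placeholders for whatever "intuition" you have available, the `state` interface is assumed, and I'm ignoring the detail that a real two-player implementation flips the outcome's perspective at every level of the tree.

```python
import math

class Node:
    def __init__(self, state, prior=1.0):
        self.state, self.prior = state, prior
        self.children = {}                    # move -> child Node
        self.visits, self.total_value = 0, 0.0

    def q(self):                              # average outcome seen through this node
        return self.total_value / self.visits if self.visits else 0.0

def mcts(root_state, policy, rollout, n_simulations=1000, c=1.0):
    """policy(state) -> {move: prior probability}, rollout(state) -> outcome in [0, 1]."""
    root = Node(root_state)
    for _ in range(n_simulations):
        node, path = root, [root]
        # 1. SELECT: walk down, preferring moves that look good and/or unexplored
        while node.children:
            parent = node
            _, node = max(
                parent.children.items(),
                key=lambda kv: kv[1].q()
                + c * kv[1].prior * math.sqrt(parent.visits) / (1 + kv[1].visits),
            )
            path.append(node)
        # 2. EXPAND: add the handful of sensible follow-ups the policy suggests
        if not node.state.is_over():
            for move, prior in policy(node.state).items():
                node.children[move] = Node(node.state.play(move), prior)
        # 3. ROLL OUT: quickly imagine the rest of the game from here
        outcome = rollout(node.state)
        # 4. BACK UP: tell every node on the path what we just learned
        for n in path:
            n.visits += 1
            n.total_value += outcome
    # after all simulations, play the move we ended up trusting (visiting) the most
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```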

Now — coming back to the screenshot from the paper. Go is a "perfect information game" (don't worry, the formal definition isn't scary: it just means both players can see the whole board at all times, so nothing is hidden). And theoretically, for such games, no matter which particular position you are at in the game (even if you have just played 1–2 moves), it is possible that you can correctly guess who will win or lose (assuming that both players play "perfectly" from that point on). I have no idea who came up with this theory, but it is a fundamental assumption in this research project and it works.

So that means, given a state of the game s, there is a function v*(s) which can predict the outcome, let’s say probability of you winning this game, from 0 to 1. They call it the “optimal value function”. Because some board positions are more likely to result in you winning than other board positions, they can be considered more “valuable” than the others. Let me say it again: Value = Probability between 0 and 1 of you winning the game.
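
Written out loosely in symbols (my phrasing, not the paper's formal definition):

```latex
v^*(s) \;=\; \Pr\bigl(\text{you win} \,\bigm|\, \text{position } s,\ \text{perfect play by both sides from here on}\bigr),
\qquad 0 \le v^*(s) \le 1 .
```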

But wait — say there was a girl named Foma sitting next to you while you play Chess, and she keeps telling you at each step if you’re winning or losing. “You’re winning… You’re losing… Nope, still losing…” I think it wouldn’t help you much in choosing which move you need to make. She would also be quite annoying. What would instead help you is if you drew the whole tree of all the possible moves you can make, and the states that those moves would lead to — and then Foma would tell you for the entire tree which states are winning states and which states are losing states. Then you can choose moves which will keep leading you to winning states. All of a sudden Foma is your partner in crime, not an annoying friend. Here, Foma behaves as your optimal value function v*(s). Earlier, it was believed that it’s not possible to have an accurate value function like Foma for the game of Go, because the games had so much uncertainty.

BUT — even if you had the wonderful Foma, this wonderland strategy of drawing out all the possible positions for Foma to evaluate will not work very well in the real world. In a game like Chess or Go, as we said before, if you try to imagine even 7–8 moves into the future, there can be so many possible positions that you don’t have enough time to check all of them with Foma.

So Foma is not enough. You need to narrow down the list of moves to a few sensible moves that you can roll out into the future. How will your program do that? Enter Lusha. Lusha is a skilled Chess player and enthusiast who has spent decades watching grand masters play Chess against each other. She can look at your board position, look quickly at all the available moves you can make, and tell you how likely it would be that a Chess expert would make any of those moves if they were sitting at your table. So if you have 50 possible moves at a point, Lusha will tell you the probability that each move would be picked by an expert. Of course, a few sensible moves will have a much higher probability and other pointless moves will have very little probability. For example: in Chess, if your Queen is in danger in one corner of the board, you might still have the option to move a little pawn in another corner, but an expert would almost never pick that move, so it gets a tiny probability. She is your policy function, p(a|s). For a given state s, she can give you probabilities for all the possible moves that an expert would make.
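
To make Lusha's output concrete, here is a made-up example of what the policy function hands back for one position: a probability for every legal move, with most of the mass on the expert-looking ones. The move names and numbers are invented for illustration.

```python
# One (invented) snapshot of p(a|s) for a single board position.
p_moves = {
    "defend the queen":     0.55,
    "trade knights":        0.30,
    "push the center pawn": 0.14,
    "push the corner pawn": 0.01,   # legal, but an expert would almost never play it
}
assert abs(sum(p_moves.values()) - 1.0) < 1e-9   # probabilities over all moves sum to 1
```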

Wow — you can take Lusha’s help to guide you in how to select a few sensible moves, and Foma will tell you the likelihood of winning from each of those moves. You can choose the move that both Foma and Lusha approve. Or, if you want to be extra careful, you can roll out the moves selected by Lusha, have Foma evaluate them, pick a few of them to roll out further into the future, and keep letting Foma and Lusha help you predict VERY far into the game’s future — much quicker and more efficient than to go through all the moves at each step into the future. THIS is what they mean by “reducing the search space”. Use a value function (Foma) to predict outcomes, and use a policy function (Lusha) to give you grand-master probabilities to help narrow down the moves you roll out. These are called “Monte Carlo rollouts”. Then while you backtrack from future to present, you can take average values of all the different moves you rolled out, and pick the most suitable action. So far, this has only worked on a weak amateur level in Go, because the policy functions and value functions that they used to guide these rollouts weren’t that great.

Phew.

The first line is self explanatory. In MCTS, you can start with an unskilled Foma and unskilled Lusha. The more you play, the better they get at predicting solid outcomes and moves. “Narrowing the search to a beam of high probability actions” is just a sophisticated way of saying, “Lusha helps you narrow down the moves you need to roll out by assigning them probabilities that an expert would play them”. Prior work has used this technique to achieve strong amateur level AI players, even with simple (or “shallow” as they call it) policy functions.

Yeah, convolutional neural networks are great for image processing. And since a neural network takes a particular input and gives an output, it is essentially a function, right? So you can use a neural network as a very complex function. You can just pass in an image of the board position and let the neural network figure out by itself what's going on. This means it's possible to create neural networks which will behave like VERY accurate policy and value functions. The rest is pretty self explanatory.

Here we discuss how Foma and Lusha were trained. To train the policy network (predicting for a given position which moves experts would pick), you simply use examples of human games and use them as data for good old supervised learning.

And you want to train another slightly different version of this policy network to use for rollouts; this one will be smaller and faster. Let's just say that since Lusha is so experienced, she takes some time to process each position. She's good to start the narrowing-down process with, but if you try to make her repeat the process thousands of times during rollouts, she'll still take a little too much time. So you train a *faster policy network* for the rollout process (I'll call it… Lusha's younger brother Jerry? I know I know, enough with these names). After that, once you've trained both of the slow and fast policy networks enough using human player data, you can try letting Lusha play against herself on a Go board for a few days, and get more practice. This is the reinforcement learning part — making a better version of the policy network.

Then, you train Foma for value prediction: determining the probability of you winning. You let the AI practice through playing itself again and again in a simulated environment, observe the end result each time, and learn from its mistakes to get better and better.

I won't go into details of how these networks are trained. You can read more technical details in the later section of the paper ("Methods") which I haven't covered here. In fact, the real purpose of this particular paper is not to show how they used reinforcement learning on these neural networks. One of DeepMind's previous papers, in which they taught AI to play ATARI games, has already discussed some reinforcement learning techniques in depth (and I've already written an explanation of that paper). For this paper, as I lightly mentioned in the Abstract and also underlined in the screenshot above, the biggest innovation was the fact that they used RL with neural networks for improving an already popular game-playing algorithm, MCTS. RL is a cool tool in a toolbox that they used to fine-tune the policy and value function neural networks after the regular supervised training. This research paper is about proving how versatile and excellent this tool is, not about teaching you how to use it. In television lingo, the Atari paper was an RL infomercial and this AlphaGo paper is a commercial.

Alright, we're finally done with the "introduction" parts. By now you already have a very good feel for what AlphaGo was all about.

Next, we'll go slightly deeper into each thing we discussed above. You might see some ugly and dangerous looking mathematical equations and expressions, but they're simple (I'll explain them all). Relax.

A quick note before you move on. Would you like to help me write more such essays explaining cool research papers? If you’re serious, I’d be glad to work with you. Please leave a comment and I’ll get in touch with you.

So, the first step is in training our policy NN (Lusha), to predict which moves are likely to be played by an expert. This NN’s goal is to allow the AI to play similar to an expert human. This is a convolutional neural network (as I mentioned before, it’s a special kind of NN that is very useful in image processing) that takes in a simplified image of a board arrangement. “Rectifier nonlinearities” are layers that can be added to the network’s architecture. They give it the ability to learn more complex things. If you’ve ever trained NNs before, you might have used the “ReLU” layer. That’s what these are.
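
If you like seeing things in code, here is a heavily shrunken sketch of that idea using PyTorch: convolutions with ReLU ("rectifier") layers over a board "image", ending in a probability for each of the 361 points. The layer counts and sizes here are illustrative only; the actual SL policy network has 13 layers and richer input feature planes.

```python
import torch
import torch.nn as nn

class TinyPolicyNet(nn.Module):
    """A much-shrunken sketch of a Go policy network: conv layers + ReLU over
    the board, ending in a probability for each of the 19x19 = 361 points."""
    def __init__(self, in_planes=1, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_planes, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, 1, kernel_size=1),   # one score per board point
        )

    def forward(self, board):                        # board: (batch, planes, 19, 19)
        scores = self.body(board).flatten(1)         # (batch, 361)
        return torch.softmax(scores, dim=1)          # a probability for every move

net = TinyPolicyNet()
fake_board = torch.zeros(1, 1, 19, 19)               # an empty board as a dummy input
print(net(fake_board).shape)                         # torch.Size([1, 361])
```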

The training data here was in the form of randomly sampled board positions, and the labels were the moves chosen by human players in those positions. Just regular supervised learning.

Here they use "stochastic gradient ASCENT". Well, this is an optimization algorithm used during backpropagation. Here, you're trying to maximise a reward function. And the reward function is just the probability that the network assigns to the move the human expert actually played; you want to increase this probability. But hey — you don't really need to think too much about this. Normally you train the network so that it minimises a loss function, which is essentially the error/difference between the predicted outcome and the actual label. That is called gradient DESCENT. In the actual implementation of this research paper, they have indeed used regular gradient descent. You can easily find a loss function that behaves opposite to the reward function, such that minimising this loss will maximise the reward.
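
In symbols, the supervised update is (roughly) a climb on the log-probability of the expert's move, which is the mirror image of descending a cross-entropy loss. Here σ stands for the SL policy network's weights, s for a board position from the dataset, and a for the move the human actually played there:

```latex
\Delta\sigma \;\propto\; \frac{\partial \log p_\sigma(a \mid s)}{\partial \sigma}
\qquad\Longleftrightarrow\qquad
\text{minimize}\;\; L(\sigma) = -\log p_\sigma(a \mid s)
```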

The policy network has 13 layers, and is called “SL policy” network (SL = supervised learning). The data came from a… I’ll just say it’s a popular website on which millions of people play Go. How good did this SL policy network perform?

It was more accurate than what other researchers had done earlier. The rest of the paragraph is quite self-explanatory. As for the “rollout policy”, you do remember from a few paragraphs ago, how Lusha the SL policy network is slow so it can’t integrate well with the MCTS algorithm? And we trained another faster version of Lusha called Jerry who was her younger brother? Well, this refers to Jerry right here. As you can see, Jerry is just half as accurate as Lusha BUT it’s thousands of times faster! It will really help get through rolled out simulations of the future faster, when we apply the MCTS.

For this next section, you don’t *have* to know about Reinforcement Learning already, but then you’ll have to assume that whatever I say works. If you really want to dig into details and make sure of everything, you might want to read a little about RL first.

Once you have the SL network, trained in a supervised manner on human move data, as I said before, you have to let her practice by herself and get better. That's what we're doing here. So you just take the SL policy network, save it in a file, and make another copy of it.

Then you use reinforcement learning to fine-tune it. Here, you make the network play against itself and learn from the outcomes.

But there’s a problem in this training style.

If you only forever practice against ONE opponent, and that opponent is also only practicing with you exclusively, there’s not much of new learning you can do. You’ll just be training to practice how to beat THAT ONE player. This is, you guessed it, overfitting: your techniques play well against one opponent, but don’t generalize well to other opponents. So how do you fix this?

Well, every time you fine-tune a neural network, it becomes a slightly different kind of player. So you can save this version of the neural network in a list of “players”, who all behave slightly differently right? Great — now while training the neural network, you can randomly make it play against many different older and newer versions of the opponent, chosen from that list. They are versions of the same player, but they all play slightly differently. And the more you train, the MORE players you get to train even more with! Bingo!
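
Here is a small sketch of that "pool of past selves" trick. The `play_one_game` and `update_weights` callables are placeholders for the actual game loop and RL update; only the pool bookkeeping is the point here.

```python
import copy
import random

def self_play_training(policy_net, play_one_game, update_weights,
                       n_iterations=10_000, snapshot_every=500):
    """Train policy_net by self-play against randomly chosen older versions of
    itself, so it doesn't overfit to beating any single opponent."""
    opponent_pool = [copy.deepcopy(policy_net)]            # start with the SL-trained player
    for step in range(1, n_iterations + 1):
        opponent = random.choice(opponent_pool)            # pick a past (or recent) self
        game_record = play_one_game(policy_net, opponent)  # states, moves and final outcome
        update_weights(policy_net, game_record)            # nudge the policy toward winning play
        if step % snapshot_every == 0:
            opponent_pool.append(copy.deepcopy(policy_net))  # freeze a new "player" for the pool
    return policy_net
```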

In this training, the only thing guiding the training process is the ultimate goal, i.e winning or losing. You don’t need to specially train the network to do things like capture more area on the board etc. You just give it all the possible legal moves it can choose from, and say, “you have to win”. And this is why RL is so versatile; it can be used to train policy or value networks for any game, not just Go.
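
Because the only teaching signal is that final result, the fine-tuning update (written roughly in the paper's notation, with ρ for the RL policy's weights) just scales the usual log-probability gradient by the outcome z, which is +1 for a win and −1 for a loss from the current player's point of view:

```latex
z = \begin{cases} +1 & \text{the game was eventually won} \\ -1 & \text{the game was eventually lost} \end{cases}
\qquad
\Delta\rho \;\propto\; \frac{\partial \log p_\rho(a_t \mid s_t)}{\partial \rho}\, z
```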

Here, they tested how accurate this RL policy network was, just by itself without any MCTS algorithm. As you would remember, this network can directly take a board position and decide how an expert would play it — so you can use it to single-handedly play games. Well, the result was that the RL fine-tuned network won against the SL network that was only trained on human moves. It also won against other strong Go playing programs.

Must note here that even before training this RL policy network, the SL policy network was already better than the state of the art — and now, it has further improved! And we haven’t even come to the other parts of the process like the value network.

Did you know that baby penguins can sneeze louder than a dog can bark? Actually that's not true, but I thought you'd like a little joke here to distract from the scary-looking equations above. Coming to the essay again: we're done training Lusha here. Now back to Foma — remember the "optimal value function" v*(s), the one that only tells you how likely you are to win in your current board position if both players play perfectly from that point on? So obviously, to train an NN to become our value function, we would need a perfect player… which we don't have. So we just use our strongest player, which happens to be our RL policy network.

It takes the current board state s, and outputs the probability that you will win the game. You play a game and get to know the outcome (win or loss). Each of the game states acts as a data sample, and the outcome of that game acts as the label. So by playing a 50-move game, you have 50 data samples for value prediction.

Lol, no. This approach is naive. You can’t use all 50 moves from the game and add them to the dataset.

The training data set had to be chosen carefully to avoid overfitting. Each move in the game is very similar to the next one, because you only move once and that gives you a new position, right? If you take the states at all 50 of those moves and add them to the training data with the same label, you basically have lots of “kinda duplicate” data, and that causes overfitting. To prevent this, you choose only very distinct-looking game states. So for example, instead of all 50 moves of a game, you only choose 5 of them and add them to the training set. DeepMind took 30 million positions from 30 million different games, to reduce any chances of there being duplicate data. And it worked!
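
A sketch of that de-duplication step: take a single position from each game, labelled with how that game ended. The `games` structure here is an assumption, just a list of (positions, outcome) pairs.

```python
import random

def build_value_dataset(games):
    """One (position, outcome) sample per game, to avoid near-duplicate training data.
    `games` is assumed to be a list of (list_of_positions, outcome) tuples,
    with outcome = 1 for a win and 0 for a loss."""
    dataset = []
    for positions, outcome in games:
        state = random.choice(positions)    # keep just ONE state from this game...
        dataset.append((state, outcome))    # ...labelled with the game's final result
    return dataset
```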

Now, something conceptual here: there are two ways to evaluate the value of a board position. One option is a magical optimal value function (like the one you trained above). The other option is to simply roll out into the future using your current policy (Lusha) and look at the final outcome in this roll out. Obviously, the real game would rarely go by your plans. But DeepMind compared how both of these options do. You can also do a mixture of both these options. We will learn about this “mixing parameter” a little bit later, so make a mental note of this concept!

Well, your single neural network trying to approximate the optimal value function is EVEN BETTER than doing thousands of mental simulations using a rollout policy! Foma really kicked ass here. When they replaced the fast rollout policy with the twice-as-accurate (but slow) RL policy Lusha, and did thousands of simulations with that, it did better than Foma. But only slightly better, and too slowly. So Foma is the winner of this competition, she has proved that she can’t be replaced.

Now that we have trained the policy and value functions, we can combine them with MCTS and give birth to our former world champion, destroyer of grand masters, the breakthrough of a generation, weighing two hundred and sixty eight pounds, one and only Alphaaaaa GO!

In this section, ideally you should have a slightly deeper understanding of the inner workings of the MCTS algorithm, but what you have learned so far should be enough to give you a good feel for what's going on here. The only thing you should note is how we're using the policy probabilities and value estimations. We combine them during roll outs, to narrow down the number of moves we want to roll out at each step. Q(s,a) represents the average value of playing move a from state s, and u(s,a) is an exploration bonus based on the stored prior probability for that move. I'll explain.

Remember that the policy network uses supervised learning to predict expert moves? And it doesn't just give you the most likely move, but rather gives you probabilities for each possible move that tell how likely each is to be an expert move. This probability can be stored for each of those actions. Here they call it "prior probability", and they obviously use it while selecting which actions to explore. So basically, to decide whether or not to explore a particular move, you consider two things: First, by playing this move, how likely are you to win? Yes, we already have our "value network" to answer this first question. And the second question is, how likely is it that an expert would choose this move? (If a move is super unlikely to be chosen by an expert, why even waste time considering it? This we get from the policy network.)
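
Putting those two questions together: roughly speaking, at every step down the tree the search picks the move that maximizes the value seen so far plus an exploration bonus, where the bonus leans on the stored prior P(s,a) and shrinks as the visit count N(s,a) grows (this is my paraphrase of the selection rule in the paper):

```latex
a_t = \operatorname*{arg\,max}_a \bigl( Q(s_t, a) + u(s_t, a) \bigr),
\qquad
u(s, a) \;\propto\; \frac{P(s, a)}{1 + N(s, a)}
```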

Then let's talk about the "mixing parameter" (see, we came back to it!). As discussed earlier, to evaluate positions, you have two options: one, simply use the value network you have been using to evaluate states all along. And two, you can try to quickly play a rollout game with your current strategy (assuming the other player will play similarly), and see if you win or lose. We saw how the value function was better than doing rollouts in general. Here they combine both. You try giving each prediction 50–50 importance, or 40–60, or 0–100, and so on. If you attach a weight of X% to the first, you have to attach (100−X)% to the second. That's what this mixing parameter means. You'll see these hit and trial results later in the paper.
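
In symbols, the mixing is a single number λ between 0 and 1: the evaluation of a leaf position s_L blends Foma's prediction with the result z_L of the quick rollout played from that leaf. λ = 0 trusts the value network alone, λ = 1 trusts only rollouts, and λ = 0.5 is the 50–50 split mentioned above (this is roughly how the paper writes it):

```latex
V(s_L) \;=\; (1 - \lambda)\, v_\theta(s_L) \;+\; \lambda\, z_L
```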

After each roll out, you update your search tree with whatever information you gained during the simulation, so that your next simulation is more intelligent. And at the end of all simulations, you just pick the best move.

Interesting insight here!

Remember how the RL fine-tuned policy NN was better than just the SL human-trained policy NN? But when you put them within the MCTS algorithm of AlphaGo, using the human trained NN proved to be a better choice than the fine-tuned NN. But in the case of the value function (which you would remember uses a strong player to approximate a perfect player), training Foma using the RL policy works better than training her with the SL policy.

“Doing all this evaluation takes a lot of computing power. We really had to bring out the big guns to be able to run these damn programs.”

Self explanatory.

“LOL, our program literally blew the pants off of every other program that came before us”

This goes back to that “mixing parameter” again. While evaluating positions, giving equal importance to both the value function and the rollouts performed better than just using one of them. The rest is self explanatory, and reveals an interesting insight!

Self explanatory.

Self explanatory. But read that red underlined sentence again. I hope you can see clearly now that this line right here is pretty much the summary of what this whole research project was all about.

Concluding paragraph. “Let us brag a little more here because we deserve it!” :)

Oh and if you’re a scientist or tech company, and need some help in explaining your science to non-technical people for marketing, PR or training etc, I can help you. Drop me a message on Twitter: @mngrwl

Reprinted from: http://ocgwd.baihongyu.com/

查看>>