AlphaGo - Stronger than Humans at Go?
2016/04/06 08:11:14


This is a paper my eldest son wrote on 3/10/16 for his English class (course: ENGL 390; instructor: Dr. Nick Bujak).

Recently, a friend of mine saw a Facebook post in which Google announced AlphaGo, a computer program that had beaten Fan Hui, a professional Go player and European Go champion. Why is this significant? It is an unprecedented feat: the first time a professional human player has been beaten by artificial intelligence without the benefit of a handicap (compensation stones placed at the beginning of the game to offset a difference in ability). Even more impressive is the fact that the computer won five games to none, a decisive result that leaves no one questioning its abilities.

History

The history of Go dates back roughly 2,500 years, to when it was considered one of the four pastimes (along with music, painting, and calligraphy) worthy of a gentleman. The rules are relatively simple: it is a turn-based game played with black and white stones on a 19-by-19 grid. The players take turns placing stones on the intersections of the grid, each aiming to control more than 50 percent of the board. The game becomes interesting when complicated patterns arise as both players try their best to outwit each other.

Programming Nightmare

The first reason it is so difficult to program a computer to play Go is the massive number of possible games: there are an estimated 10 to the power of 700 possible variations of a game of Go, whereas chess has on the order of 10 to the power of 60 possible scenarios. Secondly, it is hard for a computer to estimate, from any given board position, whether White or Black has the advantage. This differs from chess or checkers, where one side is often winning simply because it has more pieces on the board than the opponent. In Go, whether a group of stones is alive or dead can be ambiguous, and since positional strength is also hard to judge, the game is a programming nightmare. Because it is impossible to solve the game directly, a different method is required.
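To get an intuitive feel for why the search space explodes, one can compare rough game-tree sizes with a back-of-the-envelope calculation. The sketch below is purely illustrative: the branching factors and game lengths are common ballpark assumptions, not exact figures, and published estimates of Go's complexity vary widely.

# Rough game-tree size: (average legal moves per turn) ** (typical game length).
# The numbers are illustrative assumptions only; real estimates vary widely.
go_branching, go_length = 250, 150        # Go: ~250 choices per move, ~150 moves per game
chess_branching, chess_length = 35, 80    # chess: ~35 choices per move, ~80 half-moves

go_tree = go_branching ** go_length
chess_tree = chess_branching ** chess_length

print("Go game tree is roughly 10^%d" % (len(str(go_tree)) - 1))
print("Chess game tree is roughly 10^%d" % (len(str(chess_tree)) - 1))
print("Go's tree is about 10^%d times larger" % (len(str(go_tree // chess_tree)) - 1))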

The Solution

Prior to AlphaGo, Go programs generally focused less on evaluating the state of the board and more on simulating how the game might play out. Crazy Stone used an algorithm called Monte Carlo tree search to sample a subset of possible moves to choose between, rather than trying to calculate every possible sequence, and with this method it was able to defeat high-level amateur players. The research team at Google DeepMind, however, decided to confront the difficulties of Go directly. Instead of choosing the best move from randomly generated options, their program distinguishes a good move from a poor one and assesses the strength of its position from the state of the board. To accomplish this, AlphaGo combines Monte Carlo tree search with deep neural networks: programs that learn by interpreting data through examples and experience.

By combining an already popular method for calculating Go moves with cutting-edge machine learning, AlphaGo is able to crush a human Go professional. Despite this, many other top professionals remain skeptical of its ability, with some describing the program's play style as too passive. AlphaGo will have to take on more professional opponents in order to sway the disbelievers.

 

AlphaGo: Reinforcement Learning in Go

Following the defeat of professional Go player Fan Hui, I became interested in how the program AlphaGo achieves such a high level of skill. In this research, I describe in depth how AlphaGo works and explain why it is better than its predecessors.

The game of Go has been considered one of the hardest, if not the hardest, challenges in game programming because of its massive search space and the difficulty of judging board positions. Past attempts to solve Go gave up on the idea of evaluating every position head-on. In the case of Crazy Stone (previously the strongest Go program), a Monte Carlo algorithm simulated up to fifty thousand random games per second in order to evaluate a move: if, for example, Black wins those simulated games more often than White after a particular move, then that move is more favorable to Black. This strategy, however, may miss the best result, since Crazy Stone is not analyzing the position itself.
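The following sketch shows this rollout idea in miniature. It assumes a hypothetical board object with copy, play, legal_moves, is_over, and winner methods; these are placeholders invented for the example, not the API of Crazy Stone or any real engine.

import random

def opponent(color):
    return "white" if color == "black" else "black"

def rollout_value(board, move, color, simulations=1000):
    """Estimate how promising `move` is for `color` by playing many random
    games to the end and measuring how often `color` wins (pure Monte Carlo
    evaluation, with no understanding of the position itself)."""
    wins = 0
    for _ in range(simulations):
        game = board.copy()
        game.play(move, color)
        side = opponent(color)
        while not game.is_over():
            game.play(random.choice(game.legal_moves(side)), side)
            side = opponent(side)
        if game.winner() == color:
            wins += 1
    return wins / simulations  # higher fraction = more promising move for `color`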

Deep Neural Network

To attempt this difficult task, a new approach to the game had to be taken. The result is a Go program that uses a 'value network' to assess any given position along with a 'policy network' to select the most promising move. Together these form a 'deep neural network' that is trained by 'supervised learning' on past human games and by 'reinforcement learning' through playing against itself. Using this method, AlphaGo achieved a 99.8% win rate against other Go programs, including Crazy Stone.
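As a schematic illustration of the two roles (not the actual architecture, which the Nature paper describes as deep convolutional networks), a toy version might look like the following; the board-feature vector and layer sizes are invented for the example.

import numpy as np

BOARD_POINTS = 19 * 19  # one output per intersection

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TinyPolicyNet:
    """Stand-in for the policy network: maps board features to a probability
    of playing on each of the 361 intersections."""
    def __init__(self, n_features):
        self.W = 0.01 * np.random.randn(BOARD_POINTS, n_features)

    def move_probabilities(self, features):
        return softmax(self.W @ features)

class TinyValueNet:
    """Stand-in for the value network: maps board features to a single
    estimate of the probability that the player to move will win."""
    def __init__(self, n_features):
        self.w = 0.01 * np.random.randn(n_features)

    def win_probability(self, features):
        return 1.0 / (1.0 + np.exp(-self.w @ features))  # sigmoid output in (0, 1)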

Supervised and Reinforced Learning

To create an efficient neural network, AlphaGo undergoes supervised and reinforcement learning, implemented as separate networks.

Supervised learning (SL) works by predicting the opponent's moves with a multi-layered policy network (pσ) trained on 30 million board positions from the KGS Go Server. Using this data as input, AlphaGo's policy network learns to predict the moves of expert players 57% of the time. Improvements in accuracy led to improvements in playing ability, at the cost of slower evaluation.
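In terms of the toy networks sketched earlier, one supervised-learning step might look like the following; this is a hypothetical illustration of training on expert moves, not DeepMind's actual training code.

import numpy as np

def sl_policy_step(policy_net, features, expert_move, learning_rate=0.01):
    """Nudge the policy network so it assigns higher probability to the move
    the human expert actually played (cross-entropy loss on a softmax output)."""
    probs = policy_net.move_probabilities(features)
    target = np.zeros_like(probs)
    target[expert_move] = 1.0            # one-hot encoding of the expert's move
    grad = np.outer(probs - target, features)
    policy_net.W -= learning_rate * grad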

Next, a reinforcement learning (RL) policy network (pρ) improves upon the SL policy network by having AlphaGo play against itself millions of times and optimizing for what happens later in the game. This trains AlphaGo to win games, rather than merely to predict what human experts would do in specific situations.
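Continuing the same toy example, a reinforcement-learning step in the spirit of policy gradients could be sketched as below: after each self-play game, the winner's moves are made more likely and the loser's moves less likely. Details such as baselines, batching, and the opponent pool are omitted, so treat this as an assumption-laden sketch.

import numpy as np

def rl_policy_step(policy_net, states, moves, results, learning_rate=0.01):
    """After a self-play game, reinforce each move in proportion to the final
    result: results[i] is +1 if the player to move at states[i] went on to win,
    -1 if that player lost."""
    for features, move, z in zip(states, moves, results):
        probs = policy_net.move_probabilities(features)
        target = np.zeros_like(probs)
        target[move] = 1.0
        # Same gradient as the supervised step, scaled by the game outcome z.
        policy_net.W -= learning_rate * z * np.outer(probs - target, features)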

Lastly, a value network (vθ) predicts the winner of the game from a given position. This network is trained and optimized on the self-play games generated with the RL policy network. AlphaGo then combines the knowledge gained from these policy and value networks with Monte Carlo tree search in order to play Go effectively at a high level.
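A correspondingly simplified training step for the value network, regressing its prediction toward the actual outcome of a self-play game, might be sketched as follows (again using the toy classes above; this is not the published method).

import numpy as np

def value_step(value_net, features, result, learning_rate=0.01):
    """Regress the predicted win probability toward the actual outcome of the
    self-play game (1.0 if the player to move won, 0.0 otherwise)."""
    p = value_net.win_probability(features)
    error = p - result
    grad = error * p * (1.0 - p) * features   # squared error through a sigmoid
    value_net.w -= learning_rate * grad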

Monte Carlo Algorithm

Along with the deep neural networks, AlphaGo uses Monte Carlo tree search to guide the selection of each move. The search repeatedly simulates tens of thousands of possible continuations from the current position, evaluates the outcomes, and aggregates the results in order to choose a move.

Figure below: How Monte Carlo tree search works for Go


In the diagram, a) the search traverses the tree by selecting, at each node, the move with the maximum action value Q plus a bonus u(P) derived from the stored prior probability P. In b), a leaf node may be expanded and processed by the policy network pσ, and its outputs are stored as the prior probabilities P of the following moves. At c), the leaf node is evaluated in two ways: with the value network vθ, and with a fast rollout policy pπ that plays the game out to the end to see what may happen later. In d), the backup step, the action values Q along the path are updated with these evaluations so that later simulations make better-informed choices against the opponent.
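A minimal sketch of the selection step a) is shown below. The node attributes (prior, visit_count, total_value, children) and the exploration constant are assumptions made for the example; the bonus u shrinks as a move is visited more often, so the search gradually shifts from the policy network's suggestions toward moves whose simulations actually score well.

import math

def select_move(node, c_explore=5.0):
    """Pick the child move maximizing Q + u(P): Q is the average simulation
    value of the move so far, and u is an exploration bonus proportional to
    the prior probability P and inversely related to the visit count."""
    total_visits = sum(c.visit_count for c in node.children.values())
    best_move, best_score = None, -math.inf
    for move, child in node.children.items():
        q = child.total_value / child.visit_count if child.visit_count else 0.0
        u = c_explore * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move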

Value and Policy Networks

Figure below: How AlphaGo selects a move in the game against Fan Hui


This diagram shows a position in which AlphaGo must choose a move. In a), it evaluates the candidate moves, highlighted in shades of blue; darker shades indicate a higher estimated chance of winning, and the position circled in orange is the point with the maximum value according to AlphaGo's calculations. b) shows the action values of candidate moves averaged over evaluations by the value network alone. c) is similar to b) but averages over fast rollout evaluations instead. d) shows the move probabilities produced directly by the policy network. e) shows how frequently each move was selected during AlphaGo's tree search. In f), AlphaGo plays the move indicated by the red circle; Fan Hui responds with the move indicated by the white square just above it, and AlphaGo's analysis indicates that the move labeled 1 would have been the better reply.

Conclusion

By using both the policy networks (trained with SL and RL) and the value network, AlphaGo is able to evaluate game positions in a way that had never been done before. Combined with existing technology (Monte Carlo tree search), AlphaGo is able to defeat a professional Go player, a feat that no previous Go program had achieved.

           

Bibliography

Borrell, Brendan. "AI Invades Go Territory." WIRED. 19 Sept. 2006. Web. 8 Mar. 2016.

Cho, Adrian. "'Huge Leap Forward': Computer That Mimics Human Brain Beats Professional at Game of Go." Science AAAS. 2016. Web. 23 Feb. 2016.

Gibney, Elizabeth. "Google AI Algorithm Masters Ancient Game of Go." Nature.com. Nature Publishing Group, 27 Jan. 2016. Web. 23 Feb. 2016.

"Google Achieves AI 'Breakthrough' by Beating Go Champion - BBC News." BBC News. 27 Jan. 2016. Web. 23 Feb. 2016.

Naughton, John. "Can Google's AlphaGo Really Feel It in Its Algorithms?" The Guardian. Guardian News and Media, 31 Jan. 2016. Web. 23 Feb. 2016.

Nunez, Michael. "Google Just Beat Facebook in Race to Artificial Intelligence Milestone." Gizmodo. 27 Jan. 2016. Web. 23 Feb. 2016.

Silver, David, and Demis Hassabis. "AlphaGo: Mastering the Ancient Game of Go with Machine Learning." Research Blog. Google, 27 Jan. 2016. Web. 23 Feb. 2016.

Silver, David, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. "Mastering the Game of Go with Deep Neural Networks and Tree Search." Nature 529.7587 (2016): 484-89. Web.

 

 


Comments

金大俠
2017/05/13 00:51

 

The software development team is important. 

Don’t forget another critical factor: hardware … (1) the availability of GPUs (along with CPUs) that make parallel processing ever faster, cheaper, and more powerful; (2) the simultaneous one-two punch of practically infinite storage and a flood of data of every stripe (the Big Data movement) …

Over the past few years AI (especially its deep-learning branch) has exploded, particularly since 2015. Much of that has to do with advances in hardware.



chin
2016/08/03 00:22

American Go E-Journal for 07/31/2016

金大俠 (chin8673) replied on 2016-08-03 00:24:

American Go E-Journal for 07/31/2016

Huge Audience Turns Out for AlphaGo Keynote at U.S. Go Congress

Jul 30, 2016 11:41 pm | Chris Garlock

With over 600 signed up, this year’s U.S. Go Congress in Boston has the most registrants in the 32-year history of the event and it seemed as though just about every one of them was crowded into the main playing area in Boston University’s George Sherman Union Saturday night as AlphaGo’s Aja Huang 7d gave the keynote address, along with Fan Hui 2P. The audience was spellbound as the two gave a fascinating insider’s look at the two-year development of the AI program that decisively defeated Lee Sedol last March and attracted global attention to the game of go.

Huang (right) gave an overview of how AlphaGo started in 2014 as a 2-man project as he and David Silver worked to combine Monte-Carlo tree search with deep neural networks trained by supervised learning from human expert games, and reinforcement learning from games of self-play; the team later expanded to nearly two dozen. While the details are fully explained in the team’s Nature paper, Huang shared personal stories like how Fan Hui was chosen to test the program. “I saw him at a tournament in Dublin and the top Korean players were all going out to drink the night before the tournament but he said no, he couldn’t go because he had to prepare for the games, so I knew he was very serious,” Huang laughed.

Fan Hui (left) said that he almost missed the invitation to visit the DeepMind team in London because it seemed a bit odd and he thought "it might just be spam." In fact, "when I heard it was Google, I assumed they would be hooking me up with something like Google Glass, so when I found out they just wanted me to play a computer program I was so relieved and thought Oh, this will be easy." In perhaps the most poignant story of the evening, Fan Hui took the rapt audience through his five secret games with AlphaGo in Fall 2015, losing every game until at the end, "my game was crushed and I thought I now knew nothing about go." Out of those defeats, however, Fan Hui discovered even greater depths, not just to go itself, but to his own fascination and love of the game. "What AlphaGo teaches us is that you can play anywhere," he said, as the audience erupted in applause.

After their presentation, the two took questions from the audience, many of whom wanted to know things like when an AlphaGoBot on KGS will be available and whether a strong version of the program would be available in the near future for desktops or handhelds. Most were answered cryptically with “Under discussion,” but in response to a question about how strong AlphaGo is today, Huang — who had earlier showed a graph charting improvement of one rank a month — did say that it was possible that the program could now give a top professional two stones, but that this has not yet been tested.

Longtime International Go Federation and American Go Association official Thomas Hsiang presented Huang and Fan with a special award from the International Go Federation to the AlphaGo team “in appreciation for its outstanding contribution towards the development and promotion of go.”
- Chris Garlock; photos by Phil Straus
Read more about AlphaGo here and check out all our AI posts here.


戈 筆 揚
2016/04/11 12:01
This is fantastic. It includes quite a lot of knowledge.
金大俠 (chin8673) replied on 2016-08-03 00:25:
Thank you. Love you!

福 到
It's all fine once you think it through
2016/04/08 21:11

Many people are unwilling to play Go against someone stronger than themselves, for fear of losing.

Taking handicap stones feels like losing face; yet if you refuse the handicap, the stronger player does not want to play either... when the gap is large, the game is no fun.

In fact, the purpose of handicap stones is balance: they narrow the difference in playing strength,

so the game is not one-sided; winning effortlessly every time is no fun either.

Even evenly matched players need balance, or why else would Black have to give komi?

金大俠 (chin8673) replied on 2016-04-09 11:48:

When you play Go, you must not fear losing (the same is true of ball games, competitions, and other such activities).

Losing is winning.

The loser surely learns more from a game than the winner does.

I used to play Go with a good friend.

Our rule was: the loser receives one more handicap stone in the next game, or gives the opponent one fewer;

the winner gives the opponent one more handicap stone in the next game, or receives one fewer.

It is quite a good rule.

老魔王
2016/04/08 02:59
His Go skill must be at a considerable level for him to produce this kind of analysis and these conclusions! Well done~~
金大俠 (chin8673) replied on 2016-04-09 10:12:
He is stronger at Go than I am.

多硯坊 (休)
2016/04/07 15:58

A tiger father has no dog son (a great father has no unworthy son).

A hero emerges while still young.

金大俠 (chin8673) replied on 2016-04-08 11:24:
This tiger father does have a dog son:

I was born in the Year of the Tiger,

and my eldest son in the Year of the Dog.

Not only does this tiger father have a dog son,

he has a pig son as well!

悅己
2016/04/07 14:38

Wow, your son is such a strong Go player,

and he can write papers too.

His future is boundless.

A second 林海峰!!!


金大俠 (chin8673) replied on 2016-04-08 11:14:
My son is indeed stronger at Go than I am,

and his paper-writing is passable.

As for his future, he is still finding his way rather than being "boundless."


The Taiwanese professional Go player 周咸亨 was obsessed with Go during college;

he spent eight years as an undergraduate at four different universities (each one worse than the last, of course), all because of playing Go.


One does not become a second 林海峰 just by being obsessed!!!

pearlz (民進黨抹黑霸凌WHO )
handicap
2016/04/06 11:56

Does "handicap" mean a concession? I never knew computer games had the notion of conceding something at the start; that is quite special. It is also the first time I have seen "handicap" used this way, different from its meaning in golf. Haha.


金大俠 (chin8673) replied on 2016-04-07 11:18:
A handicap is what Chinese Go players call 讓子 or 授子 (handicap stones).

When a strong player plays a weaker one, handicap stones are needed; otherwise the strong player would utterly crush the weaker one.

The handicap can be 2, 3, 4, or 5 stones, and so on,

so that the two actually have a game worth playing.

pearlz (民進黨抹黑霸凌WHO )
Is this...
2016/04/06 11:18

...your son's paper? It is very well written and deserves the highest grade.

Is he also a strong Go player? Is he a programmer too? The angle his paper takes covers exactly the questions I wanted to ask about this AlphaGo.

Of course, I skipped over the technical parts.


金大俠 (chin8673) replied on 2016-04-07 11:13:
Thank you, Pearlz.

My son is a 2- to 3-dan Go player, stronger than I am.