
Study On The Application Of Residual Policy Network In Computer Go

Posted on: 2019-01-25    Degree: Master    Type: Thesis
Country: China    Candidate: X Z Wu    Full Text: PDF
GTID: 2348330542998628    Subject: Software engineering
Abstract/Summary:
Although artificial intelligence has advanced rapidly in many fields in recent years, Go long remained an open problem in artificial intelligence because of its enormous complexity; the international academic community generally believed that solving Go would take at least another 10 to 20 years. To address this problem, AlphaGo, the computer Go program designed by DeepMind, applied convolutional neural networks to computer Go. With this approach, AlphaGo defeated Lee Sedol and Ke Jie, top professional players representing human Go, essentially solving the problem of Go in the field of artificial intelligence. How to further improve the playing strength of computer Go programs has since become a direction that scholars continue to explore.

The main way to improve playing strength is to increase the accuracy of the convolutional neural network used in computer Go. However, as the network grows deeper, its complexity makes it harder to train and its accuracy may degrade. The deep residual network, proposed by Kaiming He while a researcher at Microsoft, effectively solves this problem: by adding identity shortcut connections, it greatly enhances the expressive ability of deep networks and makes networks of up to 150 layers practical to train, thereby improving accuracy.

Building on the policy network model in AlphaGo, this thesis designs a residual policy network that combines the policy network with a deep residual network to improve the playing strength of computer Go. The thesis first studies the policy network model in AlphaGo and the basic principles of deep residual networks. It then carries out the data collection, organization, and processing required by the policy network, and reproduces AlphaGo's policy network from these data. On this basis, it implements and trains the residual policy network using the same data. Finally, experimental comparison shows that the residual policy network achieves higher accuracy than the original policy network, which effectively improves the playing strength of the computer Go program.
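The two ideas combined above, a residual block with an identity shortcut and a policy head that outputs a probability over board points, can be sketched in a few lines of NumPy. This is a minimal illustration only, not the thesis's actual architecture: the flattened 361-dimensional board vector, the dense (rather than convolutional) layers, and the random weight scales are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # Numerically stable softmax: shift by the max before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()

def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x): the identity shortcut adds the input back,
    so the stacked layers only need to learn a residual, which eases
    the training of very deep networks."""
    f = relu(x @ w1) @ w2          # F(x): the learned transformation
    return relu(f + x)             # add the shortcut, then activate

# Hypothetical toy dimensions: a flattened 19x19 Go board (361 points)
# with randomly initialized weights, for illustration only.
d = 19 * 19
x = rng.standard_normal(d)         # stand-in for board feature input
w1 = 0.05 * rng.standard_normal((d, d))
w2 = 0.05 * rng.standard_normal((d, d))
w_policy = 0.05 * rng.standard_normal((d, d))

h = residual_block(x, w1, w2)      # one residual block
p = softmax(h @ w_policy)          # policy head: a move distribution
print(p.shape)                     # (361,): one probability per point
```

A real residual policy network would stack many such blocks and use convolutions over board planes instead of dense layers, but the shortcut-plus-softmax structure is the same.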
Keywords/Search Tags: artificial intelligence, computer Go, deep learning, policy network, residual network