AlphaGo Zero

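Development history

In the early hours of 19 October 2017, in a research paper published in the international journal Nature, Google subsidiary DeepMind reported a new version of its program, AlphaGo Zero. Starting from a blank slate and with no human input of any kind, it rapidly taught itself to play Go. After 3 days of training it defeated AlphaGo Lee 100:0, and after 40 days of training it defeated AlphaGo Master.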

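How it works

"Discarding human experience" and "self-training" are not the most notable features of AlphaGo Zero. The key is that it adopts a new reinforcement learning algorithm and takes that algorithm a step further.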
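As a rough illustration of what the network is trained to do, the paper describes a single network that predicts both the move probabilities chosen by the tree search and the winner of the game. A minimal sketch of the per-position training loss from the paper, (z - v)^2 - pi . log(p) + c * ||theta||^2, is shown here. The function name `alphago_zero_loss`, the argument names, the default `c`, and the small `eps` guard are assumptions made for this example, not DeepMind's code.

```python
import numpy as np


def alphago_zero_loss(p, v, pi, z, theta=None, c=1e-4, eps=1e-12):
    """Illustrative per-position loss: (z - v)^2 - pi . log(p) + c * ||theta||^2.

    p     : network move probabilities (1-D array summing to 1)
    v     : network value estimate, a scalar in [-1, 1]
    pi    : search-derived move probabilities (same shape as p)
    z     : game outcome from the current player's view (+1 or -1)
    theta : optional flat array of network weights for L2 regularisation
    c     : L2 regularisation coefficient (assumed default for this sketch)
    eps   : small constant to avoid log(0); an assumption of this sketch
    """
    p = np.asarray(p, dtype=float)
    pi = np.asarray(pi, dtype=float)
    value_loss = (z - v) ** 2                      # squared error on the winner
    policy_loss = -np.sum(pi * np.log(p + eps))    # cross-entropy with search policy
    l2 = c * np.sum(np.asarray(theta) ** 2) if theta is not None else 0.0
    return float(value_loss + policy_loss + l2)


# Hypothetical two-move position: the network agrees with the search
# in the first call and disagrees in the second.
good = alphago_zero_loss(p=[0.5, 0.5], v=1.0, pi=[0.5, 0.5], z=1.0)
bad = alphago_zero_loss(p=[0.9, 0.1], v=-1.0, pi=[0.1, 0.9], z=1.0)
```

The loss is lower when the predicted move probabilities match the search probabilities and the predicted value matches the actual result, which is the sense in which the program acts as its own teacher.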

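Results

AlphaGo Zero ran on only 4 TPUs and used no human game records. It trained itself for just 3 days, playing 4.9 million games against itself, yet it defeated its predecessor AlphaGo Lee 100:0.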

Abstract of the original paper

Original text

A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also the winner of AlphaGo’s games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
