In my previous flashes I talked about how DeepMind’s AlphaGo beat the world’s best human Go player by using reinforcement learning and deep learning, and by giving the computer lots of games to analyze and learn from. But what if the computer system had to learn entirely by itself? What if it were given no human knowledge and had to learn from scratch?
To answer that question, DeepMind experts created AlphaZero, a single system that taught itself to master the games of chess, shogi (Japanese chess) and Go. AlphaZero was given only the rules of each game. With no built-in human knowledge, it started from random play and learned by playing against itself millions of times. Initially its games were weak and erratic, but over time it learned which strategies worked. Patterns that led to wins were used more and more, and patterns that led to losses were used less and less, so that over time the system became more likely to choose advantageous moves.
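For readers of the transcript, here is a minimal sketch of that self-play idea in Python. It is not AlphaZero's actual method, which combines a deep neural network with tree search. It is a toy assumed purely for illustration: a tiny stone-taking game (players alternately remove 1 to 3 stones, and whoever takes the last stone wins) and a simple table of average results. The names `train`, `best_move` and `legal_moves`, and all parameter values, are hypothetical.

```python
import random

MAX_TAKE = 3  # assumed toy rule: a player removes 1, 2 or 3 stones per turn


def legal_moves(stones):
    """Moves allowed by the rules; this is the only knowledge the learner gets."""
    return list(range(1, min(MAX_TAKE, stones) + 1))


def best_move(value, stones):
    """Pick the move with the highest learned average result."""
    return max(legal_moves(stones), key=lambda m: value.get((stones, m), 0.0))


def train(num_games=20000, max_pile=12, epsilon=0.1, seed=0):
    """Learn move values purely from games the program plays against itself."""
    rng = random.Random(seed)
    value = {}   # (stones, move) -> average result for the player who made it
    visits = {}  # (stones, move) -> how many times that move has been tried

    for _ in range(num_games):
        stones = rng.randint(1, max_pile)
        history = [[], []]  # moves made by player 0 and by player 1
        player = 0
        last_mover = 0
        while stones > 0:
            if rng.random() < epsilon:
                move = rng.choice(legal_moves(stones))  # occasional random try
            else:
                move = best_move(value, stones)         # otherwise use what it knows
            history[player].append((stones, move))
            stones -= move
            last_mover = player
            player = 1 - player

        # Whoever took the last stone won. Moves from won games drift toward +1,
        # moves from lost games drift toward -1 (a running average).
        for p in (0, 1):
            result = 1.0 if p == last_mover else -1.0
            for key in history[p]:
                visits[key] = visits.get(key, 0) + 1
                old = value.get(key, 0.0)
                value[key] = old + (result - old) / visits[key]
    return value


if __name__ == "__main__":
    learned = train()
    for pile in range(1, 13):
        print(pile, "stones -> take", best_move(learned, pile))
```

Given only the rules, a table like this typically ends up preferring moves that leave the opponent a multiple of four stones, which is the known winning strategy for this toy game. Winning patterns get reinforced and losing ones fade, which is the same basic loop described above, at a vastly smaller scale.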
AlphaZero’s immediate predecessor, AlphaGo Zero, which also learned purely from self-play, defeated the version of AlphaGo that had beaten the world’s best human Go player, 100 games to 0. Researchers realized that when you put your own preferences and predispositions into the computer system, you make the system weaker. The system that learns from itself is the stronger player. After playing 44 million games against itself, by 2019 AlphaZero had become the best player in the world at Go and shogi. And AlphaZero became the best chess player in the world with astonishing speed. The headlines read, “Entire human chess knowledge learned and surpassed by DeepMind’s AlphaZero in 4 hours.” The tagline was that it was essentially managed in little more than the time between breakfast and lunch.
However, the most fascinating part of AlphaZero’s abilities was the style the computer system used to win these games. Being self-taught, AlphaZero didn’t follow the conventional wisdom of the games but developed its own intuition and strategies, which were completely novel and had never been seen before. World champion players described its play as groundbreaking and highly dynamic. For example, in chess AlphaZero de-emphasized the importance of each piece’s value, sacrificing highly valued pieces early on for a long-term advantage in the game. In a new book about AlphaZero’s chess games called Game Changer, the authors state, “It’s like discovering the secret notebooks of some great player from the past.”
AlphaZero’s ability to master and become world champion of three different complex games demonstrates that a self-teaching system can potentially work for any perfect-information game, meaning a game where both players can see everything on the board. More importantly, it shows such a system can discover new knowledge in a range of settings. This brings DeepMind closer to its ultimate mission: to solve intelligence by creating general learning systems, in essence artificial general intelligence, and then to use that to solve all the other world problems.
A transcript of this and other podflashes, along with additional reading, can be found at my website, drpepermd.com.
From Short and Sweet AI, I’m Dr. Peper.