David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning | Lex Fridman Podcast #86 | Summary and Q&A

April 3, 2020
Lex Fridman Podcast

AlphaGo, a deep reinforcement learning program, made history by defeating the world champion Go player, revealing the potential power of AI and marking a major milestone in the field.

Questions & Answers

Q: How did AlphaGo utilize reinforcement learning to win against the world champion Go player?

AlphaGo improved by playing a very large number of games against itself, using deep neural networks to learn from the results. It combined these networks with Monte Carlo tree search to evaluate positions and select strong moves.
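
As a rough illustration of the search idea mentioned above, here is a minimal sketch of generic Monte Carlo tree search (the UCT variant) on a toy game of Nim, where players alternately take 1 to 3 stones and whoever takes the last stone wins. The game, the `Node` class, and all function names are assumptions made for this example. It uses purely random rollouts and is not AlphaGo's actual search, which was also guided by learned neural networks.

```python
import math
import random

MOVES = (1, 2, 3)  # hypothetical toy game: take 1-3 stones, last stone wins


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones      # stones left for the player about to move
        self.parent = parent
        self.move = move          # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in MOVES if m <= stones]
        self.visits = 0
        self.wins = 0.0           # wins for the player who made `move`


def uct_child(node, c=1.4):
    """Pick the child with the best average result plus an exploration bonus."""
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(stones, rng):
    """Play random moves to the end; True if the player to move first wins."""
    player = 0  # 0 is the player to move at the start of the rollout
    while stones > 0:
        stones -= rng.choice([m for m in MOVES if m <= stones])
        if stones == 0:
            return player == 0
        player = 1 - player
    return False  # no stones left: the previous player already won


def mcts(stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one new child if any move is untried.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new position.
        mover_won = not rollout(node.stones, rng)
        # 4. Backpropagation: flip the result at each level up the tree.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if mover_won else 0.0
            mover_won = not mover_won
            node = node.parent
    # The most-visited move at the root is the recommendation.
    return max(root.children, key=lambda ch: ch.visits).move


best_move = mcts(3)  # with 3 stones left, taking all 3 wins immediately
```

The four numbered steps are the standard loop of this family of algorithms; the exploration constant `c` and the iteration count are arbitrary choices for the sketch.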

Q: What was the significance of AlphaGo's victory?

AlphaGo's victory marked a major milestone in the field of AI, showcasing the potential of deep reinforcement learning to surpass the strongest human players in complex games.

Q: What was the role of human data in the development of AlphaGo?

Human data was initially used to explore the capabilities of deep learning and to better understand the game of Go. However, the ultimate goal was to develop a system that could learn from self-play and improve its performance.

Q: How did AlphaGo challenge the conventions of Go play?

AlphaGo showcased creativity by making unconventional moves that human players had not anticipated. These moves, such as playing on the fifth line instead of the third or fourth line, led to new insights and expanded the knowledge of the game.


This conversation is with David Silver, the lead researcher on AlphaGo, AlphaZero, and AlphaStar at DeepMind. The discussion focuses on reinforcement learning and its application in artificial intelligence. David shares his early experiences with computers and programming, his fascination with the game of Go, and his belief in the power of reinforcement learning to solve complex problems. He explains the fundamental concept of reinforcement learning, the different types of approaches within RL (value-based, model-based, policy-based), and the emergence of deep reinforcement learning using neural networks. David also discusses the surprising effectiveness of deep learning despite the challenges of high-dimensional optimization.

Questions & Answers

Q: What was the first program you've ever written, and what programming language do you remember?

I remember my parents bringing home a BBC Model B microcomputer when I was about seven years old. I couldn't resist playing around with it and wrote my first program, which was to display my name in different colors and loop it. There was something magical about it that led to more programming experiments.

Q: How did you think about computers back then, and what were your thoughts about the magical aspect of programming?

Programming computers went beyond solving puzzles for me. It opened up limitless possibilities. Just like playing with Lego, computers allowed me to create anything I wanted without constraints. Having a computer in front of me sparked my fascination, and I delved into user guides and advanced programming to learn more. It was a feeling of freedom and potential that I found magical.

Q: When did you first fall in love with artificial intelligence and the dreams of AI?

I became fascinated with artificial intelligence when I studied computer science at Cambridge University. As I questioned the goals and direction of computer science, recreating human-like intelligence stood out as the most significant step forward. The idea of cracking the problem of human intelligence was something inside me, tugging at me to pursue it.

Q: Do you remember the first time you wrote a program that beat you in a game?

As a game programmer in the industry, I built handcrafted AI agents that could outperform me in certain limited cases. However, if we're talking about real AI, I came to realize that handcrafted approaches were limited and not aligned with my intention to understand and solve intelligence. The first real experience came when I built my first Go program using reinforcement learning, and it beat me.

Q: How did it feel when your Go program beat you?

It felt good. Witnessing a system I created learn from first principles and surpass my own abilities in a game as complex as Go was immensely satisfying. It validated my belief in the potential of reinforcement learning and AI. It was a moment of excitement and a realization that what I had felt should work actually worked.

Q: How significant do you consider AlphaGo and AlphaZero's mastery in the game of Go?

Personally, I believe that the mastery of Go by AlphaGo and its successor, AlphaZero, is one of the most important accomplishments in the history of artificial intelligence. To me, it was a transformative and profoundly inspiring moment. Mastering a game like Go, which was considered out of reach for AI using traditional methods, opened up new possibilities and showcased the power of reinforcement learning.

Q: Can you explain the rules and complexities of the game of Go?

The game of Go is played on a 19x19 grid, and the players take turns placing their stones on the intersections. The objective is to surround as much territory as possible with your stones. Although the rules are simple, they give rise to countless strategic decisions and a vast branching factor for search. Go is challenging for AI because evaluating a position's patterns and potential territory relies heavily on intuition.
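
To give a rough sense of the scale implied by a 19x19 board, the short sketch below counts the intersections and computes a loose upper bound on board configurations, assuming each intersection is independently empty, black, or white. That bound includes many illegal positions and is an illustration added here, not a figure from the conversation. The function name is hypothetical.

```python
def board_scale(board_size=19):
    """Return (intersections, digits in a loose upper bound on configurations)."""
    intersections = board_size * board_size
    # Each intersection is empty, black, or white. This overcounts,
    # because many such arrangements are not legal Go positions.
    upper_bound = 3 ** intersections
    return intersections, len(str(upper_bound))


intersections, digits = board_scale()
print(f"{intersections} intersections, so up to {intersections} choices for the first stone")
print(f"3^{intersections} has {digits} decimal digits")
```

Even the first move offers one choice per intersection, which is why exhaustive search quickly becomes impractical.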

Q: What is reinforcement learning?

Reinforcement learning is a problem formulation in which an agent interacts with an environment, taking actions and receiving observations and rewards. The goal is to learn a policy that maximizes those rewards over time. Reinforcement learning allows an agent to acquire knowledge through trial and error and use that knowledge to make better decisions in complex environments. It is a fundamental concept in understanding and achieving artificial intelligence.
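
The interaction loop described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the `Corridor` environment, its `reset`/`step` interface, and the two policies are hypothetical names invented for the example, not part of any particular library.

```python
import random


class Corridor:
    """Hypothetical toy environment: walk along a corridor to reach the far end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the observation is simply the current cell

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode: the agent acts, the environment returns observation and reward."""
    observation = env.reset()
    total_reward = 0.0
    for step in range(max_steps):
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            return total_reward, step + 1
    return total_reward, max_steps


def random_policy(observation):
    return random.choice([-1, 1])  # trial and error with no learning yet


def always_right(observation):
    return 1


total, steps = run_episode(Corridor(5), always_right)
```

A learning agent would replace `random_policy` with a policy that is adjusted from the rewards it observes, which is the part the methods below address.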

Q: What are the different types of reinforcement learning approaches?

There are several branches within reinforcement learning that represent different approaches to solving the problem. These include value-based, model-based, and policy-based methods. Value-based approaches focus on estimating the value function, which predicts the expected future rewards given a state. Model-based approaches aim to predict the dynamics of the environment, allowing the agent to plan and simulate actions. Policy-based approaches directly learn the policy, which is a mapping from states to actions. These approaches can be combined or used independently depending on the problem.
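
To make the three objects concrete, the sketch below writes out a model, a value function, and a policy for a hypothetical five-state corridor, where reaching the last state gives a reward of 1. For simplicity the model is given rather than learned, the values are computed by value iteration, and the policy is read off greedily from the values. The discount factor `GAMMA` and all names are assumptions for the example.

```python
GOAL = 4           # hypothetical corridor with states 0..4; state 4 is terminal
ACTIONS = (-1, 1)  # step left or step right
GAMMA = 0.9        # discount factor (an assumed value)


def model(state, action):
    """Model: predicts (next_state, reward) for a non-terminal state and an action."""
    next_state = max(0, min(GOAL, state + action))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def backup(state, action, values):
    """One-step lookahead: immediate reward plus discounted value of the next state."""
    next_state, reward = model(state, action)
    return reward + GAMMA * values[next_state]


def plan_values(sweeps=50):
    """Value function: expected discounted future reward from each state,
    computed here by value iteration that plans with the model."""
    values = [0.0] * (GOAL + 1)  # the terminal state keeps value 0
    for _ in range(sweeps):
        for state in range(GOAL):
            values[state] = max(backup(state, a, values) for a in ACTIONS)
    return values


def greedy_policy(values):
    """Policy: a mapping from states to actions, here greedy with respect to the values."""
    return {
        state: max(ACTIONS, key=lambda a: backup(state, a, values))
        for state in range(GOAL)
    }


values = plan_values()
policy = greedy_policy(values)
```

In this toy case the three pieces are combined; a purely policy-based method would instead adjust the `policy` mapping directly from experience without computing `values`.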

Q: What is deep reinforcement learning, and how does it utilize neural networks?

Deep reinforcement learning is a family of solution methods that use neural networks to represent and learn components of the agent's solution. Neural networks can learn complex representations and can be used to represent value functions, policies, or models. Deep learning provides a toolkit that lets RL agents represent and learn a very wide range of functions, removing many of the limitations of previous approaches. It is surprising and beautiful that neural networks continue to learn effectively in high dimensions, despite early skepticism.
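
As a small illustration of a neural network standing in for a value function, the sketch below trains a one-hidden-layer network in plain Python to fit four made-up state values. The network size, learning rate, targets, and function names are all assumptions chosen for the example; real deep RL systems use far larger networks and learn their targets from experience rather than from a fixed table.

```python
import math
import random


def init_network(hidden=8, seed=0):
    rng = random.Random(seed)
    return {
        "w1": [rng.uniform(-1, 1) for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return the hidden activations and the scalar value estimate for input x."""
    hidden = [math.tanh(w * x + b) for w, b in zip(net["w1"], net["b1"])]
    output = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]
    return hidden, output


def mean_squared_error(net, data):
    return sum((forward(net, x)[1] - y) ** 2 for x, y in data) / len(data)


def train(net, data, epochs=2000, lr=0.05):
    """Stochastic gradient descent on squared error with manual backpropagation."""
    for _ in range(epochs):
        for x, y in data:
            hidden, output = forward(net, x)
            error = output - y  # gradient of 0.5 * error**2 with respect to the output
            for i, h in enumerate(hidden):
                # Backpropagate through the tanh unit before updating its weights.
                grad_hidden = error * net["w2"][i] * (1.0 - h * h)
                net["w2"][i] -= lr * error * h
                net["w1"][i] -= lr * grad_hidden * x
                net["b1"][i] -= lr * grad_hidden
            net["b2"] -= lr * error
    return net


# Made-up targets: the value of state s in a short corridor, with the state scaled to [0, 1].
data = [(s / 3, 0.9 ** (3 - s)) for s in range(4)]
net = init_network()
loss_before = mean_squared_error(net, data)
train(net, data)
loss_after = mean_squared_error(net, data)
```

The same network shape could in principle represent a policy or a model instead, by changing what the output is trained to predict.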


Reinforcement learning, especially deep reinforcement learning using neural networks, offers a powerful approach to understanding and achieving artificial intelligence. It allows agents to learn and optimize their behavior in complex environments by maximizing rewards. The surprising effectiveness of deep learning and its ability to learn complex representations have opened up new possibilities and challenged traditional assumptions about optimization and intelligence. As we explore and push the boundaries of reinforcement learning, there are still many exciting discoveries and advancements ahead.

Summary & Key Takeaways

  • AlphaGo, a deep reinforcement learning program, emerged as a groundbreaking AI achievement and defeated world champion Go player Lee Sedol.

  • The program used Monte Carlo tree search and deep learning techniques to evaluate positions and select strong moves.

  • AlphaGo's victory highlighted the power of reinforcement learning and its ability to surpass the strongest human players in complex games.
