DeepMind's AI Learns Locomotion From Scratch | Two Minute Papers #190 | Summary and Q&A

45.5K views
September 20, 2017
by
Two Minute Papers

TL;DR

Researchers use a reinforcement learning algorithm with a reward function based on forward progress to teach digital creatures to navigate complex environments, eliminating the need for precomputed motion databases or handcrafted rewards.


Key Insights

  • Learning algorithms can teach digital creatures to navigate complex environments without precomputed motion databases.
  • A reward function based on forward progress yields solutions that generalize across different terrains and motion types.
  • Modifications to the algorithm make the learning process robust and scalable.
  • Natural-looking upper-body movements may not emerge, since the reward function barely distinguishes between arm motions.

Transcript

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. We have talked about some awesome previous works where we used learning algorithms to teach digital creatures to navigate in complex environments. The input is a terrain and a set of joints, feet, and movement types, and the output has to be a series of motions that maximize...

Questions & Answers

Q: How does the reinforcement learning algorithm teach digital creatures to navigate complex environments?

The algorithm uses a reward function based on forward progress, rewarding the digital creatures for maximizing their distance from the starting point. This encourages them to discover varied strategies for traversing the terrain.
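The idea of a forward-progress reward can be sketched in a few lines. This is a minimal illustration, not the paper's actual reward; the function name, the use of the x-axis as the forward direction, and the toy position values are assumptions for the example.

```python
def forward_progress_reward(prev_x, curr_x, alive_bonus=0.0):
    """Reward is simply the distance covered along the forward (x) axis
    this timestep; no motion-capture targets or style terms are involved."""
    return (curr_x - prev_x) + alive_bonus

# Toy rollout: hypothetical x-coordinates of a creature over five timesteps.
positions = [0.0, 0.3, 0.7, 0.6, 1.2]
rewards = [forward_progress_reward(a, b)
           for a, b in zip(positions, positions[1:])]

# The per-step rewards telescope, so the return equals net displacement: 1.2.
total = sum(rewards)
```

Because only net forward displacement matters, any gait that moves the creature ahead is rewarded equally, which is why such varied (and sometimes unusual) locomotion strategies can emerge.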

Q: What are the advantages of using this approach?

By synthesizing motions from scratch with a simple reward function, researchers avoid the need for precomputed motion databases or handcrafted rewards. This makes the approach more general and applicable to a wide range of problems.

Q: How does the algorithm ensure natural-looking movements for the digital creatures?

The algorithm does not prioritize natural upper-body movements because different arm motions yield nearly the same reward. As a result, it may settle on arbitrary arm movements, which can produce amusing, if still effective, results.

Q: What modifications were made to the reinforcement learning algorithm?

Two modifications were made to the original algorithm: one made the learning process more robust and less dependent on parameter choices, and the other made it scale to larger problems.
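The summary does not spell out the modifications, but one standard way to make policy-gradient learning less sensitive to step-size and parameter choices is a clipped surrogate objective, as popularized by PPO-style methods. A minimal single-sample sketch under that assumption (not the paper's exact formulation):

```python
def ppo_clipped_term(ratio, advantage, clip_eps=0.2):
    """Clipped surrogate objective for one (state, action) sample.

    ratio = pi_new(a|s) / pi_old(a|s).  Clipping the ratio to
    [1 - clip_eps, 1 + clip_eps] caps how much a single update can
    change the policy, making learning more robust to step sizes.
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the pessimistic (smaller) of the clipped and unclipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)

# A large ratio with positive advantage is capped at 1.2 * advantage:
capped = ppo_clipped_term(1.5, 1.0)    # -> 1.2
# A small ratio with negative advantage is floored at 0.8 * advantage:
floored = ppo_clipped_term(0.5, -1.0)  # -> -0.8
```

Scalability, the second property mentioned, is typically achieved separately, by distributing rollout collection and gradient computation across many workers.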

Summary & Key Takeaways

  • Researchers have developed a new technique to teach digital creatures to navigate complex environments using a reinforcement learning algorithm.

  • The algorithm synthesizes motions from scratch instead of borrowing from a preexisting motion database.

  • By using a reward function based on forward progress, the algorithm can learn to navigate different terrains without the need for specialized rewards.
