Deep Learning From Human Preferences | Two Minute Papers #196 | Summary and Q&A
![YouTube video player](https://i.ytimg.com/vi/WT0WtoYz2jE/hqdefault.jpg)
TL;DR
OpenAI and DeepMind collaborate to introduce human control in reinforcement learning, successfully teaching a digital creature to perform a backflip.
Key Insights
- 🧑‍🏫 Reinforcement learning with human control can successfully teach complex behaviors such as backflips.
- ❓ Even sparse and vague rewards can enable learning in reinforcement learning algorithms.
- 👨‍🔬 Collaboration between AI research giants like OpenAI and DeepMind shows that findings can be shared openly for the benefit of all.
Transcript
Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. In this new age of AI, there is no shortage of articles and discussion about AI safety, and of course, rightfully so: these new learning algorithms started solving problems that were previously thought to be impossible in quick succession. Only ten years ago, if we told some...
Questions & Answers
Q: How does reinforcement learning work in teaching a digital creature to perform a backflip?
Reinforcement learning has the algorithm perform a series of actions to maximize a score. Here, the score is provided by a human supervisor, who tells the algorithm whether the attempted backflip was successful or not, and the algorithm learns from that feedback.
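The loop described above can be sketched as a toy example. Everything here is an illustrative assumption, not the paper's actual setup: the "backflip" is reduced to a hypothetical four-step action sequence, and `simulated_supervisor` stands in for the human judge who answers yes (1) or no (0).

```python
import random

# Hypothetical action vocabulary and the "correct" backflip sequence.
ACTIONS = ["crouch", "jump", "tuck", "extend"]
TARGET = ["crouch", "jump", "tuck", "extend"]

def simulated_supervisor(sequence):
    """Stand-in for the human: 1 if the attempt looks like a backflip, else 0."""
    return 1 if sequence == TARGET else 0

def train(episodes=5000, seed=0):
    rng = random.Random(seed)
    # Running score for each action at each of the 4 time steps.
    scores = [{a: 0 for a in ACTIONS} for _ in range(4)]
    for _ in range(episodes):
        # Try a random sequence of actions.
        seq = [rng.choice(ACTIONS) for _ in range(4)]
        reward = simulated_supervisor(seq)  # binary yes/no feedback
        # Reinforce every action that took part in a successful attempt.
        for t, action in enumerate(seq):
            scores[t][action] += reward
    # The learned policy: the best-scoring action at each step.
    return [max(s, key=s.get) for s in scores]
```

Even though almost every episode earns a score of zero, the rare successes are enough for the counts to converge on the target sequence, which mirrors the video's point about learning from very sparse feedback.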
Q: Why is learning through binary yes/no scores challenging?
Learning from binary yes/no scores is challenging because the rewards are sparse and vague. In this case, fewer than 1% of actions receive any feedback, yet the algorithm still manages to learn difficult concepts.
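One way to make such sparse labels go further, which this sketch illustrates under its own assumptions (it is not a reproduction of the paper's method), is to fit a small predictor on the few human-labeled clips and then use its output as a dense stand-in reward for everything the human never sees:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_reward_model(labeled, lr=0.5, steps=2000):
    """Logistic regression on (features, yes/no label) pairs via gradient descent."""
    dim = len(labeled[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(steps):
        for x, y in labeled:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predicted_reward(w, b, x):
    """Dense reward signal: the model's estimated probability of a 'yes'."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# A handful of human-labeled clips (hypothetical 2-d features; the first
# feature is what actually distinguishes good attempts from bad ones).
labeled_clips = [
    ([0.9, 0.2], 1),
    ([0.8, 0.7], 1),
    ([0.2, 0.3], 0),
    ([0.1, 0.8], 0),
]
w, b = fit_reward_model(labeled_clips)
```

After training, `predicted_reward` scores every clip, so the learner receives feedback on far more than the roughly 1% of actions a human could label by hand.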
Q: What makes the collaboration between OpenAI and DeepMind significant?
The collaboration between OpenAI and DeepMind is significant because it demonstrates their dedication to working together and sharing their findings for the greater good, despite the tendency for secrecy in AI research.
Q: How does this research contribute to AI safety?
This research contributes to AI safety by introducing more human control in reinforcement learning algorithms. It allows humans to oversee and guide the learning process, which is essential for ensuring AI algorithms are used for good purposes.
Summary & Key Takeaways
- Collaboration between OpenAI and DeepMind focuses on introducing human control into reinforcement learning problems.
- The algorithm learns to perform a backflip through reinforcement learning, with human supervisors providing binary feedback on success or failure.
- Fewer than 1% of actions receive feedback, yet the algorithm learns difficult concepts from these sparse and vague rewards.