The Problem of Local Optima (C2W3L10)  Summary and Q&A
TL;DR
Deep learning optimization algorithms are more likely to encounter saddle points than local optima in high-dimensional spaces, and plateaus can slow down learning progress.
Key Insights
 ❓ Local optima are less of a concern in deep learning optimization than previously believed.
 😥 Saddle points, where the gradient is zero but the function curves up in some directions and down in others, are more common in high-dimensional spaces.
 😘 Plateaus, regions with low derivative values, can significantly hinder learning progress.
 🐢 Advanced optimization algorithms like momentum or Adam can help overcome slow learning on plateaus.
 👾 Understanding high-dimensional spaces in deep learning optimization is still evolving.
 😘 Intuition from low-dimensional spaces does not necessarily apply to high-dimensional optimization problems.
 😀 Learning algorithms operating with a large number of parameters face unique optimization challenges.
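The momentum insight above can be sketched with a tiny experiment (hypothetical code, not from the lecture; the `descend` helper and the constant gradient `g` are illustrative assumptions): on a nearly flat slope, plain gradient descent inches along, while momentum accumulates the small gradients into a velocity of up to 1/(1 − β) times the raw step.

```python
# Hypothetical sketch: a plateau modeled as a slope with a constant, tiny
# gradient g. Plain gradient descent moves lr * g per step; momentum
# accumulates past gradients into a velocity approaching g / (1 - beta).
def descend(beta, steps=100, lr=1.0, g=0.001):
    """Return the final position after `steps` updates on the flat slope."""
    w, v = 0.0, 0.0
    for _ in range(steps):
        v = beta * v + g  # exponentially weighted velocity
        w -= lr * v       # parameter update
    return w

print(descend(0.0))  # plain gradient descent: about -0.1
print(descend(0.9))  # with momentum: roughly nine times farther
```

With β = 0.9 the momentum run travels roughly nine times farther in the same number of steps, which is the basic reason momentum and Adam help cross plateaus faster.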
Transcript
In the early days of deep learning, people used to worry a lot about the optimization algorithm getting stuck in bad local optima. But as the theory of deep learning has advanced, our understanding of local optima is also changing. Let me show you how we now think about local optima and problems in the optimization problem in deep learning. So this was ...
Questions & Answers
Q: What was the previous concern regarding deep learning optimization algorithms?
In the early days, people worried about optimization algorithms getting stuck in bad local optima, hindering progress towards global optima.
Q: Are most points with zero gradients in the cost function local optima?
No, most points of zero gradient are actually saddle points in high-dimensional spaces, where the function can curve up in some directions and down in others.
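This mixed curvature can be seen on the classic two-dimensional saddle surface (an illustrative example, not one taken from the lecture):

```python
# f(x, y) = x**2 - y**2 has zero gradient at the origin, yet it curves
# upward along the x-axis and downward along the y-axis, so the origin is
# a saddle point rather than a local optimum.
def f(x, y):
    return x**2 - y**2

def grad_f(x, y):
    return (2 * x, -2 * y)

print(grad_f(0.0, 0.0))           # zero gradient at the origin
print(f(0.1, 0.0) > f(0.0, 0.0))  # True: bends up along x
print(f(0.0, 0.1) < f(0.0, 0.0))  # True: bends down along y
```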
Q: Why are saddle points more prevalent in highdimensional spaces?
In high-dimensional spaces, it is unlikely that the function bends upward in every direction at a zero-gradient point, making saddle points far more common than local optima.
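That intuition can be checked with a quick Monte-Carlo sketch (the independence assumption here is illustrative, not a claim from the lecture): if each of n curvature directions independently bends up or down with probability 1/2, the chance that all bend upward, i.e. that a zero-gradient point is a local minimum, is about 2⁻ⁿ.

```python
import random

# Simulate n_dims coin flips per trial; count trials where every
# direction bends upward (a local minimum rather than a saddle point).
def p_all_up(n_dims, trials=100_000, seed=0):
    rng = random.Random(seed)
    hits = sum(
        all(rng.random() < 0.5 for _ in range(n_dims))
        for _ in range(trials)
    )
    return hits / trials

for n in (2, 5, 10, 20):
    print(n, p_all_up(n))  # shrinks roughly like 2**-n
```

For a network with thousands of parameters, 2⁻ⁿ is vanishingly small, which is why nearly every zero-gradient point is a saddle.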
Q: What are plateaus in the context of deep learning optimization?
Plateaus are regions where the derivative stays close to zero for a long time, producing a nearly flat surface on which learning progresses slowly.
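The slowdown can be demonstrated with a toy comparison (hypothetical functions, not from the lecture): gradient descent on a bowl f(w) = w², whose gradient stays informative, versus a plateau-like f(w) = w⁴, whose gradient 4w³ vanishes rapidly near the minimum.

```python
# Count gradient-descent steps until |w| falls below `target`, starting
# from w0, for a given gradient function.
def steps_to_reach(grad, target=0.05, lr=0.1, w0=1.0, max_steps=100_000):
    w = w0
    for t in range(1, max_steps + 1):
        w -= lr * grad(w)
        if abs(w) <= target:
            return t
    return max_steps

bowl = steps_to_reach(lambda w: 2 * w)        # healthy gradient throughout
plateau = steps_to_reach(lambda w: 4 * w**3)  # gradient ~ 0 near the optimum
print(bowl, plateau)  # the plateau-like function takes far more steps
```

The nearly flat region dominates the runtime on the quartic, mirroring how plateaus stall training in practice.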
Summary & Key Takeaways

In deep learning, the concern about getting trapped in bad local optima has faded; in high-dimensional spaces, saddle points are far more common than local optima.

Most points with zero gradient in the cost function are saddle points rather than local optima, because in high-dimensional spaces it is unlikely that the function curves upward in every direction.

Plateaus, where the derivative is close to zero for an extended period, can significantly slow down learning progress.