Second-order methods that look not only at the value and gradient of the objective function but also at its curvature can help in this case. While these methods cannot be applied to deep learning directly due to the computational cost, they provide useful intuition into how to design advanced optimization algorithms that mimic many of the desirable...
This means that, if we use
, the value of function
might decline. Therefore, in gradient descent we first choose an initial value
Glasp is a social web highlighter that people can highlight and organize quotes and thoughts from the web, and access other like-minded people’s learning.