Subjects: calculus, optimization

Loss Curvature 9D9C92

1. **Problem Statement:** We analyze the quadratic loss function

   $$L(w) = aw^2 + bw + c$$

   where $a$, $b$, and $c$ are constants.

2. **Find the first differential $dL$ and second differential $d^2L$:**

   - The first differential $dL$ is the derivative of $L(w)$ with respect to $w$, multiplied by $dw$:

     $$dL = \frac{dL}{dw}\,dw = (2aw + b)\,dw$$

   - The second differential $d^2L$ is obtained by differentiating $dL$ once more with respect to $w$, treating $dw$ as a fixed increment, and multiplying by $dw$ again:

     $$d^2L = \frac{d}{dw}(2aw + b)\,dw^2 = 2a\,dw^2$$

3. **Show that the sign of $d^2L$ determines curvature:**

   - Since $d^2L = 2a\,dw^2$ and $dw^2 > 0$ for any nonzero change $dw$, the sign of $d^2L$ depends solely on $a$.
   - If $a > 0$, then $d^2L > 0$, so the function is convex (curves upward).
   - If $a < 0$, then $d^2L < 0$, so the function is concave (curves downward).
   - If $a = 0$, then $d^2L = 0$ and the loss is linear, with no curvature.

4. **Explain how curvature helps in selecting learning rates for gradient descent:**

   - Curvature measures how quickly the gradient $2aw + b$ changes as $w$ changes. For this loss it is the constant $L''(w) = 2a$.
   - For large positive curvature ($a$ large), the gradient changes rapidly, so a smaller learning rate is needed to avoid overshooting the minimum. For this quadratic, the update $w \leftarrow w - \eta(2aw + b)$ converges only when $0 < \eta < 1/a$.
   - For small positive curvature ($a$ near zero), the loss surface is flatter, so larger learning rates remain stable and give faster convergence.
   - Knowing the curvature therefore lets us choose a learning rate that keeps gradient descent updates both stable and efficient.

**Final answers:**

- $dL = (2aw + b)\,dw$
- $d^2L = 2a\,dw^2$
- The sign of $d^2L$ (i.e., the sign of $a$) determines the curvature.
- Curvature guides the learning rate choice in gradient descent.
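
The link between curvature and learning rate in step 4 can be checked numerically. The sketch below is only an illustration: the function name `gradient_descent`, the starting point, the step count, and the values of $a$, $b$, and the learning rates are all assumptions chosen for the example, not part of the original problem. It runs plain gradient descent on $L(w) = aw^2 + bw + c$ with $a > 0$, whose minimizer is $w^* = -b/(2a)$. Each update multiplies the distance to $w^*$ by $(1 - 2a\eta)$, so the iterates should shrink toward $w^*$ when $\eta < 1/a$ and grow when $\eta > 1/a$.

```python
def gradient_descent(a, b, lr, w0=0.0, steps=50):
    """Run plain gradient descent on L(w) = a*w**2 + b*w + c; return the final w.

    The constant c does not affect the gradient, so it is not needed here.
    """
    w = w0
    for _ in range(steps):
        grad = 2 * a * w + b  # dL/dw from step 2
        w -= lr * grad        # gradient descent update
    return w


# Illustrative values (assumed for this example only).
a, b = 2.0, -4.0
w_star = -b / (2 * a)  # minimizer of the quadratic, valid because a > 0

# Curvature is 2a = 4, so the stability threshold is 1/a = 0.5.
stable = gradient_descent(a, b, lr=0.4)    # below the threshold
unstable = gradient_descent(a, b, lr=0.6)  # above the threshold

print("distance to minimum, lr=0.4:", abs(stable - w_star))
print("distance to minimum, lr=0.6:", abs(unstable - w_star))
```

With a larger $a$ the threshold $1/a$ shrinks, which is the sense in which high curvature calls for a smaller learning rate.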