Loss Curvature
1. **Problem Statement:** We analyze the quadratic loss function $$L(w) = aw^2 + bw + c$$ where $a$, $b$, and $c$ are constants.
2. **Find the first differential $dL$ and the second differential $d^2L$** (verified symbolically in the first sketch after this list):
- The first differential $dL$ is the derivative of $L(w)$ with respect to $w$ times $dw$:
$$dL = \frac{dL}{dw} dw = (2aw + b) dw$$
- The second differential $d^2L$ is obtained by differentiating $dL$ with respect to $w$ (holding $dw$ fixed) and multiplying by $dw$ again:
$$d^2L = \frac{d}{dw}(2aw + b)\,(dw)^2 = 2a\,(dw)^2$$
3. **Show that the sign of $d^2L$ determines curvature:**
- Since $d^2L = 2a\,(dw)^2$ and $(dw)^2$ is strictly positive for any nonzero perturbation $dw$, the sign of $d^2L$ depends solely on the sign of $a$; the finite-difference check in the first sketch after this list confirms this numerically.
- If $a > 0$, then $d^2L > 0$, indicating the function is convex (curves upward).
- If $a < 0$, then $d^2L < 0$, indicating the function is concave (curves downward).
4. **Explain how curvature helps in selecting learning rates for gradient descent** (illustrated in the sketch at the end):
- Curvature measures how quickly the gradient changes as $w$ moves, i.e., how sharply the loss surface bends.
- For large positive curvature (large $a$), the gradient changes rapidly near the minimum, so a smaller learning rate is needed to avoid overshooting it; for this quadratic, gradient descent with step size $\eta$ is stable only when $0 < \eta < 1/a$ (equivalently, $\eta < 2/L''(w)$).
- For small positive curvature ($a$ near zero), the loss surface is flatter, so a larger learning rate can be used for faster convergence.
- Adapting the learning rate to the curvature keeps gradient descent updates stable and efficient.
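As a check on steps 2 and 3, here is a minimal Python sketch (using sympy; the numeric coefficients in the second part are arbitrary illustrative choices, not part of the original problem) that recovers $dL/dw = 2aw + b$ and $d^2L/dw^2 = 2a$ symbolically and confirms that a central second difference takes the sign of $a$:

```python
import sympy as sp

# Symbolic check of step 2: L(w) = a*w^2 + b*w + c
w, a, b, c = sp.symbols("w a b c")
L = a * w**2 + b * w + c

print(sp.diff(L, w))     # -> 2*a*w + b, so dL = (2aw + b) dw
print(sp.diff(L, w, 2))  # -> 2*a,       so d^2L = 2a (dw)^2

# Numeric check of step 3: for a quadratic, the central second difference
# equals 2a exactly (up to rounding), so its sign is the sign of a.
def second_difference(f, w0, h=1e-3):
    """Approximate L''(w0) by (L(w0+h) - 2*L(w0) + L(w0-h)) / h^2."""
    return (f(w0 + h) - 2 * f(w0) + f(w0 - h)) / h**2

convex = lambda x: 1.5 * x**2 - x + 2.0  # a = 1.5 > 0 (illustrative values)
concave = lambda x: -0.5 * x**2 + 3.0    # a = -0.5 < 0 (illustrative values)
print(second_difference(convex, 0.7))    # ~  3.0 = 2a > 0: curves upward
print(second_difference(concave, 0.7))   # ~ -1.0 = 2a < 0: curves downward
```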
**Final answers:**
- $dL = (2aw + b) dw$
- $d^2L = 2a\,(dw)^2$
- The sign of $d^2L$ (i.e., the sign of $a$) determines the curvature.
- Curvature guides learning rate choice in gradient descent.
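To make the learning-rate discussion in step 4 concrete, the sketch below runs gradient descent on $L(w) = aw^2 + bw + c$ with the assumed illustrative values $a = 2$, $b = -4$, $c = 1$ (minimum at $w^* = -b/(2a) = 1$). For this quadratic the error is multiplied by $1 - 2a\eta$ each step, so the update is stable exactly when $0 < \eta < 1/a = 0.5$:

```python
# Gradient descent on L(w) = a*w^2 + b*w + c; a, b, c are illustrative choices.
a, b, c = 2.0, -4.0, 1.0        # minimum at w* = -b / (2a) = 1.0
grad = lambda w: 2 * a * w + b  # dL/dw from step 2

def run_gd(lr, w=5.0, steps=25):
    """Plain gradient descent from w = 5.0; returns the final iterate."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Stability bound for this quadratic: 0 < lr < 1/a = 0.5.
print(run_gd(lr=0.25))  # lr = 1/(2a): converges to w* = 1.0 in a single step
print(run_gd(lr=0.45))  # inside the bound: oscillates but converges toward 1.0
print(run_gd(lr=0.60))  # above 1/a: |1 - 2a*lr| = 1.4 > 1, the iterates diverge
```

High curvature (large $a$) shrinks the stable range of $\eta$, which is exactly why sharper loss surfaces call for smaller learning rates.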