Gradient Descent 213350


1. **Problem Statement:** We are given the loss function
   $$L(w) = (w - 3)^2 + 2$$
   and the gradient descent update rule
   $$w_{k+1} = w_k - \eta \nabla L(w_k).$$
   We need to:
   (a) Find the differential $dL$ at $w_0 = 1$.
   (b) Calculate $w_1$ using learning rate $\eta = 0.1$.
   (c) Interpret the relationship between $dL$ and the weight update.
   (d) Explain why moving in the negative gradient direction minimizes the loss.

2. **Formula and Rules:**
   - The gradient $\nabla L(w)$ is the derivative of $L(w)$ with respect to $w$.
   - The differential $dL$ at a point $w_0$ is $dL = \nabla L(w_0)\, dw$, where $dw$ is a small change in $w$.
   - Gradient descent updates the weight by moving opposite to the gradient in order to reduce the loss.

3. **Step (a): Find $dL$ at $w_0 = 1$**
   - Compute the derivative: $$\nabla L(w) = \frac{d}{dw}\left((w - 3)^2 + 2\right) = 2(w - 3)$$
   - Evaluate at $w_0 = 1$: $$\nabla L(1) = 2(1 - 3) = -4$$
   - The differential is: $$dL = \nabla L(1)\, dw = -4\, dw$$

4. **Step (b): Find $w_1$ using $\eta = 0.1$**
   - Using the update rule: $$w_1 = w_0 - \eta \nabla L(w_0) = 1 - 0.1 \times (-4) = 1 + 0.4 = 1.4$$

5. **Step (c): Interpret the relationship between $dL$ and the weight update**
   - The differential $dL$ shows how the loss changes for a small change in $w$.
   - Since $\nabla L(1) = -4$, increasing $w$ slightly (positive $dw$) decreases $L$, because $dL$ is negative.
   - The update $w_1 = 1.4 > w_0 = 1$ moves $w$ in exactly that loss-reducing direction, consistent with the sign of $dL$.

6. **Step (d): Why the negative gradient direction minimizes the loss**
   - The gradient points in the direction of steepest increase of $L$.
   - Moving opposite to the gradient therefore decreases $L$ most rapidly, to first order.
   - With a sufficiently small learning rate, each update reduces the loss, so the iterates converge to the minimum at $w = 3$ (see the numerical sketch after the final answers).

**Final answers:**
(a) $dL = -4\, dw$ at $w_0 = 1$
(b) $w_1 = 1.4$
(c) The differential $dL$ indicates how the loss changes with $w$; the update moves $w$ in the direction that reduces $L$.
(d) The negative gradient direction is the direction of steepest decrease, so following it minimizes the loss.
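
For readers who want to verify the numbers, here is a minimal Python sketch (not part of the original problem) that runs the update rule from step 4 on this loss, starting at $w_0 = 1$ with $\eta = 0.1$. The function names `loss` and `grad` and the five-step loop length are illustrative choices.

```python
# Gradient descent on L(w) = (w - 3)^2 + 2, starting from w0 = 1, eta = 0.1.

def loss(w):
    return (w - 3) ** 2 + 2

def grad(w):
    # dL/dw = 2(w - 3)
    return 2 * (w - 3)

w = 1.0      # w0
eta = 0.1    # learning rate

for k in range(5):
    g = grad(w)
    w = w - eta * g  # update rule: w_{k+1} = w_k - eta * grad L(w_k)
    print(f"step {k + 1}: grad = {g:+.4f}, w = {w:.4f}, L(w) = {loss(w):.4f}")
```

The first printed line reproduces $w_1 = 1.4$ from part (b); the later iterations show $w$ approaching the minimizer $w = 3$ while $L(w)$ decreases toward its minimum value of $2$, which is the behavior argued in part (d).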