
Learning Rate 6Ff86E

1. The problem asks which learning curve, A or B, corresponds to a learning rate $\alpha$ that is too large during gradient descent.

2. Feature scaling often means rescaling features to a range such as $[0, 1]$ or standardizing them to zero mean and unit variance. A valid min-max scaling step is

$$x_{\text{scaled}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

This helps gradient descent converge faster by putting all features on comparable scales (see the scaling sketch below).

3. Regarding the learning curves:
   - Curve A starts high and decreases rapidly before flattening out, indicating good convergence.
   - Curve B starts lower but then increases sharply, indicating divergence.

4. When the learning rate $\alpha$ is too large, each gradient descent step overshoots the minimum, causing the cost $J(\theta)$ to oscillate or increase rather than decrease (see the divergence demo below).

5. Therefore, curve B shows the behavior of a too-large learning rate $\alpha$, because its cost increases sharply instead of decreasing.

Final answer: The learning rate $\alpha$ was likely too large in case B.
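The scaling sketch referenced in part 2: a minimal min-max scaling implementation, assuming NumPy. The helper name `min_max_scale` and the sample matrix are illustrative, not part of the original problem.

```python
import numpy as np

def min_max_scale(x: np.ndarray) -> np.ndarray:
    """Rescale each feature (column) of x to the range [0, 1]."""
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    # Guard against constant columns, where x_max == x_min would divide by zero.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (x - x_min) / span

# Features on very different scales, as in the typical motivating example.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])
print(min_max_scale(X))  # every column now lies in [0, 1]
```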
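The divergence demo referenced in parts 4 and 5: a self-contained sketch of gradient descent on the quadratic cost $J(\theta) = \theta^2$, whose gradient is $2\theta$. The particular $\alpha$ values are assumptions chosen to reproduce curve-A-like and curve-B-like behavior; they are not from the original problem.

```python
def gradient_descent_costs(alpha: float, theta: float = 1.0, steps: int = 5) -> list[float]:
    """Run gradient descent on J(theta) = theta**2 and record the cost after each step."""
    costs = []
    for _ in range(steps):
        theta -= alpha * 2.0 * theta  # update: theta := theta - alpha * dJ/dtheta
        costs.append(theta ** 2)
    return costs

# Curve A: a modest alpha; the cost decreases monotonically toward the minimum.
print(gradient_descent_costs(alpha=0.1))  # ~[0.64, 0.41, 0.26, 0.17, 0.11]
# Curve B: alpha too large; every step overshoots, so J(theta) grows.
print(gradient_descent_costs(alpha=1.1))  # ~[1.44, 2.07, 2.99, 4.30, 6.19]
```

On this particular cost, each update multiplies $\theta$ by $(1 - 2\alpha)$, so the cost shrinks when $|1 - 2\alpha| < 1$ and blows up when $\alpha > 1$, which is exactly the curve-B behavior.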