Learning Rate
1. The problem asks which learning curve, A or B, corresponds to a learning rate $\alpha$ that is too large during gradient descent.
2. Feature scaling often involves rescaling features to a range like $[0,1]$ or standardizing them to zero mean and unit variance. A common choice is min-max scaling:
$$x_{\text{scaled}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
This helps gradient descent converge faster by putting the features on comparable scales (a minimal code sketch appears after the final answer).
3. Regarding the learning curves:
- Curve A starts high and decreases rapidly, then flattens, indicating good convergence.
- Curve B starts lower but then sharply increases, indicating divergence.
4. When the learning rate $\alpha$ is too large, gradient descent overshoots the minimum on each step, causing the cost $J(\theta)$ to increase or oscillate rather than decrease (see the short numerical demonstration after the final answer).
5. Therefore, curve B shows the behavior of a learning rate $\alpha$ that is too large: the cost increases sharply instead of decreasing.
Final answer: The learning rate $\alpha$ was likely too large in case B.
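
As a concrete illustration of the min-max scaling step in item 2, here is a minimal NumPy sketch. The feature matrix `X` and the function name `min_max_scale` are illustrative assumptions, not taken from the problem statement.

```python
import numpy as np

def min_max_scale(X):
    """Rescale each feature (column) of X to the range [0, 1]."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    # Guard against constant columns, where x_max == x_min.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X - x_min) / span

# Example: two features on very different scales (hypothetical values).
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0]])
print(min_max_scale(X))  # every column now lies in [0, 1]
```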
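
To see numerically why a too-large $\alpha$ produces the rising curve B, the sketch below runs gradient descent on a simple one-dimensional cost $J(\theta) = \theta^2$. The cost function and the two $\alpha$ values are assumptions chosen only to make the effect visible.

```python
def gradient_descent(alpha, theta0=1.0, steps=10):
    """Run gradient descent on J(theta) = theta**2 (gradient 2*theta)
    and return the cost after each update."""
    theta = theta0
    costs = []
    for _ in range(steps):
        theta = theta - alpha * 2 * theta  # gradient step
        costs.append(theta ** 2)
    return costs

# A small alpha converges (cost shrinks toward 0); for this cost, alpha > 1
# overshoots the minimum farther on every step, so the cost grows --
# the shape of curve B.
print(gradient_descent(alpha=0.1)[:3])   # decreasing costs
print(gradient_descent(alpha=1.1)[:3])   # increasing costs (divergence)
```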