Linear Regression
1. **Problem Statement:**
Suppose we have data on the number of hours studied ($x$) and the corresponding exam scores ($y$) for 5 students:
$$\begin{array}{c|c}
\text{Hours Studied }(x) & \text{Exam Score }(y) \\
\hline
1 & 50 \\
2 & 55 \\
3 & 65 \\
4 & 70 \\
5 & 75
\end{array}$$
We want to find the linear regression line $y = mx + b$ that best fits this data.
2. **Formulas:**
The least-squares formulas for the slope $m$ and intercept $b$ of the regression line are:
$$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$
$$b = \frac{\sum y - m \sum x}{n}$$
where $n$ is the number of data points.
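For reference, these formulas come from choosing $m$ and $b$ to minimize the sum of squared vertical errors between the data and the line. A short derivation, in the same summation shorthand:
$$S(m, b) = \sum_{i=1}^{n} (y_i - m x_i - b)^2$$
Setting both partial derivatives to zero gives the normal equations:
$$\frac{\partial S}{\partial b} = -2\sum (y - mx - b) = 0 \quad\Longrightarrow\quad \sum y = m\sum x + nb$$
$$\frac{\partial S}{\partial m} = -2\sum x(y - mx - b) = 0 \quad\Longrightarrow\quad \sum xy = m\sum x^2 + b\sum x$$
The first equation rearranges to the formula for $b$. Substituting that $b$ into the second equation and multiplying through by $n$ gives
$$m\left(n\sum x^2 - \left(\sum x\right)^2\right) = n\sum xy - \sum x \sum y,$$
which is the formula for $m$.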
3. **Calculate the necessary sums:**
$$n = 5$$
$$\sum x = 1 + 2 + 3 + 4 + 5 = 15$$
$$\sum y = 50 + 55 + 65 + 70 + 75 = 315$$
$$\sum xy = (1)(50) + (2)(55) + (3)(65) + (4)(70) + (5)(75) = 50 + 110 + 195 + 280 + 375 = 1010$$
$$\sum x^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 = 1 + 4 + 9 + 16 + 25 = 55$$
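The same sums can be computed directly from the table. The following is a minimal Python sketch; the variable names such as `sum_xy` are illustrative choices and do not come from the original.

```python
# Data from the table: hours studied (x) and exam scores (y)
xs = [1, 2, 3, 4, 5]
ys = [50, 55, 65, 70, 75]

n = len(xs)                                    # number of data points
sum_x = sum(xs)                                # sum of x
sum_y = sum(ys)                                # sum of y
sum_xy = sum(x * y for x, y in zip(xs, ys))    # sum of x*y, pairwise
sum_x2 = sum(x * x for x in xs)                # sum of x squared

print(n, sum_x, sum_y, sum_xy, sum_x2)  # 5 15 315 1010 55
```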
4. **Calculate slope $m$:**
$$m = \frac{5 \times 1010 - 15 \times 315}{5 \times 55 - 15^2} = \frac{5050 - 4725}{275 - 225} = \frac{325}{50} = 6.5$$
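The slope arithmetic can be checked with exact rational numbers. The sketch below uses Python's standard `fractions.Fraction` so that no floating-point rounding occurs. It hard-codes the sums from step 3 and shows that $325/50$ reduces to $13/2$.

```python
from fractions import Fraction

# Sums from step 3
n, sum_x, sum_y, sum_xy, sum_x2 = 5, 15, 315, 1010, 55

numerator = n * sum_xy - sum_x * sum_y   # 5050 - 4725
denominator = n * sum_x2 - sum_x ** 2    # 275 - 225
m = Fraction(numerator, denominator)     # automatically reduced to lowest terms

print(numerator, denominator, m, float(m))  # 325 50 13/2 6.5
```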
5. **Calculate intercept $b$:**
$$b = \frac{315 - 6.5 \times 15}{5} = \frac{315 - 97.5}{5} = \frac{217.5}{5} = 43.5$$
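Dividing the intercept formula through by $n$ gives an equivalent form in terms of the means $\bar{x} = \sum x / n$ and $\bar{y} = \sum y / n$. This form reproduces the same value:
$$b = \bar{y} - m\bar{x}, \qquad \bar{x} = \frac{15}{5} = 3, \qquad \bar{y} = \frac{315}{5} = 63$$
$$b = 63 - 6.5 \times 3 = 63 - 19.5 = 43.5$$
It also shows that the regression line always passes through the point of means $(\bar{x}, \bar{y}) = (3, 63)$.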
6. **Final regression equation:**
$$y = 6.5x + 43.5$$
The slope means that each additional hour studied is associated with a 6.5-point increase in the predicted exam score.
7. **Interpretation:**
The regression line can be used to predict exam scores from hours studied. For example, the predicted score for a student who studies 3 hours is:
$$y = 6.5 \times 3 + 43.5 = 19.5 + 43.5 = 63$$
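The observed score at 3 hours was 65, so this prediction is 2 points below the data. The sketch below computes that gap (the residual) for every student, using the fitted coefficients from step 6. The helper name `predict` is hypothetical. For a least-squares line with an intercept, the residuals sum to zero; this follows from the condition $\partial S / \partial b = 0$ in the normal equations.

```python
m, b = 6.5, 43.5  # fitted coefficients from step 6

def predict(x):
    """Predicted exam score for x hours of study."""
    return m * x + b

xs = [1, 2, 3, 4, 5]
ys = [50, 55, 65, 70, 75]

# Residual = observed score minus predicted score
residuals = [y - predict(x) for x, y in zip(xs, ys)]

print(predict(3))      # 63.0
print(residuals)       # [0.0, -1.5, 2.0, 0.5, -1.0]
print(sum(residuals))  # 0.0
```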
This simple linear regression example shows how to find the least-squares line of best fit from just five quantities: $n$, $\sum x$, $\sum y$, $\sum xy$, and $\sum x^2$.