Temperature Attendance 287E4B
1. **State the problem:**
We want to analyze the relationship between Daily High Temperature ($X$) and Attendance ($Y$) at town meetings using simple linear regression.
2. **Null hypothesis:**
- Formal: $H_0: \beta = 0$ (no linear relationship between temperature and attendance).
- Lay terms: Temperature does not predict attendance.
3. **Calculate regression parameters and correlation $r$:**
Given data:
$$X = [15.3, 24.9, 9.0, 35.0, 36.1, 22.1, 22.7, 24.5]$$
$$Y = [35, 80, 10, 75, 85, 75, 70, 80]$$
Calculate means:
$$\bar{X} = \frac{15.3 + 24.9 + 9.0 + 35.0 + 36.1 + 22.1 + 22.7 + 24.5}{8} = \frac{189.6}{8} = 23.7$$
$$\bar{Y} = \frac{35 + 80 + 10 + 75 + 85 + 75 + 70 + 80}{8} = \frac{510}{8} = 63.75$$
Calculate sums for slope $b$ and correlation $r$:
$$S_{XY} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = 1430.5$$
$$S_{XX} = \sum (X_i - \bar{X})^2 = 573.74$$
$$S_{YY} = \sum (Y_i - \bar{Y})^2 = 4987.5$$
Slope:
$$b = \frac{S_{XY}}{S_{XX}} = \frac{1430.5}{573.74} = 2.4933 \approx 2.49$$
Intercept:
$$a = \bar{Y} - b\bar{X} = 63.75 - 2.4933 \times 23.7 = 63.75 - 59.09 = 4.66$$
Correlation:
$$r = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}} = \frac{1430.5}{\sqrt{573.74 \times 4987.5}} = \frac{1430.5}{1691.6} = 0.8456$$
Regression line:
$$\hat{Y} = 4.66 + 2.49X$$
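The hand calculation in this step can be checked with a short script. The following is a minimal sketch in plain Python using the data listed above; the helper name `fit_line` is an assumption introduced for illustration and is not part of the original solution.

```python
import math

# Data copied from the problem statement.
X = [15.3, 24.9, 9.0, 35.0, 36.1, 22.1, 22.7, 24.5]
Y = [35, 80, 10, 75, 85, 75, 70, 80]


def fit_line(x, y):
    """Least-squares fit; returns (intercept, slope, r, s_xx, s_yy, s_xy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squared deviations and cross-products about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r = s_xy / math.sqrt(s_xx * s_yy)
    return intercept, slope, r, s_xx, s_yy, s_xy


a, b, r, s_xx, s_yy, s_xy = fit_line(X, Y)
print(f"S_XX = {s_xx:.2f}, S_YY = {s_yy:.1f}, S_XY = {s_xy:.1f}")
print(f"a = {a:.2f}, b = {b:.2f}, r = {r:.3f}")
# prints: a = 4.66, b = 2.49, r = 0.846
```

Recomputing the sums directly from the raw data, rather than from rounded means, avoids carrying rounding error into the slope and intercept.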
4. **Test significance at $\alpha=0.05$:**
Degrees of freedom: $n-2=6$
Calculate $t$-statistic:
$$t = r \sqrt{\frac{n-2}{1-r^2}} = 0.8456 \sqrt{\frac{6}{1-0.8456^2}} = 0.8456 \sqrt{\frac{6}{1-0.7150}} = 0.8456 \sqrt{21.05} = 3.88$$
Critical $t$ for 6 df at 0.05 two-tailed is about 2.447.
Since $3.88 > 2.447$, reject $H_0$; the slope is significantly different from zero.
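As a rough check of the significance test, the sketch below recomputes $r$ from the raw data and forms the $t$-statistic from it. The critical value 2.447 is hard-coded from the text rather than looked up from a distribution, and the names `pearson_r`, `T_CRIT`, and `reject` are illustrative assumptions.

```python
import math

X = [15.3, 24.9, 9.0, 35.0, 36.1, 22.1, 22.7, 24.5]
Y = [35, 80, 10, 75, 85, 75, 70, 80]
T_CRIT = 2.447  # two-tailed critical t for 6 df at alpha = 0.05, as quoted in the text


def pearson_r(x, y):
    """Sample correlation coefficient from deviation sums."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


n = len(X)
r = pearson_r(X, Y)
# t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom.
t = r * math.sqrt((n - 2) / (1 - r ** 2))
reject = abs(t) > T_CRIT
print(f"t = {t:.2f}, reject H0: {reject}")
# prints: t = 3.88, reject H0: True
```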
5. **Explain findings:**
- $r \approx 0.85$ indicates a strong positive linear relationship.
- $R^2 = r^2 \approx 0.715$ means about 71.5% of the variability in attendance is explained by temperature.
- Slope $b \approx 2.49$ means that for each 1°F increase, predicted attendance increases by about 2.5 people.
6. **95% confidence interval for slope $b$:**
Standard error of slope:
$$SE_b = \frac{s}{\sqrt{S_{XX}}}$$
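Where $s$ is the standard error of the estimate, computed from the residual sum of squares $\sum (Y_i - \hat{Y}_i)^2 = S_{YY} - S_{XY}^2 / S_{XX} = 4987.5 - 3566.65 = 1420.85$:
$$s = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n-2}} = \sqrt{\frac{1420.85}{6}} = 15.389$$
Calculate:
$$SE_b = \frac{15.389}{\sqrt{573.74}} = 0.6425$$
$t_{0.025,6} = 2.447$
CI:
$$b \pm t \, SE_b = 2.4933 \pm 2.447 \times 0.6425 = 2.4933 \pm 1.5722 = (0.92, 4.07)$$
Lay terms: We are 95% confident the true slope is between 0.92 and 4.07. Because the interval excludes zero, it confirms a positive relationship.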
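The interval for the slope can be reproduced numerically. This is a minimal sketch, assuming the same data and the table value $t_{0.025,6} = 2.447$ quoted in the text; the variable names are illustrative.

```python
import math

X = [15.3, 24.9, 9.0, 35.0, 36.1, 22.1, 22.7, 24.5]
Y = [35, 80, 10, 75, 85, 75, 70, 80]
T_CRIT = 2.447  # t_{0.025, 6}, taken from the text

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
s_xx = sum((x - x_bar) ** 2 for x in X)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
b = s_xy / s_xx
a = y_bar - b * x_bar

# Standard error of the estimate: s = sqrt(SSE / (n - 2)).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope and the 95% confidence limits.
se_b = s / math.sqrt(s_xx)
ci_low = b - T_CRIT * se_b
ci_high = b + T_CRIT * se_b
print(f"s = {s:.2f}, SE_b = {se_b:.4f}, CI = ({ci_low:.2f}, {ci_high:.2f})")
# prints: s = 15.39, SE_b = 0.6425, CI = (0.92, 4.07)
```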
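7. **95% prediction interval for $Y$ when $X=25$:**
Predicted $Y$:
$$\hat{Y} = 4.66 + 2.4933 \times 25 = 66.99$$
Standard error of prediction (the leading 1 under the square root accounts for the scatter of a single meeting around the line, so this gives a prediction interval for one observation rather than a confidence interval for the mean attendance):
$$SE_{pred} = s \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{S_{XX}}} = 15.39 \sqrt{1 + \frac{1}{8} + \frac{(25 - 23.7)^2}{573.74}} = 15.39 \sqrt{1.1279} = 16.34$$
Interval for $Y$:
$$66.99 \pm 2.447 \times 16.34 = 66.99 \pm 39.98 = (27.0, 107.0)$$
Lay terms: We are 95% confident that attendance at a single meeting on a 25°F day will be between about 27 and 107 people.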
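The interval at $X_0 = 25$ can be sketched the same way. The code below uses the formula with the leading 1 under the square root, that is, an interval for a single new observation; `X0` and the other names are assumptions introduced for illustration, and 2.447 is again the table value from the text.

```python
import math

X = [15.3, 24.9, 9.0, 35.0, 36.1, 22.1, 22.7, 24.5]
Y = [35, 80, 10, 75, 85, 75, 70, 80]
T_CRIT = 2.447  # t_{0.025, 6}, taken from the text
X0 = 25.0       # temperature at which attendance is predicted

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
s_xx = sum((x - x_bar) ** 2 for x in X)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
b = s_xy / s_xx
a = y_bar - b * x_bar
s = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y)) / (n - 2))

# Point prediction and its standard error for one new observation.
y_hat = a + b * X0
se_pred = s * math.sqrt(1 + 1 / n + (X0 - x_bar) ** 2 / s_xx)
lower = y_hat - T_CRIT * se_pred
upper = y_hat + T_CRIT * se_pred
print(f"Y_hat = {y_hat:.2f}, SE_pred = {se_pred:.2f}, interval = ({lower:.1f}, {upper:.1f})")
# prints: Y_hat = 66.99, SE_pred = 16.34, interval = (27.0, 107.0)
```

Dropping the leading `1 +` inside the square root would instead give the narrower interval for the mean attendance at that temperature.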
---
Final regression line:
$$\hat{Y} = 4.66 + 2.49X$$