Correlation Regression 04A193
1. **Problem Statement:** Given advertising expenditure $X$ (in units of $100$) and sales revenue $Y$ (in units of $1000$) as follows:
$X: 1, 2, 3, 4, 5$
$Y: 3, 3, 5, 5, 6$
Calculate:
a) The coefficient of correlation between $X$ and $Y$.
b) The linear regression equation $y = a + bx$.
c) The reliability of the model.
2. **Formulas and Important Rules:**
- Mean of $X$: $\bar{X} = \frac{\sum X_i}{n}$
- Mean of $Y$: $\bar{Y} = \frac{\sum Y_i}{n}$
- Covariance: $S_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n}$
- Variance of $X$: $S_X^2 = \frac{\sum (X_i - \bar{X})^2}{n}$
- Variance of $Y$: $S_Y^2 = \frac{\sum (Y_i - \bar{Y})^2}{n}$
- Coefficient of correlation: $r = \frac{S_{XY}}{S_X S_Y}$
- Regression coefficients:
$$b = \frac{S_{XY}}{S_X^2}, \quad a = \bar{Y} - b \bar{X}$$
- Reliability of the model (coefficient of determination): $r^2$
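The calculations in the next section use the computational (shortcut) forms of the covariance and variances rather than the definitional forms listed above. As a brief sketch of why the two agree, expand the product and use $\sum X_i = n\bar{X}$ and $\sum Y_i = n\bar{Y}$:

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \bar{X} \sum Y_i - \bar{Y} \sum X_i + n \bar{X} \bar{Y} = \sum X_i Y_i - n \bar{X} \bar{Y}$$

Dividing by $n$ gives

$$S_{XY} = \frac{\sum X_i Y_i}{n} - \bar{X} \bar{Y}$$

and setting $Y = X$ (or $X = Y$) yields the variance forms

$$S_X^2 = \frac{\sum X_i^2}{n} - \bar{X}^2, \quad S_Y^2 = \frac{\sum Y_i^2}{n} - \bar{Y}^2$$

Note that all of these divide by $n$ (population form), not $n - 1$. The choice does not affect $r$ or $b$, because the divisor cancels in both ratios.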
3. **Calculations:**
- $n = 5$
- Calculate sums:
$$\sum X = 1+2+3+4+5 = 15$$
$$\sum Y = 3+3+5+5+6 = 22$$
$$\sum X^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 = 55$$
$$\sum Y^2 = 3^2 + 3^2 + 5^2 + 5^2 + 6^2 = 9 + 9 + 25 + 25 + 36 = 104$$
$$\sum XY = (1)(3) + (2)(3) + (3)(5) + (4)(5) + (5)(6) = 3 + 6 + 15 + 20 + 30 = 74$$
- Means:
$$\bar{X} = \frac{15}{5} = 3$$
$$\bar{Y} = \frac{22}{5} = 4.4$$
- Calculate covariance:
$$S_{XY} = \frac{\sum XY}{n} - \bar{X} \bar{Y} = \frac{74}{5} - (3)(4.4) = 14.8 - 13.2 = 1.6$$
- Calculate variance of $X$:
$$S_X^2 = \frac{\sum X^2}{n} - \bar{X}^2 = \frac{55}{5} - 3^2 = 11 - 9 = 2$$
- Calculate variance of $Y$:
$$S_Y^2 = \frac{\sum Y^2}{n} - \bar{Y}^2 = \frac{104}{5} - 4.4^2 = 20.8 - 19.36 = 1.44$$
- Calculate standard deviations:
$$S_X = \sqrt{2} \approx 1.414$$
$$S_Y = \sqrt{1.44} = 1.2$$
- Calculate coefficient of correlation:
$$r = \frac{S_{XY}}{S_X S_Y} = \frac{1.6}{\sqrt{2} \times 1.2} \approx \frac{1.6}{1.6971} \approx 0.9428$$
- Calculate regression coefficients:
$$b = \frac{S_{XY}}{S_X^2} = \frac{1.6}{2} = 0.8$$
$$a = \bar{Y} - b \bar{X} = 4.4 - 0.8 \times 3 = 4.4 - 2.4 = 2$$
- Regression equation:
$$y = 2 + 0.8x$$
- Reliability of the model:
$$r^2 = (0.9428)^2 \approx 0.889$$
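The hand calculation above can be cross-checked numerically. The following is a minimal sketch in plain Python, assuming the same population (divide-by-$n$) formulas used in this solution; the function name `correlation_regression` is a hypothetical helper introduced only for illustration.

```python
import math

# Data from the problem statement
X = [1, 2, 3, 4, 5]
Y = [3, 3, 5, 5, 6]

def correlation_regression(xs, ys):
    """Return (r, a, b, r_squared) using population (divide-by-n) moments."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Shortcut forms: mean of products minus product of means
    s_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    r = s_xy / math.sqrt(var_x * var_y)   # coefficient of correlation
    b = s_xy / var_x                      # slope
    a = mean_y - b * mean_x               # intercept
    return r, a, b, r ** 2

r, a, b, r2 = correlation_regression(X, Y)
print(f"r = {r:.4f}, a = {a:.2f}, b = {b:.2f}, r^2 = {r2:.3f}")
# r = 0.9428, a = 2.00, b = 0.80, r^2 = 0.889
```

Working with unrounded intermediate values avoids the small drift that appears when $\sqrt{2}$ is rounded to $1.414$ before dividing.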
4. **Interpretation:**
- The coefficient of correlation $r \approx 0.943$ indicates a strong positive linear relationship between advertising expenditure and sales revenue.
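- The regression line $y = 2 + 0.8x$ predicts sales revenue from advertising expenditure: each additional unit of $X$ ($100$ in expenditure) is associated with an increase of $0.8$ units of $Y$ ($800$ in revenue).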
- The reliability $r^2 \approx 0.889$ means about 88.9% of the variation in sales revenue is explained by its linear relationship with advertising expenditure.
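To make the "variation explained" reading concrete, the sketch below uses the fitted line $y = 2 + 0.8x$ to compute the residuals and then recovers the coefficient of determination as $1 - \text{SSE}/\text{SST}$, which for simple linear regression equals $r^2$. The names `predict`, `sse`, and `sst` are illustrative assumptions, and the input $x = 3.5$ is a hypothetical expenditure chosen inside the observed range.

```python
# Data from the problem statement
X = [1, 2, 3, 4, 5]
Y = [3, 3, 5, 5, 6]
a, b = 2.0, 0.8  # fitted intercept and slope from the solution

def predict(x):
    """Predicted sales revenue (in units of 1000) for expenditure x (in units of 100)."""
    return a + b * x

mean_y = sum(Y) / len(Y)
residuals = [y - predict(x) for x, y in zip(X, Y)]
sse = sum(e ** 2 for e in residuals)       # unexplained variation
sst = sum((y - mean_y) ** 2 for y in Y)    # total variation in Y
r_squared = 1 - sse / sst

print(f"SSE = {sse:.2f}, SST = {sst:.2f}, R^2 = {r_squared:.3f}")
# SSE = 0.80, SST = 7.20, R^2 = 0.889
print(f"Predicted Y at x = 3.5: {predict(3.5):.2f}")
# Predicted Y at x = 3.5: 4.80
```

Predictions for $x$ well outside the observed range of 1 to 5 rest on the assumption that the same linear trend continues, which five data points cannot confirm.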
**Final answers:**
a) $r \approx 0.943$
b) $y = 2 + 0.8x$
c) Reliability $r^2 \approx 0.889$