Correlation and Regression
1. **Problem Statement:** Given advertising expenditure $X$ (in units of $100$) and sales revenue $Y$ (in units of $1000$):
$X = [1, 2, 3, 4, 5]$
$Y = [3, 3, 5, 5, 6]$
We need to:
a) Calculate the coefficient of correlation between $X$ and $Y$.
b) Fit the linear regression model $y = a + bx$.
c) Calculate the reliability of the model.
---
2. **Formulas and Important Rules:**
- Mean of $X$: $\bar{X} = \frac{\sum X_i}{n}$
- Mean of $Y$: $\bar{Y} = \frac{\sum Y_i}{n}$
- Covariance: $S_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n}$
- Variance of $X$: $S_{XX} = \frac{\sum (X_i - \bar{X})^2}{n}$
- Variance of $Y$: $S_{YY} = \frac{\sum (Y_i - \bar{Y})^2}{n}$
- Coefficient of correlation: $r = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}}$
- Regression coefficients:
$$b = \frac{S_{XY}}{S_{XX}}, \quad a = \bar{Y} - b \bar{X}$$
- Reliability (coefficient of determination): $r^2$
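
The formulas above translate almost line for line into code. The following is a minimal sketch in plain Python; the function name `correlation_and_regression` and its return order are illustrative choices, not part of the original solution, and it assumes both inputs have the same length and that neither is constant (otherwise a division by zero occurs).

```python
import math


def correlation_and_regression(xs, ys):
    """Return (r, a, b, r_squared) for the fitted line y = a + b*x.

    Uses the population forms (division by n) given in the formulas above.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Covariance and variances, each divided by n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs) / n
    s_yy = sum((y - y_bar) ** 2 for y in ys) / n

    r = s_xy / math.sqrt(s_xx * s_yy)  # coefficient of correlation
    b = s_xy / s_xx                    # slope
    a = y_bar - b * x_bar              # intercept
    return r, a, b, r ** 2
```

Calling it with the data from the problem statement, `correlation_and_regression([1, 2, 3, 4, 5], [3, 3, 5, 5, 6])`, should return values close to $r \approx 0.943$, $a = 2$, $b = 0.8$, up to floating-point rounding.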
---
3. **Calculations:**
- Number of data points: $n=5$
- Calculate means:
$$\bar{X} = \frac{1+2+3+4+5}{5} = 3$$
$$\bar{Y} = \frac{3+3+5+5+6}{5} = \frac{22}{5} = 4.4$$
- Calculate deviations and products:
| $X_i$ | $Y_i$ | $X_i - \bar{X}$ | $Y_i - \bar{Y}$ | $(X_i - \bar{X})(Y_i - \bar{Y})$ | $(X_i - \bar{X})^2$ | $(Y_i - \bar{Y})^2$ |
|-------|-------|-----------------|-----------------|-------------------------------|-------------------|-------------------|
| 1 | 3 | $1-3=-2$ | $3-4.4=-1.4$ | $(-2)(-1.4)=2.8$ | 4 | 1.96 |
| 2 | 3 | $2-3=-1$ | $3-4.4=-1.4$ | $(-1)(-1.4)=1.4$ | 1 | 1.96 |
| 3 | 5 | $3-3=0$ | $5-4.4=0.6$ | $0 \times 0.6=0$ | 0 | 0.36 |
| 4 | 5 | $4-3=1$ | $5-4.4=0.6$ | $1 \times 0.6=0.6$ | 1 | 0.36 |
| 5 | 6 | $5-3=2$ | $6-4.4=1.6$ | $2 \times 1.6=3.2$ | 4 | 2.56 |
- Sum these values:
$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = 2.8 + 1.4 + 0 + 0.6 + 3.2 = 8$$
$$\sum (X_i - \bar{X})^2 = 4 + 1 + 0 + 1 + 4 = 10$$
$$\sum (Y_i - \bar{Y})^2 = 1.96 + 1.96 + 0.36 + 0.36 + 2.56 = 7.2$$
- Calculate variances and covariance (using $n=5$; the divisor cancels in both $r$ and $b$, so dividing by $n-1$ instead would give the same results):
$$S_{XY} = \frac{8}{5} = 1.6$$
$$S_{XX} = \frac{10}{5} = 2$$
$$S_{YY} = \frac{7.2}{5} = 1.44$$
- Calculate coefficient of correlation:
$$r = \frac{1.6}{\sqrt{2 \times 1.44}} = \frac{1.6}{\sqrt{2.88}} \approx \frac{1.6}{1.697} \approx 0.9428$$
- Calculate regression coefficients:
$$b = \frac{S_{XY}}{S_{XX}} = \frac{1.6}{2} = 0.8$$
$$a = \bar{Y} - b \bar{X} = 4.4 - 0.8 \times 3 = 4.4 - 2.4 = 2$$
- Regression equation:
$$y = 2 + 0.8x$$
- Reliability (coefficient of determination):
$$r^2 = (0.9428)^2 \approx 0.889$$
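
As a cross-check on the hand arithmetic, the deviation table and the column sums can be reproduced step by step. This is only a sketch: the variable names (`dx`, `dy`, `sum_xy`, and so on) are chosen here for illustration, and the printed values are rounded to hide floating-point noise.

```python
import math

X = [1, 2, 3, 4, 5]
Y = [3, 3, 5, 5, 6]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n

# Deviations from the means (columns 3 and 4 of the table)
dx = [x - x_bar for x in X]
dy = [y - y_bar for y in Y]

# One row per data point: product and squared deviations
for x, y, u, v in zip(X, Y, dx, dy):
    print(x, y, round(u, 2), round(v, 2),
          round(u * v, 2), round(u * u, 2), round(v * v, 2))

# Column sums
sum_xy = sum(u * v for u, v in zip(dx, dy))
sum_xx = sum(u * u for u in dx)
sum_yy = sum(v * v for v in dy)

# Dividing each sum by n gives S_XY, S_XX, S_YY; the factor 1/n
# cancels in r and b, so the raw sums can be used directly.
r = sum_xy / math.sqrt(sum_xx * sum_yy)
b = sum_xy / sum_xx
a = y_bar - b * x_bar

print(round(sum_xy, 4), round(sum_xx, 4), round(sum_yy, 4))
print(round(r, 4), round(a, 4), round(b, 4), round(r ** 2, 4))
```

The sums should agree with the values $8$, $10$, and $7.2$ obtained above.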
---
4. **Interpretation:**
- The correlation coefficient $r \approx 0.943$ indicates a strong positive linear relationship between advertising expenditure and sales revenue.
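- The regression line $y = 2 + 0.8x$ can be used to predict sales revenue from advertising expenditure: each additional unit of $X$ (an extra $100$ of advertising) is associated with an estimated increase of $0.8$ units of $Y$ (about $800$ in sales revenue).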
- The reliability $r^2 \approx 0.889$ means about 88.9% of the variation in sales revenue is explained by the advertising expenditure.
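
The "variation explained" reading of $r^2$ can be checked directly from the residuals of the fitted line. The sketch below assumes the coefficients $a = 2$ and $b = 0.8$ found above and uses the standard identity $r^2 = 1 - \mathrm{SSE}/\mathrm{SST}$ for simple linear regression with an intercept; the helper `predict` and the names `sse` and `sst` are introduced here for illustration only.

```python
X = [1, 2, 3, 4, 5]
Y = [3, 3, 5, 5, 6]

# Coefficients of the fitted line y = a + b*x from the calculation above
a, b = 2.0, 0.8


def predict(x):
    """Predicted sales revenue (units of 1000) for expenditure x (units of 100)."""
    return a + b * x


y_bar = sum(Y) / len(Y)

# Unexplained variation: squared residuals around the fitted line
sse = sum((y - predict(x)) ** 2 for x, y in zip(X, Y))

# Total variation: squared deviations around the mean of Y
sst = sum((y - y_bar) ** 2 for y in Y)

r_squared = 1 - sse / sst

print(round(sse, 4), round(sst, 4), round(r_squared, 4))
print(round(predict(4), 4))  # prediction at an x value inside the observed range
```

Here the residual sum of squares works out to $0.8$ against a total of $7.2$, so roughly $1 - 0.8/7.2 \approx 0.889$ of the variation is accounted for by the line, matching the value of $r^2$ computed earlier.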
---
**Final answers:**
a) Coefficient of correlation $r \approx 0.943$
b) Regression line $y = 2 + 0.8x$
c) Reliability of the model $r^2 \approx 0.889$