Linear Regression and Correlation
1. **Problem Statement:**
We have two problems involving paired data and linear regression.
**Problem 2:** Given casino size and revenue data, find the linear regression equation and predict revenue for a casino size of 200 thousand square feet.
**Problem 3:** Given time and height data of a soccer ball, find the linear correlation coefficient $r$, interpret it, and discuss potential mistakes without a scatterplot.
---
2. **Linear Regression Equation (Problem 2a):**
The linear regression equation is given by:
$$y = mx + b$$
where $m$ is the slope and $b$ is the y-intercept.
To find $m$ and $b$, use:
$$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$
$$b = \frac{\sum y - m \sum x}{n}$$
*Note:* Since the paired data from the preceding exercise is not provided here, we cannot compute exact values. Once $m$ and $b$ are computed from that data, substituting them into $y = mx + b$ gives the regression equation, with $x$ as casino size and $y$ as revenue.
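The two formulas translate directly into code. The following is a minimal Python sketch, assuming the paired data are available as two equal-length lists. The function name `fit_line` and the demo points are hypothetical illustrations: the demo points lie exactly on $y = 2x + 1$ and are not the casino data.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; slope is undefined")

    m = (n * sum_xy - sum_x * sum_y) / denom  # slope formula
    b = (sum_y - m * sum_x) / n               # intercept formula
    return m, b


# Hypothetical demo data (NOT the casino data): points on y = 2x + 1.
demo_x = [1, 2, 3, 4]
demo_y = [3, 5, 7, 9]
demo_m, demo_b = fit_line(demo_x, demo_y)
print(demo_m, demo_b)
```

With the actual casino data, the same call would take the sizes as `xs` and the revenues as `ys`.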
---
3. **Prediction for Casino Size 200 (Problem 2b):**
Use the regression equation:
$$y = m(200) + b$$
This gives the predicted revenue.
*Is it likely to be accurate?* Prediction accuracy depends on whether 200 thousand square feet is within the range of the original data (interpolation) or outside (extrapolation). Extrapolation is less reliable.
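The interpolation-versus-extrapolation check can be made explicit. Here is a minimal Python sketch, assuming a slope and intercept have already been computed. The function name `predict_with_range_check` and every number below are hypothetical placeholders, not casino values.

```python
def predict_with_range_check(m, b, x_new, xs):
    """Return (predicted y, True if x_new lies within the observed x range)."""
    y_hat = m * x_new + b
    is_interpolation = min(xs) <= x_new <= max(xs)
    return y_hat, is_interpolation


# Hypothetical placeholders (NOT the casino data or its fitted line).
demo_xs = [1, 2, 3, 4]
demo_m, demo_b = 2.0, 1.0

inside = predict_with_range_check(demo_m, demo_b, 2.5, demo_xs)
outside = predict_with_range_check(demo_m, demo_b, 200, demo_xs)
print(inside)   # second element True: interpolation
print(outside)  # second element False: extrapolation, treat with caution
```

A `False` flag does not make the prediction wrong; it only signals that the fitted line is being used beyond the data that produced it.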
---
4. **Linear Correlation Coefficient $r$ (Problem 3a):**
The formula for $r$ is:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{(n\sum x^2 - (\sum x)^2)(n\sum y^2 - (\sum y)^2)}}$$
Given:
Time $x$: 0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8
Height $y$: 0.0, 1.7, 3.1, 3.9, 4.5, 4.7, 4.6, 4.1, 3.3, 2.1
Calculate sums:
$$\sum x = 9.0$$
$$\sum y = 32.0$$
$$\sum x^2 = 11.4$$
$$\sum y^2 = 123.32$$
$$\sum xy = 32.54$$
$$n = 10$$
Calculate numerator:
$$10 \times 32.54 - 9.0 \times 32.0 = 325.4 - 288.0 = 37.4$$
Calculate denominator:
$$\sqrt{(10 \times 11.4 - 9.0^2)(10 \times 123.32 - 32.0^2)} = \sqrt{(114 - 81)(1233.2 - 1024)} = \sqrt{33 \times 209.2} = \sqrt{6903.6} \approx 83.09$$
Therefore:
$$r = \frac{37.4}{83.09} \approx 0.450$$
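Because the hand computation involves several sums, it is worth recomputing them directly from the listed data as an arithmetic check. The sketch below applies the formula for $r$ given above; the function name `pearson_r` and the variable names are illustrative choices, not part of the original exercise.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Linear correlation coefficient using the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Data exactly as listed in the problem.
times = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8]
heights = [0.0, 1.7, 3.1, 3.9, 4.5, 4.7, 4.6, 4.1, 3.3, 2.1]

# Intermediate sums, for comparison with the hand calculation.
sum_y = sum(heights)
sum_y2 = sum(y * y for y in heights)
sum_xy = sum(x * y for x, y in zip(times, heights))
r = pearson_r(times, heights)

print(round(sum_y, 2), round(sum_y2, 2), round(sum_xy, 2))
print(round(r, 3))
```

Comparing each printed sum with its hand-computed counterpart localizes any arithmetic slip to a single line.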
---
5. **Interpretation of $r$ (Problem 3b):**
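An $r$ value of approximately $0.45$ indicates a weak positive linear correlation between time and height. With $n = 10$ pairs, the critical value of $r$ at the 0.05 significance level is 0.632; since $|r| < 0.632$, there is not sufficient evidence to support a claim of linear correlation.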
---
6. **Potential Mistake Without Scatterplot (Problem 3c):**
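Without a scatterplot, one might conclude from the weak $r$ that time and height are essentially unrelated, or fit a straight line to the data anyway. Both would be mistakes. The heights rise to a peak of 4.7 at $x = 1.0$ and then fall, a parabolic pattern that a scatterplot makes obvious. The variables are strongly related, just not linearly, so $r$ and a linear regression line are misleading summaries of this data.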
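The curvature can also be seen numerically, without a plot, by computing $r$ separately on the rising and falling parts of the trajectory. The following Python sketch is an illustrative check rather than part of the original exercise; splitting at the maximum height and the helper name `pearson_r` are assumptions made for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Linear correlation coefficient using deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


times = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8]
heights = [0.0, 1.7, 3.1, 3.9, 4.5, 4.7, 4.6, 4.1, 3.3, 2.1]

# Split at the highest point; the peak belongs to both segments.
peak = heights.index(max(heights))
r_rising = pearson_r(times[:peak + 1], heights[:peak + 1])
r_falling = pearson_r(times[peak:], heights[peak:])
r_overall = pearson_r(times, heights)

print(round(r_rising, 2), round(r_falling, 2), round(r_overall, 2))
```

Each segment taken alone shows a strong linear trend, one upward and one downward, while the overall $r$ is weak because the two trends largely cancel. This is the situation a scatterplot exposes at a glance.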
---
**Final answers:**
- Problem 2a: Linear regression equation $y = mx + b$ (values depend on data).
- Problem 2b: Predicted revenue at $x=200$ is $y = m(200) + b$; accuracy depends on data range.
- Problem 3a: $r \approx 0.450$.
- Problem 3b: Weak positive linear correlation, not significant at the 0.05 level for $n = 10$ (critical value 0.632).
- Problem 3c: Mistake is assuming linearity without scatterplot.