
House Price Regression



1. **Problem Statement:** We want to build a linear regression model that predicts house price from size (sq ft) and age (years) using the given data.

2. **Model Form:** The multiple linear regression model is:

$$\text{Price} = \beta_0 + \beta_1 \times \text{Size} + \beta_2 \times \text{Age} + \epsilon$$

where $\beta_0$ is the intercept, $\beta_1$ and $\beta_2$ are the coefficients for size and age respectively, and $\epsilon$ is the error term.

3. **Data:**

| Size (sq ft) | Age (years) | Price (thousands of dollars) |
|--------------|-------------|------------------------------|
| 1500 | 5 | 300 |
| 1600 | 10 | 280 |
| 1700 | 15 | 260 |
| 1800 | 20 | 240 |
| 1900 | 25 | 220 |

4. **Step: Calculate means**

$$\bar{x}_1 = \frac{1500+1600+1700+1800+1900}{5} = 1700$$

$$\bar{x}_2 = \frac{5+10+15+20+25}{5} = 15$$

$$\bar{y} = \frac{300+280+260+240+220}{5} = 260$$

5. **Step: Calculate the centered sums used by the least squares formulas**

The deviations from the means are $(-200, -100, 0, 100, 200)$ for size, $(-10, -5, 0, 5, 10)$ for age, and $(40, 20, 0, -20, -40)$ for price, so:

$$S_{x_1x_1} = \sum (x_1 - \bar{x}_1)^2 = 100000$$

$$S_{x_2x_2} = \sum (x_2 - \bar{x}_2)^2 = 250$$

$$S_{x_1x_2} = \sum (x_1 - \bar{x}_1)(x_2 - \bar{x}_2) = 5000$$

$$S_{x_1y} = \sum (x_1 - \bar{x}_1)(y - \bar{y}) = -20000$$

$$S_{x_2y} = \sum (x_2 - \bar{x}_2)(y - \bar{y}) = -1000$$

6. **Step: Solve for coefficients $\beta_1$ and $\beta_2$**

Using matrix form:

$$\begin{bmatrix} S_{x_1x_1} & S_{x_1x_2} \\ S_{x_1x_2} & S_{x_2x_2} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} = \begin{bmatrix} S_{x_1y} \\ S_{x_2y} \end{bmatrix}$$

$$\begin{bmatrix} 100000 & 5000 \\ 5000 & 250 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} = \begin{bmatrix} -20000 \\ -1000 \end{bmatrix}$$

Calculate the determinant:

$$D = 100000 \times 250 - 5000 \times 5000 = 25000000 - 25000000 = 0$$

Since the determinant is zero, the matrix is singular, which indicates perfect multicollinearity (exact linear dependence) between size and age in this dataset. The first equation is just 20 times the second, so infinitely many pairs $(\beta_1, \beta_2)$ satisfy the system.

7. **Interpretation:** Size and age are perfectly linearly related in this data: age increases by 5 years for every 100 sq ft increase in size, that is, $\text{Age} = 0.05 \times \text{Size} - 70$. We therefore cannot estimate unique coefficients for both variables.

8. **Alternative approach:** Use simple linear regression with one variable, for example size.

Slope:

$$\beta_1 = \frac{S_{x_1y}}{S_{x_1x_1}} = \frac{-20000}{100000} = -0.2$$

Intercept:

$$\beta_0 = \bar{y} - \beta_1 \bar{x}_1 = 260 - (-0.2)(1700) = 260 + 340 = 600$$

9. **Final model:**

$$\text{Price} = 600 - 0.2 \times \text{Size}$$

This line reproduces all five observed prices exactly, for example $600 - 0.2 \times 1500 = 300$.

10. **Interpretation of coefficients:**

- Intercept $600$: predicted price when size is zero (not meaningful in practice, but part of the model).
- Slope $-0.2$: for each additional square foot, the predicted price decreases by 0.2 thousand dollars (200 dollars). This is counterintuitive and likely reflects the confounding effect of age, since in this data every larger house is also older.

**Summary:** Because size and age are perfectly linearly dependent, a multiple regression model cannot be fit uniquely. A simple regression on size shows a negative relationship, but this is likely misleading because the effect of size cannot be separated from the effect of age with this data.
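
The arithmetic in this solution can be checked with a short script. The following is a minimal sketch in plain Python that recomputes the centered sums, the determinant of the 2×2 system, and the size-only fit directly from the five data rows. The helper names `mean` and `centered_cross_sum` are illustrative choices, not part of the original solution.

```python
# Data rows from the table: size (sq ft), age (years), price.
sizes = [1500, 1600, 1700, 1800, 1900]
ages = [5, 10, 15, 20, 25]
prices = [300, 280, 260, 240, 220]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def centered_cross_sum(a, b):
    """Return sum((a_i - mean(a)) * (b_i - mean(b)))."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


# Centered sums of squares and cross-products.
s11 = centered_cross_sum(sizes, sizes)
s22 = centered_cross_sum(ages, ages)
s12 = centered_cross_sum(sizes, ages)
s1y = centered_cross_sum(sizes, prices)
s2y = centered_cross_sum(ages, prices)

# Determinant of the 2x2 coefficient matrix; zero means no unique solution.
det = s11 * s22 - s12 * s12

# Simple regression of price on size alone.
slope = s1y / s11
intercept = mean(prices) - slope * mean(sizes)
residuals = [p - (intercept + slope * s) for s, p in zip(sizes, prices)]

print("S11, S22, S12:", s11, s22, s12)
print("S1y, S2y:", s1y, s2y)
print("determinant:", det)
print("slope, intercept:", slope, intercept)
print("largest absolute residual:", max(abs(r) for r in residuals))
```

A zero determinant together with near-zero residuals for the one-variable fit is what perfect collinearity looks like in practice: one predictor already carries all the information the pair can provide.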