Statistics Problems

1. **Prove the rank correlation coefficient formula:**

   The rank correlation coefficient $R$ is given by:

   $$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

   where $d$ is the difference between the ranks of paired observations and $n$ is the number of pairs.

   **Proof:**
   - Assign ranks to the two variables $X$ and $Y$.
   - Let $d_i = R(X_i) - R(Y_i)$ be the difference in ranks for the $i^{th}$ pair.
   - The sum of squared differences is $\sum d_i^2$.
   - The formula follows from applying the Pearson correlation formula to the ranks, assuming there are no ties.

2. **Five properties of a good estimator:**
   1. **Unbiasedness:** The expected value of the estimator equals the true parameter.
   2. **Consistency:** The estimator converges to the true parameter as the sample size increases.
   3. **Efficiency:** Among unbiased estimators, it has the smallest variance.
   4. **Sufficiency:** It uses all the information in the data about the parameter.
   5. **Robustness:** It is not unduly affected by small deviations from model assumptions.

3. **Test if machines A and B produce statistically different stationeries:**
   - Data for A: 20, 15, 6, 8, 18, 10, 6, 5 ($n_A = 8$)
   - Data for B: 10, 5, 2, 6, 8, 15, 20, 5, 8, 10 ($n_B = 10$)

   Use a two-sample t-test:
   - Calculate the sample means and variances.
   - Compute the test statistic:

     $$t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}$$

   - Compare $|t|$ with the critical t-values at $\alpha=0.01$ and $\alpha=0.05$. Because this statistic does not pool the variances (Welch's form), the degrees of freedom come from the Welch-Satterthwaite approximation.

4. **Variance of simple linear regression model:**

   Given data:
   - Consumption: 8, 7, 6, 5, 12, 20, 8, 7, 6, 5
   - Income: 9, 10, 8, 6, 15, 21, 9, 10, 7, 6

   Steps:
   - Compute the means $\bar{x}$ and $\bar{y}$.
   - Compute the sums of squares:

     $$SS_{xx} = \sum (x_i - \bar{x})^2, \quad SS_{yy} = \sum (y_i - \bar{y})^2, \quad SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$$

   - Estimate the slope $b = \frac{SS_{xy}}{SS_{xx}}$ and the intercept $a = \bar{y} - b\bar{x}$.
   - Calculate the residual variance:

     $$s^2 = \frac{1}{n-2} \sum (y_i - a - bx_i)^2$$

5. **Variance of sample distributions for sampling with and without replacement:**
   - Population $S = \{1,2,3,4,5\}$, population size $N=5$, sample size $n=2$.
   - Population variance, with $\mu$ the population mean:

     $$\sigma^2 = \frac{1}{N} \sum (x_i - \mu)^2$$

   - Variance of the sample mean when sampling with replacement:

     $$Var(\bar{X}) = \frac{\sigma^2}{n}$$

   - Variance of the sample mean when sampling without replacement:

     $$Var(\bar{X}) = \frac{\sigma^2}{n} \times \frac{N-n}{N-1}$$

   Interpretation: Without replacement, the variance is smaller because of the finite population correction $\frac{N-n}{N-1}$.

6(a). **Hypothesis test for spot remover:**
- Null hypothesis $H_0: p=0.7$, alternative $H_a: p>0.7$.
- Reject $H_0$ if the number of spots removed, $X$, satisfies $X \geq 6$.

I. **Type I error probability:**

$$\alpha = P(\text{reject } H_0 | p=0.7) = P(X \geq 6) = 1 - P(X \leq 5)$$

where $X \sim Binomial(n=10, p=0.7)$.

II. **Power when $p=0.5$:**

$$Power = P(\text{reject } H_0 | p=0.5) = P(X \geq 6)$$

where $X \sim Binomial(10, 0.5)$.

6(b). **Test if newborn weights differ from a hypothesized mean:**
- Data: 4.0, 3.7, 2.5, 4.1, 6.7, 6.0, 2.8, 5.0, 2.35
- Use a one-sample t-test:

  $$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

- Compute confidence intervals at the 5% and 1% significance levels (95% and 99% confidence):

  $$\bar{x} \pm t_{\alpha/2, n-1} \times \frac{s}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $\mu_0$ the hypothesized mean.

---

Final answers are numerical or formulaic as above for each question.
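
The proof sketch in problem 1 can be filled in as follows. This is one possible derivation, assuming there are no tied ranks; the shorthand $r_i = R(X_i)$ and $s_i = R(Y_i)$ is introduced here and is not part of the original statement. With no ties, each set of ranks is a permutation of $1, \dots, n$, so

$$\bar{r} = \bar{s} = \frac{n+1}{2}, \qquad \sum (r_i - \bar{r})^2 = \sum (s_i - \bar{s})^2 = \frac{n(n+1)(2n+1)}{6} - n\left(\frac{n+1}{2}\right)^2 = \frac{n(n^2-1)}{12}$$

Since $\bar{r} = \bar{s}$, we have $d_i = (r_i - \bar{r}) - (s_i - \bar{s})$, and expanding the square gives

$$\sum d_i^2 = 2 \cdot \frac{n(n^2-1)}{12} - 2\sum (r_i - \bar{r})(s_i - \bar{s})$$

Solving for the cross-product term and substituting into the Pearson formula applied to the ranks:

$$R = \frac{\sum (r_i - \bar{r})(s_i - \bar{s})}{\sqrt{\sum (r_i - \bar{r})^2 \sum (s_i - \bar{s})^2}} = \frac{\frac{n(n^2-1)}{12} - \frac{1}{2}\sum d_i^2}{\frac{n(n^2-1)}{12}} = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$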
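
The two-sample test in problem 3 can be sketched in Python using only the standard library. This is a minimal illustration, not a definitive solution: the function name `welch_t` is hypothetical, and the degrees of freedom use the Welch-Satterthwaite approximation, which the original problem does not state explicitly.

```python
from math import sqrt
from statistics import mean, variance

# Data as given in problem 3.
machine_a = [20, 15, 6, 8, 18, 10, 6, 5]
machine_b = [10, 5, 2, 6, 8, 15, 20, 5, 8, 10]


def welch_t(a, b):
    """Return (t, df) for the unpooled two-sample t statistic."""
    va_n = variance(a) / len(a)  # s_A^2 / n_A, sample variance uses n - 1
    vb_n = variance(b) / len(b)  # s_B^2 / n_B
    t = (mean(a) - mean(b)) / sqrt(va_n + vb_n)
    # Welch-Satterthwaite approximation (an assumption, see lead-in).
    df = (va_n + vb_n) ** 2 / (
        va_n ** 2 / (len(a) - 1) + vb_n ** 2 / (len(b) - 1)
    )
    return t, df


t_stat, df = welch_t(machine_a, machine_b)
print(f"t = {t_stat:.4f}, df = {df:.2f}")
```

With these data the statistic comes out near 0.79 on roughly 14 degrees of freedom, which can then be compared against the tabulated critical values at both significance levels.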
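
The regression steps in problem 4 translate directly into a short Python sketch. The problem does not say which variable is the response; the code below assumes consumption is regressed on income ($y$ = consumption, $x$ = income), and the function name `fit_simple_regression` is hypothetical.

```python
# Data as given in problem 4.
income = [9, 10, 8, 6, 15, 21, 9, 10, 7, 6]     # x (assumed regressor)
consumption = [8, 7, 6, 5, 12, 20, 8, 7, 6, 5]  # y (assumed response)


def fit_simple_regression(x, y):
    """Return (a, b, s2): intercept, slope, and residual variance."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = ss_xy / ss_xx          # slope
    a = y_bar - b * x_bar      # intercept
    # Residual sum of squares divided by n - 2 degrees of freedom.
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, sse / (n - 2)


a, b, s2 = fit_simple_regression(income, consumption)
print(f"a = {a:.4f}, b = {b:.4f}, s^2 = {s2:.4f}")
```

Under that assumption the slope is about 0.96, the intercept about -1.32, and the residual variance about 0.98. Swapping the two lists gives the regression of income on consumption instead.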
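
The two variance formulas in problem 5 can be checked by brute force, since the population is small enough to enumerate every sample of size 2. The sketch below assumes all samples are equally likely and uses unordered pairs for the without-replacement case; variable names are illustrative.

```python
from itertools import combinations, product
from statistics import mean, pvariance

population = [1, 2, 3, 4, 5]
N = len(population)
n = 2

sigma2 = pvariance(population)  # population variance, divisor N

# Every ordered sample with replacement (25 samples).
means_with = [mean(s) for s in product(population, repeat=n)]
# Every unordered sample without replacement (10 samples).
means_without = [mean(s) for s in combinations(population, n)]

var_with = pvariance(means_with)
var_without = pvariance(means_without)

# Compare the enumerated values with the closed-form expressions.
print(var_with, sigma2 / n)
print(var_without, sigma2 / n * (N - n) / (N - 1))
```

Here $\sigma^2 = 2$, so the formulas give $Var(\bar{X}) = 1$ with replacement and $0.75$ without, and the enumeration agrees.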
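
The two binomial tail probabilities in problem 6(a) can be evaluated exactly with the standard library. This sketch applies the rejection rule exactly as stated ($X \geq 6$ with $n = 10$); the helper name `binom_upper_tail` is hypothetical.

```python
from math import comb


def binom_upper_tail(n, p, k_min):
    """Return P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


alpha = binom_upper_tail(10, 0.7, 6)  # P(reject H0 | p = 0.7)
power = binom_upper_tail(10, 0.5, 6)  # P(reject H0 | p = 0.5)
print(f"alpha = {alpha:.4f}, power at p = 0.5: {power:.4f}")
```

Under the rule as written, $\alpha \approx 0.85$ and the rejection probability at $p = 0.5$ is about $0.377$.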
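
For problem 6(b), the sample summary and confidence intervals can be sketched as below. The hypothesized mean $\mu_0$ is not given in the problem, so it is left as a function argument. The critical values are the standard two-sided t-table entries for $n - 1 = 8$ degrees of freedom, hard-coded here to avoid a dependency on a statistics package; the names `t_statistic` and `confidence_interval` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Data as given in problem 6(b).
weights = [4.0, 3.7, 2.5, 4.1, 6.7, 6.0, 2.8, 5.0, 2.35]

# Two-sided critical values t_{alpha/2, 8}, taken from a standard t table.
T_CRIT = {0.05: 2.306, 0.01: 3.355}

n = len(weights)
x_bar = mean(weights)
s = stdev(weights)   # sample standard deviation, divisor n - 1
se = s / sqrt(n)     # standard error of the mean


def t_statistic(mu0):
    """One-sample t statistic for a hypothesized mean mu0."""
    return (x_bar - mu0) / se


def confidence_interval(alpha):
    """Two-sided (1 - alpha) confidence interval for the mean."""
    half_width = T_CRIT[alpha] * se
    return x_bar - half_width, x_bar + half_width


print(f"mean = {x_bar:.4f}, s = {s:.4f}, se = {se:.4f}")
print("95% CI:", confidence_interval(0.05))
print("99% CI:", confidence_interval(0.01))
```

Once $\mu_0$ is specified, `t_statistic(mu0)` gives the test statistic. Equivalently, $H_0$ is rejected at level $\alpha$ when $\mu_0$ falls outside the corresponding interval.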