Statistics Problems
1. **Prove the rank correlation coefficient formula:**
The rank correlation coefficient $R$ is given by:
$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
where $d$ is the difference between ranks of paired observations and $n$ is the number of pairs.
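**Proof** (assuming no tied ranks):
- Assign ranks $1, \dots, n$ to the observations of $X$ and of $Y$ separately, and write $x_i = R(X_i)$, $y_i = R(Y_i)$.
- Each set of ranks is a permutation of $1, \dots, n$, so $\bar{x} = \bar{y} = \frac{n+1}{2}$ and $\sum (x_i - \bar{x})^2 = \sum (y_i - \bar{y})^2 = \frac{n(n^2 - 1)}{12}$.
- Let $d_i = x_i - y_i$ be the difference in ranks for the $i^{th}$ pair. Because $\bar{x} = \bar{y}$, expanding the square gives $\sum d_i^2 = \sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2 - 2\sum (x_i - \bar{x})(y_i - \bar{y})$, so $\sum (x_i - \bar{x})(y_i - \bar{y}) = \frac{n(n^2 - 1)}{12} - \frac{1}{2}\sum d_i^2$.
- Applying the Pearson correlation formula to the ranks and substituting:
$$R = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{\frac{n(n^2 - 1)}{12} - \frac{1}{2}\sum d_i^2}{\frac{n(n^2 - 1)}{12}} = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$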
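As a numerical check on the identity, the sketch below computes the shortcut formula and the Pearson correlation of the ranks side by side. The data and the helper names (`ranks`, `spearman_shortcut`, `pearson_on_ranks`) are hypothetical illustrations, not part of the problem set, and the ranking helper assumes there are no ties.

```python
from statistics import mean


def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_shortcut(x, y):
    """R = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def pearson_on_ranks(x, y):
    """Pearson correlation computed directly on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical illustrative data, not taken from the problem set.
x = [10, 25, 7, 31, 18]
y = [3.2, 4.1, 5.0, 6.8, 2.7]

print(spearman_shortcut(x, y), pearson_on_ranks(x, y))
```

For tie-free data the two values should agree up to floating-point rounding; with ties the shortcut formula is only an approximation.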
2. **Five properties of a good estimator:**
1. **Unbiasedness:** The expected value of the estimator equals the true parameter.
2. **Consistency:** The estimator converges to the true parameter as sample size increases.
3. **Efficiency:** Among unbiased estimators, it has the smallest variance.
4. **Sufficiency:** It uses all the information in the data about the parameter.
5. **Robustness:** It is not unduly affected by small deviations from model assumptions.
3. **Test whether the stationery output of machines A and B differs statistically:**
- Data for A: 20,15,6,8,18,10,6,5 (n=8)
- Data for B: 10,5,2,6,8,15,20,5,8,10 (n=10)
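Use a two-sample t-test of $H_0: \mu_A = \mu_B$ against $H_a: \mu_A \neq \mu_B$. The statistic below does not pool the variances, so it is the unequal-variance (Welch) form:
- Calculate the sample means $\bar{x}_A, \bar{x}_B$ and sample variances $s_A^2, s_B^2$ (divisor $n-1$).
- Compute the test statistic:
$$t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}$$
- Compute the approximate degrees of freedom (Welch-Satterthwaite):
$$\nu = \frac{\left(\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}\right)^2}{\frac{(s_A^2/n_A)^2}{n_A - 1} + \frac{(s_B^2/n_B)^2}{n_B - 1}}$$
- Compare $|t|$ with the two-sided critical t-values at $\alpha=0.01$ and $0.05$, and reject $H_0$ if $|t|$ exceeds the critical value.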
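The calculation can be sketched with the standard library alone. This is a minimal illustration assuming the unequal-variance form of the test; the function name `welch_t` is a label chosen here, and the critical values still have to be read from a t-table.

```python
from math import sqrt
from statistics import mean, variance

a = [20, 15, 6, 8, 18, 10, 6, 5]
b = [10, 5, 2, 6, 8, 15, 20, 5, 8, 10]


def welch_t(sample_1, sample_2):
    """Return (t, df) for the unequal-variance two-sample t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    v1 = variance(sample_1) / n1  # s_1^2 / n_1, sample variance uses n - 1
    v2 = variance(sample_2) / n2  # s_2^2 / n_2
    t = (mean(sample_1) - mean(sample_2)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t_stat, df = welch_t(a, b)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

With these data the statistic comes out near 0.79 on roughly 14 degrees of freedom, well below the two-sided table values of about 2.14 ($\alpha = 0.05$) and 2.98 ($\alpha = 0.01$), so $H_0$ would not be rejected at either level.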
4. **Variance of simple linear regression model:**
Given data:
- Consumption: 8,7,6,5,12,20,8,7,6,5
- Income: 9,10,8,6,15,21,9,10,7,6
Steps (assuming income is the predictor $x$ and consumption is the response $y$):
- Compute means $\bar{x}$ and $\bar{y}$.
- Compute sums of squares:
$$SS_{xx} = \sum (x_i - \bar{x})^2, \quad SS_{yy} = \sum (y_i - \bar{y})^2, \quad SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$$
- Estimate slope $b = \frac{SS_{xy}}{SS_{xx}}$ and intercept $a = \bar{y} - b\bar{x}$.
- Calculate residual variance:
$$s^2 = \frac{1}{n-2} \sum (y_i - a - bx_i)^2$$
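The steps above can be traced numerically as follows. This is a sketch under the assumption that consumption is regressed on income; `fit_line` is a hypothetical helper name.

```python
from statistics import mean

income = [9, 10, 8, 6, 15, 21, 9, 10, 7, 6]      # x (assumed predictor)
consumption = [8, 7, 6, 5, 12, 20, 8, 7, 6, 5]   # y (assumed response)


def fit_line(x, y):
    """Return (a, b, s2): intercept, slope, and residual variance."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = ss_xy / ss_xx
    a = y_bar - b * x_bar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, sse / (n - 2)


a, b, s2 = fit_line(income, consumption)
print(f"a = {a:.4f}, b = {b:.4f}, s^2 = {s2:.4f}")
```

Swapping the two arguments of `fit_line` gives the regression of income on consumption instead, which has a different residual variance.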
5. **Variance of the sampling distribution of the sample mean, with and without replacement:**
- Population $S = \{1,2,3,4,5\}$, population size $N=5$, sample size $n=2$.
- Population variance, with population mean $\mu = \frac{1}{N} \sum x_i$:
$$\sigma^2 = \frac{1}{N} \sum (x_i - \mu)^2$$
- Sampling with replacement variance of sample mean:
$$Var(\bar{X}) = \frac{\sigma^2}{n}$$
- Sampling without replacement variance:
$$Var(\bar{X}) = \frac{\sigma^2}{n} \times \frac{N-n}{N-1}$$
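Interpretation: Here $\mu = 3$ and $\sigma^2 = 2$, so $Var(\bar{X}) = 1$ with replacement and $Var(\bar{X}) = 1 \times \frac{3}{4} = 0.75$ without replacement. The variance is smaller without replacement because of the finite population correction $\frac{N-n}{N-1}$.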
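Because the population is tiny, both formulas can be checked by brute force. The sketch below enumerates every possible sample of size 2 and computes the variance of the sample means exactly with fractions; `variance_of_means` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import combinations, product

population = [1, 2, 3, 4, 5]
n = 2


def variance_of_means(samples):
    """Variance of the sample mean over equally likely samples."""
    means = [Fraction(sum(s), len(s)) for s in samples]
    centre = sum(means) / len(means)
    return sum((m - centre) ** 2 for m in means) / len(means)


# With replacement: all 25 ordered pairs are equally likely.
with_repl = variance_of_means(list(product(population, repeat=n)))
# Without replacement: all 10 unordered pairs are equally likely.
without_repl = variance_of_means(list(combinations(population, n)))

print(with_repl, without_repl)
```

The enumeration should reproduce $\sigma^2/n$ for sampling with replacement and $\frac{\sigma^2}{n} \times \frac{N-n}{N-1}$ for sampling without replacement.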
6(a). **Hypothesis test for spot remover:**
- Null hypothesis $H_0: p=0.7$, alternative $H_a: p>0.7$.
- Reject $H_0$ if the number of spots removed, $X$, out of $n = 10$ is $\geq 6$.
I. **Type I error probability:**
$$\alpha = P(\text{reject } H_0 | p=0.7) = P(X \geq 6) = 1 - P(X \leq 5)$$
where $X \sim Binomial(n=10, p=0.7)$.
II. **Power when $p=0.5$:**
$$Power = P(\text{reject } H_0 | p=0.5) = P(X \geq 6)$$
where $X \sim Binomial(10, 0.5)$.
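Both probabilities are upper tails of a binomial distribution, so they can be evaluated by summing the probability mass function. A minimal sketch, with `binom_upper_tail` as a hypothetical helper name:

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


alpha = binom_upper_tail(6, 10, 0.7)  # Type I error under H0: p = 0.7
power = binom_upper_tail(6, 10, 0.5)  # rejection probability when p = 0.5

print(f"alpha = {alpha:.4f}, power at p = 0.5: {power:.4f}")
```

With this rejection region the Type I error probability is about 0.85, which is large because the region includes counts below the expected value $np = 7$ under $H_0$; the rejection probability at $p = 0.5$ is about 0.38.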
6(b). **Test if newborn weights differ from a hypothesized mean:**
- Data: 4.0, 3.7, 2.5, 4.1, 6.7, 6.0, 2.8, 5.0, 2.35
- Use one-sample t-test:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
- Compute confidence intervals at the 95% and 99% levels ($\alpha = 0.05$ and $0.01$):
$$\bar{x} \pm t_{\alpha/2, n-1} \times \frac{s}{\sqrt{n}}$$
where $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $\mu_0$ the hypothesized mean.
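The following sketch evaluates the interval for the nine weights. It assumes the two-sided critical values for $n - 1 = 8$ degrees of freedom are taken from a standard t-table (2.306 and 3.355). The hypothesized mean is not stated above, so `mu_0` is left as a parameter to be supplied from the problem statement.

```python
from math import sqrt
from statistics import mean, stdev

weights = [4.0, 3.7, 2.5, 4.1, 6.7, 6.0, 2.8, 5.0, 2.35]

# Two-sided critical values for df = n - 1 = 8, read from a standard t-table.
T_CRIT = {0.05: 2.306, 0.01: 3.355}


def t_statistic(data, mu_0):
    """One-sample t statistic for H0: mean = mu_0."""
    return (mean(data) - mu_0) / (stdev(data) / sqrt(len(data)))


def confidence_interval(data, alpha):
    """Two-sided (1 - alpha) interval for the mean; valid for n = 9 only."""
    half_width = T_CRIT[alpha] * stdev(data) / sqrt(len(data))
    centre = mean(data)
    return centre - half_width, centre + half_width


for alpha in (0.05, 0.01):
    low, high = confidence_interval(weights, alpha)
    print(f"{100 * (1 - alpha):.0f}% CI: ({low:.3f}, {high:.3f})")
```

A hypothesized mean that falls outside the interval at a given level would be rejected by the two-sided t-test at the corresponding $\alpha$.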