Rank Correlation Properties
1. **Prove the rank correlation coefficient formula:**
The rank correlation coefficient $R$ is given by:
$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
where $d$ is the difference between ranks of paired observations and $n$ is the number of pairs.
**Proof:**
- Assign ranks to two variables $X$ and $Y$.
- Let $d_i = R(X_i) - R(Y_i)$ be the difference in ranks for the $i^{th}$ pair.
- The Spearman rank correlation coefficient is defined as the Pearson correlation coefficient between the ranks.
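- Without ties, each set of ranks is a permutation of $1, \dots, n$, so both rank means equal $\frac{n+1}{2}$ and both sums of squared deviations equal $\frac{n(n^2-1)}{12}$.
- Writing $u_i = R(X_i) - \frac{n+1}{2}$ and $v_i = R(Y_i) - \frac{n+1}{2}$, we have $d_i = u_i - v_i$, so:
$$\sum d_i^2 = \sum u_i^2 + \sum v_i^2 - 2\sum u_i v_i = \frac{n(n^2-1)}{6} - 2\sum u_i v_i$$
- Substituting $\sum u_i v_i = \frac{n(n^2-1)}{12} - \frac{1}{2}\sum d_i^2$ into the Pearson formula $R = \frac{\sum u_i v_i}{\sqrt{\sum u_i^2 \sum v_i^2}}$ gives:
$$R = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$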
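As a numerical check of this identity, the sketch below computes the Pearson correlation of two rank vectors and the shortcut formula side by side. The ranks are hypothetical (they do not come from the source) and contain no ties, which the shortcut formula requires.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

def spearman_from_d(u, v):
    """Shortcut 1 - 6*sum(d^2) / (n(n^2 - 1)); valid only when there are no ties."""
    n = len(u)
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical ranks for five pairs (illustration only, not from the source).
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]

r_pearson = pearson(rank_x, rank_y)
r_shortcut = spearman_from_d(rank_x, rank_y)
print(r_pearson, r_shortcut)
```

Both values agree (0.8 for these ranks), as the proof predicts.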
2. **Five properties of a good estimator:**
1. **Unbiasedness:** The expected value of the estimator equals the true parameter.
2. **Consistency:** The estimator converges in probability to the true parameter as sample size increases.
3. **Efficiency:** Among unbiased estimators, it has the smallest variance.
4. **Sufficiency:** It uses all the information in the data relevant to the parameter.
5. **Robustness:** It is not unduly affected by small deviations from model assumptions.
3. **Test whether the stationery output of Machines A and B differs significantly:**
- Data Machine A: 20,15,6,8,18,10,6,5 (n=8)
- Data Machine B: 10,5,2,6,8,15,20,5,8,10 (n=10)
**Step 1:** State hypotheses:
- $H_0: \mu_A = \mu_B$ (the machine means are equal)
- $H_a: \mu_A \neq \mu_B$ (the machine means differ, two-sided)
**Step 2:** Calculate sample means and variances.
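**Step 3:** Use Welch's two-sample t-test (unequal variances assumed):
$$t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}, \quad df = \frac{\left(\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}\right)^2}{\frac{(s_A^2/n_A)^2}{n_A - 1} + \frac{(s_B^2/n_B)^2}{n_B - 1}}$$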
**Step 4:** Calculate test statistic and compare with critical t-values at $\alpha=0.01$ and $0.05$.
**Step 5:** Conclude if difference is statistically significant.
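The steps above can be sketched with the standard library alone. This is a minimal illustration using Welch's unequal-variance test; the function name `welch_t` is our own, and the critical values are hard-coded from a standard t-table for 14 degrees of freedom (the Welch degrees of freedom, about 14.3, rounded down as the conservative choice).

```python
from math import sqrt
from statistics import mean, variance

machine_a = [20, 15, 6, 8, 18, 10, 6, 5]
machine_b = [10, 5, 2, 6, 8, 15, 20, 5, 8, 10]

def welch_t(x, y):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # s_A^2 / n_A (sample variance, n - 1 divisor)
    vy = variance(y) / len(y)  # s_B^2 / n_B
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

t_stat, df = welch_t(machine_a, machine_b)

# Two-sided critical values for 14 degrees of freedom, taken from a t-table.
T_CRIT = {0.05: 2.145, 0.01: 2.977}

print(f"t = {t_stat:.3f}, df = {df:.2f}")
for alpha, crit in T_CRIT.items():
    verdict = "reject H0" if abs(t_stat) > crit else "fail to reject H0"
    print(f"alpha = {alpha}: critical value {crit}, {verdict}")
```

With these data the statistic is about 0.79, well below both critical values, so under this test the difference between the machines is not significant at either level.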
4. **Residual variance of the simple linear regression model:**
- Data:
- Consumption: 8,7,6,5,12,20,8,7,6,5
- Income: 9,10,8,6,15,21,9,10,7,6
**Step 1:** Calculate means $\bar{x}$ and $\bar{y}$, taking income as $x$ and consumption as $y$ (consumption regressed on income).
**Step 2:** Calculate sums of squares:
$$SS_{xx} = \sum (x_i - \bar{x})^2$$
$$SS_{yy} = \sum (y_i - \bar{y})^2$$
$$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$$
**Step 3:** Calculate regression coefficients:
$$b = \frac{SS_{xy}}{SS_{xx}}, \quad a = \bar{y} - b\bar{x}$$
**Step 4:** Calculate residual variance:
$$s^2 = \frac{1}{n-2} \sum (y_i - a - bx_i)^2$$
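The four steps can be carried out directly on the data. The sketch below assumes income is the explanatory variable $x$ and consumption the response $y$, which is the usual reading of a consumption function but is not stated explicitly in the problem.

```python
# Assumption: income is x (explanatory), consumption is y (response).
income = [9, 10, 8, 6, 15, 21, 9, 10, 7, 6]
consumption = [8, 7, 6, 5, 12, 20, 8, 7, 6, 5]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(consumption) / n

# Sums of squares and cross-products about the means.
ss_xx = sum((x - x_bar) ** 2 for x in income)
ss_yy = sum((y - y_bar) ** 2 for y in consumption)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, consumption))

# Least-squares slope and intercept.
b = ss_xy / ss_xx
a = y_bar - b * x_bar

# Residual variance with n - 2 degrees of freedom.
residuals = [y - a - b * x for x, y in zip(income, consumption)]
s2 = sum(e ** 2 for e in residuals) / (n - 2)

print(f"x_bar = {x_bar}, y_bar = {y_bar}")
print(f"SS_xx = {ss_xx:.1f}, SS_yy = {ss_yy:.1f}, SS_xy = {ss_xy:.1f}")
print(f"b = {b:.4f}, a = {a:.4f}, s^2 = {s2:.4f}")
```

Under that assumption the fitted line is roughly $\hat{y} = -1.32 + 0.96x$ with residual variance $s^2 \approx 0.98$.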
5. **Variance of the sampling distribution of the sample mean, with and without replacement:**
- Population $S = \{1,2,3,4,5\}$, $N=5$, sample size $n=2$.
**Step 1:** Calculate population mean $\mu$ and variance $\sigma^2$.
**Step 2:** For sampling with replacement, variance of sample mean:
$$Var(\bar{X}) = \frac{\sigma^2}{n}$$
**Step 3:** For sampling without replacement, variance of sample mean:
$$Var(\bar{X}) = \frac{\sigma^2}{n} \times \frac{N-n}{N-1}$$
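**Interpretation:** For $n > 1$ the finite population correction $\frac{N-n}{N-1}$ is less than 1, so sampling without replacement gives a smaller variance of the sample mean than sampling with replacement.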
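Because the population is tiny, both formulas can be checked by enumerating every possible sample. The sketch below uses exact fractions; the helper name `variance_of` is our own, and it divides by the count (population-style variance), which is what the formulas above assume.

```python
from fractions import Fraction
from itertools import combinations, product

population = [1, 2, 3, 4, 5]
N, n = len(population), 2

def variance_of(values):
    """Population-style variance (divide by the count), computed exactly."""
    values = [Fraction(v) for v in values]
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

sigma2 = variance_of(population)

# With replacement: all N**n ordered samples are equally likely.
means_with = [Fraction(sum(s), n) for s in product(population, repeat=n)]
# Without replacement: all C(N, n) unordered samples are equally likely.
means_without = [Fraction(sum(s), n) for s in combinations(population, n)]

var_with = variance_of(means_with)
var_without = variance_of(means_without)

print("sigma^2 =", sigma2)
print("Var(mean), with replacement    =", var_with)
print("Var(mean), without replacement =", var_without)
```

The enumeration gives $\sigma^2 = 2$, a variance of $1$ with replacement ($\sigma^2/n$) and $3/4$ without replacement ($\sigma^2/n$ times the correction $3/4$).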
6(a). **Hypothesis test for spot remover effectiveness:**
- Null hypothesis $H_0: P=0.7$
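- Let $X$ be the number of spots removed in 10 trials, so $X \sim \text{Binomial}(10, P)$. Reject $H_0$ if $X < 6$, that is, if fewer than 6 of the 10 spots are removed.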
I. **Type I error probability:**
$$\alpha = P(\text{reject } H_0 | P=0.7) = P(X < 6) = \sum_{k=0}^5 \binom{10}{k} 0.7^k 0.3^{10-k}$$
II. **Power of test when $P=0.5$:**
$$\text{Power} = P(\text{reject } H_0 | P=0.5) = P(X < 6) = \sum_{k=0}^5 \binom{10}{k} 0.5^k 0.5^{10-k}$$
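Both sums are binomial tail probabilities and can be evaluated directly. A minimal sketch using only the standard library (the helper name `binom_cdf` is our own):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))

n_trials = 10

# Rejection region is X < 6, i.e. X <= 5.
alpha = binom_cdf(5, n_trials, 0.7)  # Type I error: reject when P = 0.7 is true
power = binom_cdf(5, n_trials, 0.5)  # Power: reject when the true P is 0.5

print(f"alpha = {alpha:.4f}")
print(f"power = {power:.4f}")
```

This gives $\alpha \approx 0.150$ and a power of about $0.623$ at $P = 0.5$.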
6(b). **Test if newborn weights differ from a hypothesized mean:**
- Data: 4.0, 3.7, 2.5, 4.1, 6.7, 6.0, 2.8, 5.0, 2.35
**Step 1:** State hypotheses:
- $H_0$: Mean weight equals hypothesized value (e.g., population mean)
- $H_a$: Mean weight differs
**Step 2:** Calculate sample mean $\bar{x}$ and sample standard deviation $s$.
**Step 3:** Use t-test statistic:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
**Step 4:** Compare $t$ with critical values at $\alpha=0.05$ and $0.01$.
**Step 5:** Compute confidence intervals:
$$\bar{x} \pm t_{\alpha/2, n-1} \times \frac{s}{\sqrt{n}}$$
**Step 6:** Conclude if weights differ significantly.
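The computations in Steps 2 to 5 can be sketched as follows. The problem as given does not state the hypothesized mean, so `MU0_EXAMPLE` below is a purely hypothetical placeholder to show how the statistic is used; the confidence intervals do not depend on it. Critical values are hard-coded from a standard t-table for $n - 1 = 8$ degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

weights = [4.0, 3.7, 2.5, 4.1, 6.7, 6.0, 2.8, 5.0, 2.35]

n = len(weights)
x_bar = mean(weights)
s = stdev(weights)      # sample standard deviation (n - 1 divisor)
se = s / sqrt(n)        # standard error of the mean

# Two-sided critical values t_{alpha/2, 8} from a t-table.
T_CRIT = {0.05: 2.306, 0.01: 3.355}

def t_statistic(mu0):
    """One-sample t statistic for the hypothesized mean mu0."""
    return (x_bar - mu0) / se

# Confidence intervals at the 95% and 99% levels.
intervals = {a: (x_bar - c * se, x_bar + c * se) for a, c in T_CRIT.items()}

print(f"n = {n}, mean = {x_bar:.4f}, s = {s:.4f}")
for a, (lo, hi) in intervals.items():
    print(f"{100 * (1 - a):.0f}% CI: ({lo:.3f}, {hi:.3f})")

# Hypothetical hypothesized mean, NOT taken from the source.
MU0_EXAMPLE = 3.5
t_example = t_statistic(MU0_EXAMPLE)
for a, c in T_CRIT.items():
    verdict = "reject H0" if abs(t_example) > c else "fail to reject H0"
    print(f"alpha = {a}: t = {t_example:.3f}, critical value {c}, {verdict}")
```

Equivalently, $H_0$ is rejected at level $\alpha$ exactly when $\mu_0$ falls outside the corresponding confidence interval, so once the actual hypothesized mean is known the conclusion can be read off the intervals.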