Rank Correlation Properties

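1. **Prove the rank correlation coefficient formula:**

   The rank correlation coefficient $R$ is given by:
   $$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
   where $d$ is the difference between the ranks of paired observations and $n$ is the number of pairs. The formula is exact when there are no tied ranks.

   **Proof:**
   - Assign ranks $1, \dots, n$ to each of the two variables $X$ and $Y$.
   - Let $d_i = R(X_i) - R(Y_i)$ be the difference in ranks for the $i^{th}$ pair.
   - The Spearman rank correlation coefficient is defined as the Pearson correlation coefficient between the ranks.
   - With no ties, each set of ranks is a permutation of $1, \dots, n$, so both sets have mean $\frac{n+1}{2}$ and sum of squared deviations $\frac{n(n^2 - 1)}{12}$.
   - Because the two means are equal, $d_i$ is also the difference of the centered ranks, and expanding the square gives:
     $$\sum d_i^2 = 2 \cdot \frac{n(n^2 - 1)}{12} - 2\sum \left(R(X_i) - \tfrac{n+1}{2}\right)\left(R(Y_i) - \tfrac{n+1}{2}\right)$$
   - Solving for the cross-product sum and dividing by $\frac{n(n^2 - 1)}{12}$, the Pearson denominator for ranks, gives:
     $$R = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

2. **Five properties of a good estimator:**
   1. **Unbiasedness:** The expected value of the estimator equals the true parameter.
   2. **Consistency:** The estimator converges in probability to the true parameter as the sample size increases.
   3. **Efficiency:** Among unbiased estimators, it has the smallest variance.
   4. **Sufficiency:** It uses all the information in the data relevant to the parameter.
   5. **Robustness:** It is not unduly affected by small deviations from model assumptions.

3. **Test whether the stationery produced by Machines A and B differs significantly:**
   - Data Machine A: 20, 15, 6, 8, 18, 10, 6, 5 ($n_A = 8$)
   - Data Machine B: 10, 5, 2, 6, 8, 15, 20, 5, 8, 10 ($n_B = 10$)

   **Step 1:** State hypotheses:
   - $H_0$: $\mu_A = \mu_B$ (means are equal)
   - $H_a$: $\mu_A \neq \mu_B$ (means differ)

   **Step 2:** Calculate the sample means $\bar{x}_A, \bar{x}_B$ and sample variances $s_A^2, s_B^2$.

   **Step 3:** Use the two-sample t-test without assuming equal variances (Welch's test):
   $$t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{s_A^2/n_A + s_B^2/n_B}}$$

   **Step 4:** Compare $|t|$ with the two-sided critical t-values at $\alpha = 0.01$ and $\alpha = 0.05$, using the Welch-Satterthwaite degrees of freedom.

   **Step 5:** Conclude that the difference is statistically significant at a given level only if $|t|$ exceeds the corresponding critical value.

4. **Variance of simple linear regression model:**
   - Data:
     - Consumption: 8, 7, 6, 5, 12, 20, 8, 7, 6, 5
     - Income: 9, 10, 8, 6, 15, 21, 9, 10, 7, 6

   The steps below take income as the explanatory variable $x$ and consumption as the response $y$; this is the usual choice, though the problem statement does not fix it.

   **Step 1:** Calculate means $\bar{x}$ and $\bar{y}$.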

   **Step 2:** Calculate sums of squares:
   $$SS_{xx} = \sum (x_i - \bar{x})^2$$
   $$SS_{yy} = \sum (y_i - \bar{y})^2$$
   $$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$$

   **Step 3:** Calculate regression coefficients:
   $$b = \frac{SS_{xy}}{SS_{xx}}, \quad a = \bar{y} - b\bar{x}$$

   **Step 4:** Calculate residual variance, dividing by $n - 2$ because two coefficients are estimated:
   $$s^2 = \frac{1}{n-2} \sum (y_i - a - bx_i)^2$$

5. **Variance of sample distributions for sampling with and without replacement:**
   - Population $S = \{1,2,3,4,5\}$, $N=5$, sample size $n=2$.

   **Step 1:** Calculate the population mean $\mu$ and variance $\sigma^2$ (with divisor $N$). Here $\mu = 3$ and $\sigma^2 = 2$.

   **Step 2:** For sampling with replacement, the variance of the sample mean is:
   $$Var(\bar{X}) = \frac{\sigma^2}{n} = \frac{2}{2} = 1$$

   **Step 3:** For sampling without replacement, the variance of the sample mean is:
   $$Var(\bar{X}) = \frac{\sigma^2}{n} \times \frac{N-n}{N-1} = 1 \times \frac{3}{4} = 0.75$$

   **Interpretation:** Sampling without replacement reduces the variance of the sample mean by the finite population correction factor $\frac{N-n}{N-1}$.

6(a). **Hypothesis test for spot remover effectiveness:**
   - Null hypothesis $H_0: P=0.7$, where $P$ is the probability that a spot is removed and $X$ is the number of spots removed out of 10.
   - Reject $H_0$ if fewer than 6 spots are removed out of 10.

   I. **Type I error probability:**
   $$\alpha = P(\text{reject } H_0 \mid P=0.7) = P(X < 6) = \sum_{k=0}^5 \binom{10}{k} 0.7^k\, 0.3^{10-k} \approx 0.150$$

   II. **Power of test when $P=0.5$:**
   $$\text{Power} = P(\text{reject } H_0 \mid P=0.5) = P(X < 6) = \sum_{k=0}^5 \binom{10}{k} 0.5^k\, 0.5^{10-k} = \frac{638}{1024} \approx 0.623$$

6(b). **Test if newborn weights differ from a hypothesized mean:**
   - Data: 4.0, 3.7, 2.5, 4.1, 6.7, 6.0, 2.8, 5.0, 2.35 ($n = 9$)

   **Step 1:** State hypotheses:
   - $H_0$: $\mu = \mu_0$ (mean weight equals the hypothesized value, e.g., the population mean)
   - $H_a$: $\mu \neq \mu_0$ (mean weight differs)

   **Step 2:** Calculate the sample mean $\bar{x}$ and sample standard deviation $s$.

   **Step 3:** Use the t-test statistic:
   $$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

   **Step 4:** Compare $|t|$ with the two-sided critical values $t_{\alpha/2,\, n-1}$ at $\alpha=0.05$ and $0.01$, with $n - 1 = 8$ degrees of freedom.

   **Step 5:** Compute confidence intervals:
   $$\bar{x} \pm t_{\alpha/2,\, n-1} \times \frac{s}{\sqrt{n}}$$

   **Step 6:** Conclude that the mean weight differs significantly at level $\alpha$ if $|t|$ exceeds the critical value, or equivalently if $\mu_0$ lies outside the $(1-\alpha)$ confidence interval.
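For Problem 1, the shortcut formula can be checked numerically against the definition (Pearson correlation of the ranks). The sketch below is illustrative only: the data `x` and `y` are hypothetical and tie-free, and the helper names `ranks`, `pearson`, and `spearman_shortcut` are not from the original.

```python
from statistics import mean

def ranks(values):
    """Rank values from 1..n (smallest gets rank 1); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den

def spearman_shortcut(x, y):
    """R = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical, tie-free data chosen only for illustration.
x = [10, 20, 30, 40, 50]
y = [12, 25, 18, 47, 33]

print(spearman_shortcut(x, y))      # shortcut formula
print(pearson(ranks(x), ranks(y)))  # definition: Pearson on the ranks
```

Both printed values should agree up to floating-point rounding; with tied ranks the two generally differ, which is why the proof assumes no ties.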
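For Problem 3, Steps 2 to 4 can be sketched with the standard library alone. The function name `welch_t` is an assumption for illustration; the critical values still have to be read from a t-table, since the sketch does not compute them.

```python
from math import sqrt
from statistics import mean, variance

machine_a = [20, 15, 6, 8, 18, 10, 6, 5]
machine_b = [10, 5, 2, 6, 8, 15, 20, 5, 8, 10]

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    va = variance(a) / len(a)  # s_A^2 / n_A (sample variance, divisor n - 1)
    vb = variance(b) / len(b)  # s_B^2 / n_B
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t_stat, df = welch_t(machine_a, machine_b)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

With this data the statistic comes out near 0.79 on roughly 14 degrees of freedom. The tabulated two-sided critical values for 14 degrees of freedom are about 2.145 ($\alpha = 0.05$) and 2.977 ($\alpha = 0.01$), so under these assumptions $H_0$ would not be rejected at either level.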
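For Problem 4, the four steps translate directly into code. This is a minimal sketch that assumes income is the explanatory variable $x$ and consumption is the response $y$; the problem statement does not say which is which, so swap the two lists if the opposite regression is intended.

```python
from statistics import mean

# Assumption: x = income (explanatory), y = consumption (response).
income = [9, 10, 8, 6, 15, 21, 9, 10, 7, 6]
consumption = [8, 7, 6, 5, 12, 20, 8, 7, 6, 5]

def residual_variance(x, y):
    """Return (a, b, s2) for the least-squares line y = a + b*x."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)                              # Step 1
    ss_xx = sum((xi - xbar) ** 2 for xi in x)                  # Step 2
    ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = ss_xy / ss_xx                                          # Step 3
    a = ybar - b * xbar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))  # Step 4
    return a, b, sse / (n - 2)

a, b, s2 = residual_variance(income, consumption)
print(f"a = {a:.4f}, b = {b:.4f}, s^2 = {s2:.4f}")
```

Under that assignment the slope is about 0.96 and the residual variance about 0.98.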
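For Problem 5, the two variance formulas can be verified by brute force, enumerating every possible sample of size 2. The sketch treats samples with replacement as ordered pairs and samples without replacement as unordered pairs; the variable names are illustrative.

```python
from itertools import combinations, product
from statistics import mean, pvariance

population = [1, 2, 3, 4, 5]
N, n = len(population), 2
sigma2 = pvariance(population)  # population variance, divisor N

# All equally likely samples and their means.
means_with = [mean(s) for s in product(population, repeat=n)]     # 25 samples
means_without = [mean(s) for s in combinations(population, n)]    # 10 samples

var_with = pvariance(means_with)
var_without = pvariance(means_without)

print(var_with, sigma2 / n)                            # with replacement
print(var_without, sigma2 / n * (N - n) / (N - 1))     # without replacement
```

Each printed pair should match: the enumerated variance on the left, the formula on the right.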
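For Problem 6(a), both sums are binomial cumulative probabilities and can be evaluated directly. The helper `binom_cdf` is a name introduced here for illustration; it uses `math.comb`, available from Python 3.8.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Rejection region: fewer than 6 spots removed, i.e. X <= 5.
alpha = binom_cdf(5, 10, 0.7)  # Type I error: reject although P = 0.7
power = binom_cdf(5, 10, 0.5)  # power: reject when P is actually 0.5

print(f"alpha = {alpha:.4f}, power = {power:.4f}")
```

The values come out at roughly 0.150 for the Type I error probability and 0.623 for the power.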
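For Problem 6(b), Steps 2 to 5 can be sketched as below. The problem does not state the hypothesized mean, so `MU0_PLACEHOLDER` is a purely hypothetical value to be replaced with the real $\mu_0$. The critical values in `T_CRIT` are the standard two-sided t-table entries for 8 degrees of freedom, hard-coded rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

weights = [4.0, 3.7, 2.5, 4.1, 6.7, 6.0, 2.8, 5.0, 2.35]
n = len(weights)
xbar, s = mean(weights), stdev(weights)  # stdev uses divisor n - 1
se = s / sqrt(n)                         # standard error of the mean

# Two-sided critical values t_{alpha/2, 8} from a standard t-table.
T_CRIT = {0.05: 2.306, 0.01: 3.355}

def one_sample_t(mu0):
    """t statistic for H0: mu = mu0."""
    return (xbar - mu0) / se

def confidence_interval(alpha):
    """(1 - alpha) confidence interval for the mean."""
    half_width = T_CRIT[alpha] * se
    return xbar - half_width, xbar + half_width

MU0_PLACEHOLDER = 3.5  # hypothetical: the source does not give mu_0

t_stat = one_sample_t(MU0_PLACEHOLDER)
for alpha, crit in T_CRIT.items():
    low, high = confidence_interval(alpha)
    print(f"alpha={alpha}: reject H0? {abs(t_stat) > crit}, "
          f"CI = ({low:.3f}, {high:.3f})")
```

The confidence intervals do not depend on the placeholder, so they can be read off directly; only the reject/do-not-reject decision changes once the actual hypothesized mean is substituted.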