Frequency Correlation Skewness
1. **Problem Statement:**
Given grouped student score data, we need to:
- a) Create a frequency table with class width 10, including cumulative frequency (cf), relative frequency, and cumulative relative frequency.
- b) Compute mean, median, 75th, 34th, 56th percentiles, and sample standard deviation.
- c) Compute Spearman's rank correlation and Pearson correlation between given X and Y.
- d) Interpret correlations.
- e) Determine skewness nature and causes.
- f) Represent data using Ogive and scatter graph.
---
2. **Frequency Table Construction:**
- Class intervals: 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, 85-94 (width = 10)
- Frequencies (Y): 13, 26, 1, 7, 32, 40, 16
- Calculate cumulative frequency (cf) by summing frequencies up to each class.
- Relative frequency = frequency / total frequency.
- Cumulative relative frequency = cumulative frequency / total frequency.
Total frequency $N = 13 + 26 + 1 + 7 + 32 + 40 + 16 = 135$
| Class Interval | Frequency (f) | Cumulative Frequency (cf) | Relative Frequency (f/N) | Cumulative Relative Frequency (cf/N) |
|---|---|---|---|---|
| 25-34 | 13 | 13 | $\frac{13}{135} \approx 0.096$ | 0.096 |
| 35-44 | 26 | 39 | $\frac{26}{135} \approx 0.193$ | 0.289 |
| 45-54 | 1 | 40 | $\frac{1}{135} \approx 0.007$ | 0.296 |
| 55-64 | 7 | 47 | $\frac{7}{135} \approx 0.052$ | 0.348 |
| 65-74 | 32 | 79 | $\frac{32}{135} \approx 0.237$ | 0.585 |
| 75-84 | 40 | 119 | $\frac{40}{135} \approx 0.296$ | 0.881 |
| 85-94 | 16 | 135 | $\frac{16}{135} \approx 0.119$ | 1.000 |
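
The columns of the table can be reproduced programmatically. The following is a minimal sketch using only the class intervals and frequencies listed above; the variable names are illustrative.

```python
from itertools import accumulate

# Class intervals and frequencies copied from the table above.
classes = ["25-34", "35-44", "45-54", "55-64", "65-74", "75-84", "85-94"]
freqs = [13, 26, 1, 7, 32, 40, 16]

n = sum(freqs)                    # total frequency N
cum = list(accumulate(freqs))     # cumulative frequency (running total)
rel = [f / n for f in freqs]      # relative frequency f/N
cum_rel = [c / n for c in cum]    # cumulative relative frequency cf/N

# Print one row per class, rounded to three decimals like the table.
for row in zip(classes, freqs, cum, rel, cum_rel):
    print("{:>6} {:>3} {:>4} {:.3f} {:.3f}".format(*row))
```

The last cumulative frequency should equal $N$ and the last cumulative relative frequency should equal 1, which is a quick consistency check on the table.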
---
3. **Mean Calculation:**
Mean $\bar{x} = \frac{\sum f x}{N}$ where $x$ is midpoint of class.
Midpoints $x$: 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5
Calculate $\sum f x$:
$$\sum f x = 13\times29.5 + 26\times39.5 + 1\times49.5 + 7\times59.5 + 32\times69.5 + 40\times79.5 + 16\times89.5$$
$$= 383.5 + 1027 + 49.5 + 416.5 + 2224 + 3180 + 1432 = 8712.5$$
Mean:
$$\bar{x} = \frac{8712.5}{135} \approx 64.54$$
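
As a cross-check on the arithmetic, the grouped mean can be computed directly from the midpoints and frequencies. This is a small sketch; `grouped_mean` is a hypothetical helper name, not part of any library.

```python
# Class midpoints and frequencies from the frequency table.
midpoints = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [13, 26, 1, 7, 32, 40, 16]

def grouped_mean(midpoints, freqs):
    """Return sum(f * x) / sum(f) for grouped data."""
    total = sum(freqs)
    weighted_sum = sum(f * x for f, x in zip(freqs, midpoints))
    return weighted_sum / total

mean = grouped_mean(midpoints, freqs)
print(round(mean, 2))
```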
---
4. **Median Calculation:**
The median class is the first class whose cumulative frequency reaches $\frac{N}{2} = 67.5$.
From the cf column, the median class is 65-74 (cf = 79).
Median formula:
$$\text{Median} = L + \left(\frac{\frac{N}{2} - F}{f_m}\right) \times w$$
Where:
- $L=64.5$ (lower boundary of median class)
- $F=47$ (cf before median class)
- $f_m=32$ (frequency of the median class)
- $w=10$ (class width)
Calculate:
$$\text{Median} = 64.5 + \left(\frac{67.5 - 47}{32}\right) \times 10 = 64.5 + \left(\frac{20.5}{32}\right) \times 10 = 64.5 + 6.41 = 70.91$$
---
5. **Percentiles Calculation:**
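Percentile $P_k$ is the value below which $k\%$ of the data fall.
Use the same interpolation formula as for the median, with $N/2$ replaced by $kN/100$:
$$P_k = L + \left(\frac{kN/100 - F}{f_m}\right) \times w$$
Here $L$, $F$, and $f_m$ refer to the percentile class, i.e. the first class whose cumulative frequency reaches $kN/100$.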
- For 75th percentile ($k=75$):
$$kN/100 = 0.75 \times 135 = 101.25$$
The class containing $P_{75}$ is 75-84 (cf before = 79, f = 40)
$$P_{75} = 74.5 + \left(\frac{101.25 - 79}{40}\right) \times 10 = 74.5 + \left(\frac{22.25}{40}\right) \times 10 = 74.5 + 5.56 = 80.06$$
- For 34th percentile ($k=34$):
$$kN/100 = 0.34 \times 135 = 45.9$$
Class 55-64 (cf before=40, f=7)
$$P_{34} = 54.5 + \left(\frac{45.9 - 40}{7}\right) \times 10 = 54.5 + \left(\frac{5.9}{7}\right) \times 10 = 54.5 + 8.43 = 62.93$$
- For 56th percentile ($k=56$):
$$kN/100 = 0.56 \times 135 = 75.6$$
Class 65-74 (cf before=47, f=32)
$$P_{56} = 64.5 + \left(\frac{75.6 - 47}{32}\right) \times 10 = 64.5 + \left(\frac{28.6}{32}\right) \times 10 = 64.5 + 8.94 = 73.44$$
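
The interpolation used for the median and the percentiles follows one pattern: locate the class whose cumulative frequency first reaches $kN/100$, then interpolate inside it. A minimal sketch of that pattern, assuming equal class widths; `grouped_percentile` is a hypothetical helper name.

```python
def grouped_percentile(k, lower_bounds, freqs, width):
    """Interpolated k-th percentile for grouped data with equal class widths.

    lower_bounds[i] is the lower class boundary of class i (e.g. 24.5),
    freqs[i] is its frequency, and width is the common class width.
    """
    if not 0 <= k <= 100:
        raise ValueError("k must be between 0 and 100")
    n = sum(freqs)
    target = k * n / 100          # position kN/100
    cum_before = 0                # F: cumulative frequency before the class
    for lower, f in zip(lower_bounds, freqs):
        # Skip empty classes; stop at the first class whose cf reaches the target.
        if f > 0 and cum_before + f >= target:
            return lower + (target - cum_before) / f * width
        cum_before += f
    raise ValueError("no class reaches the requested position")

lower_bounds = [24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5]
freqs = [13, 26, 1, 7, 32, 40, 16]

# k = 50 gives the median; the others are the requested percentiles.
for k in (50, 75, 34, 56):
    print(k, round(grouped_percentile(k, lower_bounds, freqs, 10), 2))
```

Treating the median as $P_{50}$ avoids maintaining two separate formulas.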
---
6. **Sample Standard Deviation Calculation:**
Formula:
$$s = \sqrt{\frac{\sum f x^2 - \frac{(\sum f x)^2}{N}}{N-1}}$$
Calculate $\sum f x^2$:
Midpoints squared:
$$29.5^2=870.25, 39.5^2=1560.25, 49.5^2=2450.25, 59.5^2=3540.25, 69.5^2=4830.25, 79.5^2=6320.25, 89.5^2=8010.25$$
Calculate:
$$\sum f x^2 = 13\times870.25 + 26\times1560.25 + 1\times2450.25 + 7\times3540.25 + 32\times4830.25 + 40\times6320.25 + 16\times8010.25$$
$$= 11313.25 + 40566.5 + 2450.25 + 24781.75 + 154568 + 252810 + 128164 = 614653.75$$
Calculate variance:
$$s^2 = \frac{614653.75 - \frac{(8712.5)^2}{135}}{134} = \frac{614653.75 - \frac{75907656.25}{135}}{134} = \frac{614653.75 - 562278.94}{134} = \frac{52374.81}{134} \approx 390.86$$
Sample standard deviation:
$$s = \sqrt{390.86} \approx 19.77$$
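
The shortcut formula is sensitive to slips in the intermediate sums, so a direct cross-check is worthwhile. The sketch below evaluates the same formula from the midpoints and frequencies; `grouped_sample_sd` is a hypothetical helper name. Because $\sum f x^2$ can never be smaller than $(\sum f x)^2 / N$, the variance computed this way is non-negative up to floating-point rounding.

```python
import math

# Class midpoints and frequencies from the frequency table.
midpoints = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [13, 26, 1, 7, 32, 40, 16]

def grouped_sample_sd(midpoints, freqs):
    """Sample standard deviation of grouped data (divisor N - 1)."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return math.sqrt(variance)

sd = grouped_sample_sd(midpoints, freqs)
print(round(sd, 2))
```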
---
7. **Spearman's Rank Correlation ($r_s$):**
Given:
$$X = [25, 35, 45, 55, 65, 75, 85]$$
$$Y = [13, 26, 1, 7, 32, 40, 16]$$
Rank X and Y:
- Rank X: the values are already in ascending order, so the ranks run from 1 to 7.
- Rank Y (ascending, value → rank): 1 → 1, 7 → 2, 13 → 3, 16 → 4, 26 → 5, 32 → 6, 40 → 7
Calculate difference in ranks $d_i$ and $d_i^2$:
| X | Rank X | Y | Rank Y | $d_i$ | $d_i^2$ |
|---|---|---|---|---|---|
| 25 | 1 | 13 | 3 | -2 | 4 |
| 35 | 2 | 26 | 5 | -3 | 9 |
| 45 | 3 | 1 | 1 | 2 | 4 |
| 55 | 4 | 7 | 2 | 2 | 4 |
| 65 | 5 | 32 | 6 | -1 | 1 |
| 75 | 6 | 40 | 7 | -1 | 1 |
| 85 | 7 | 16 | 4 | 3 | 9 |
Sum $\sum d_i^2 = 4 + 9 + 4 + 4 + 1 + 1 + 9 = 32$
Formula:
$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 32}{7(49 - 1)} = 1 - \frac{192}{336} = 1 - 0.571 = 0.429$$
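
The ranking and the $d_i^2$ formula can be sketched in a few lines. This version assumes there are no tied values, which holds for the X and Y given here; with ties, the shortcut formula is no longer exact. The function names are illustrative.

```python
def ranks(values):
    """Ascending ranks starting at 1; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_no_ties(x, y):
    """Spearman's r_s via 1 - 6 * sum(d^2) / (n (n^2 - 1)); valid without ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

X = [25, 35, 45, 55, 65, 75, 85]
Y = [13, 26, 1, 7, 32, 40, 16]

print(ranks(Y))                           # ranks of Y in the original order
print(round(spearman_no_ties(X, Y), 3))   # Spearman's r_s
```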
---
8. **Pearson Correlation ($r$):**
Calculate means:
$$\bar{X} = \frac{25+35+45+55+65+75+85}{7} = 55$$
$$\bar{Y} = \frac{13+26+1+7+32+40+16}{7} = \frac{135}{7} \approx 19.29$$
Calculate the sum of cross-products and the sample standard deviations:
$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = (25-55)(13-19.29) + \dots + (85-55)(16-19.29) = 680$$
$$s_X = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}} = \sqrt{\frac{2800}{6}} \approx 21.602$$
$$s_Y = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{n-1}} = \sqrt{\frac{1171.43}{6}} \approx 13.973$$
Pearson correlation:
$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{(n-1)s_X s_Y} = \frac{680}{6 \times 21.602 \times 13.973} = \frac{680}{1811.1} \approx 0.375$$
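
A short cross-check of the hand calculation, written as a sketch with an illustrative function name. It uses the equivalent form $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$, in which the $(n-1)$ factors cancel, so no intermediate rounding of $s_X$ or $s_Y$ is involved.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

X = [25, 35, 45, 55, 65, 75, 85]
Y = [13, 26, 1, 7, 32, 40, 16]

r = pearson_r(X, Y)
print(round(r, 3))
```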
---
9. **Interpretation:**
- Spearman's $r_s \approx 0.429$ indicates a moderate positive monotonic relationship.
- Pearson's $r \approx 0.375$ indicates a weak-to-moderate positive linear relationship.
- Higher Assessment scores tend to be associated with higher Instruction scores, but the relationship is not strong; with only $n = 7$ pairs, neither coefficient is strong evidence of an association.
---
10. **Skewness Nature:**
- The mean (about 64.5) lies below the median (about 70.9), and the distribution has a long tail on the lower side (a cluster of low scores in 25-44), indicating **left (negative) skewness**.
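
One way to put a number on the direction of skew is Pearson's second skewness coefficient, $3(\bar{x} - \text{Median})/s$. This measure is not part of the original solution; it is offered here only as an optional check, and the sketch below recomputes everything it needs from the frequency table.

```python
import math

# Lower class boundaries, frequencies, and class width from the frequency table.
lower_bounds = [24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5]
freqs = [13, 26, 1, 7, 32, 40, 16]
width = 10
midpoints = [lb + width / 2 for lb in lower_bounds]
n = sum(freqs)

# Grouped mean and sample standard deviation (divisor N - 1).
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / (n - 1)
sd = math.sqrt(var)

# Interpolated median: first class whose cumulative frequency reaches N/2.
cum_before = 0
for lb, f in zip(lower_bounds, freqs):
    if cum_before + f >= n / 2:
        median = lb + (n / 2 - cum_before) / f * width
        break
    cum_before += f

# Pearson's second skewness coefficient (an added measure, see lead-in).
skew = 3 * (mean - median) / sd
print(round(skew, 2))
```

A negative value is consistent with left skew; a value near zero would suggest rough symmetry.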
11. **Causes of Skewness:**
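- A group of low scores (39 of the 135 students fall in 25-44) pulls the mean below the median.
- Uneven student performance: most scores cluster in 65-94, with a second, smaller cluster in 35-44 and very few scores in 45-64.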
12. **Graphs:**
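- Ogive: plot cumulative frequency against the upper class boundary (34.5, 44.5, ..., 94.5), starting from 0 at the lowest boundary 24.5, and join the points.
- Scatter graph: plot each $(X, Y)$ pair from step 7, with X on the horizontal axis and Y (frequencies) on the vertical axis.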
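
The plotting coordinates can be prepared as follows. This is a sketch: the point lists use only the data above, while the `draw` helper is hypothetical and assumes matplotlib is installed; it is defined but not called here.

```python
from itertools import accumulate

upper_bounds = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [13, 26, 1, 7, 32, 40, 16]
X = [25, 35, 45, 55, 65, 75, 85]   # X values as given in step 7

# Ogive points: start at (lowest boundary, 0), then (upper boundary, cf).
ogive_x = [24.5] + upper_bounds
ogive_y = [0] + list(accumulate(freqs))

def draw():
    """Draw the ogive and the scatter graph; assumes matplotlib is installed."""
    import matplotlib.pyplot as plt

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(ogive_x, ogive_y, marker="o")
    ax1.set_xlabel("Upper class boundary")
    ax1.set_ylabel("Cumulative frequency")
    ax1.set_title("Ogive (less-than type)")
    ax2.scatter(X, freqs)
    ax2.set_xlabel("X")
    ax2.set_ylabel("Y (frequency)")
    ax2.set_title("Scatter graph")
    fig.tight_layout()
    plt.show()
```

Calling `draw()` in an environment with matplotlib opens both graphs side by side.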
---
Final answers:
- Mean $\approx 64.54$
- Median $\approx 70.91$
- 75th percentile $\approx 80.06$
- 34th percentile $\approx 62.93$
- 56th percentile $\approx 73.44$
- Sample standard deviation $s \approx 19.77$
- Spearman's $r_s \approx 0.429$
- Pearson's $r \approx 0.375$
- Skewness: Negative (left skewed)