Statistical Analysis
1. **Problem 1: Statistical Measures for Dataset 1**
Given dataset: 45, 62, 75, 85, 90, 93, 95, 95, 100
2. **Range** is the difference between the maximum and minimum values:
$$\text{Range} = 100 - 45 = 55$$
3. **Quartiles and Percentiles:**
- Sort the data (already sorted).
- The 1st quartile (Q1) is the 25th percentile, the 3rd quartile (Q3) is the 75th percentile.
- To find Q1 (25th percentile) with n = 9 values, locate the position: $$P = \frac{25}{100} \times (n+1) = 0.25 \times 10 = 2.5$$
- Q1 is the average of the 2nd and 3rd values: $$\frac{62 + 75}{2} = 68.5$$
- To find Q3 (75th percentile), position: $$0.75 \times 10 = 7.5$$
- Q3 is average of 7th and 8th values: $$\frac{95 + 95}{2} = 95$$
- The 50th percentile is the median, position: $$0.5 \times 10 = 5$$
- Median is the 5th value: 90
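The position rule used above can be sketched in Python. This is a minimal illustration, assuming that a fractional position is resolved by linear interpolation between the two neighbouring values (for a position ending in .5 this is their average). The function name `percentile_n_plus_1` is hypothetical, not part of any library.

```python
import math

def percentile_n_plus_1(sorted_data, p):
    """Return the p-th percentile using the position rule P = p/100 * (n + 1).

    Assumes sorted_data is already in ascending order. A fractional position
    is resolved by linear interpolation between the neighbouring values.
    """
    n = len(sorted_data)
    pos = p / 100 * (n + 1)        # 1-based position in the sorted list
    pos = min(max(pos, 1), n)      # clamp so extreme percentiles stay in range
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return sorted_data[lower - 1]
    below, above = sorted_data[lower - 1], sorted_data[lower]
    return below + frac * (above - below)

data = [45, 62, 75, 85, 90, 93, 95, 95, 100]
q1 = percentile_n_plus_1(data, 25)      # position 2.5
median = percentile_n_plus_1(data, 50)  # position 5
q3 = percentile_n_plus_1(data, 75)      # position 7.5
print(q1, median, q3)                   # 68.5 90 95.0
```

Other percentile conventions exist (for example, rules based on n - 1 rather than n + 1), and they can give slightly different quartiles for the same data, so it is worth stating which rule is in use.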
4. **Mean** (average):
$$\text{Mean} = \frac{45 + 62 + 75 + 85 + 90 + 93 + 95 + 95 + 100}{9} = \frac{740}{9} \approx 82.22$$
5. **Median** is the middle value in sorted data: 90
6. **Mode** is the most frequent value: 95 (appears twice)
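The range, mean, median, and mode can be checked with Python's standard `statistics` module. This is only a quick verification sketch; the variable names are illustrative.

```python
import statistics

data = [45, 62, 75, 85, 90, 93, 95, 95, 100]

data_range = max(data) - min(data)   # 100 - 45
mean = statistics.mean(data)         # 740 / 9
median = statistics.median(data)     # middle (5th) of the 9 sorted values
mode = statistics.mode(data)         # most frequent value

print(data_range, round(mean, 2), median, mode)  # 55 82.22 90 95
```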
7. **Variance** measures spread around the mean:
- Sum the squared deviations from the mean:
$$\sum (x_i - \bar{x})^2 = (45-82.22)^2 + (62-82.22)^2 + \cdots + (100-82.22)^2 \approx 2673.56$$
- Divide by n = 9 for the population variance: $$\sigma^2 = \frac{2673.56}{9} \approx 297.06$$
- If the data are treated as a sample instead, divide by n - 1 = 8, which gives approximately 334.19.
8. **Standard Deviation** is the square root of the variance:
$$\sigma = \sqrt{297.06} \approx 17.24$$
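Because the sum of squared deviations is easy to get wrong by hand, it can help to recompute it directly from the data. The sketch below assumes the population formulas (dividing by n) and also shows the sample variant (dividing by n - 1) for comparison. The variable names are illustrative.

```python
import math

data = [45, 62, 75, 85, 90, 93, 95, 95, 100]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean
sum_sq = sum((x - mean) ** 2 for x in data)

pop_variance = sum_sq / n              # population variance (divide by n)
pop_sd = math.sqrt(pop_variance)       # population standard deviation
sample_variance = sum_sq / (n - 1)     # sample variance (divide by n - 1)

print(f"sum of squares = {sum_sq:.2f}")
print(f"population variance = {pop_variance:.2f}, sd = {pop_sd:.2f}")
print(f"sample variance = {sample_variance:.2f}")
```

The standard library offers the same quantities as `statistics.pvariance` and `statistics.pstdev` (population) and `statistics.variance` and `statistics.stdev` (sample).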
9. **Comments on equality:**
- The 1st quartile and the 25th percentile are equal by definition: both give 68.5.
- The median and the 50th percentile are equal by definition: both give 90.
- Likewise, the 3rd quartile and the 75th percentile are the same quantity: both give 95.
10. **Problem 2: Data Visualization for Dataset 2**
Given dataset: 72, 88, 94, 119, 85, 91, 77, 84, 75, 79, 83, 80, 87, 70, 76, 82, 93, 78, 89, 95, 86, 90, 73, 92, 81
11. **Stem-and-Leaf Plot:**
- Stems represent the tens (and hundreds) digits; leaves represent the units digit. Key: 7 | 2 means 72.
- Plot:
 7 | 0 2 3 5 6 7 8 9
 8 | 0 1 2 3 4 5 6 7 8 9
 9 | 0 1 2 3 4 5
10 |
11 | 9
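A stem-and-leaf plot can be generated mechanically, which guards against dropping a leaf when copying by hand. The following is a minimal sketch, assuming two- and three-digit integer scores with stem = value // 10 and leaf = value % 10. The function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows as strings (stem = tens and up, leaf = units)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    rows = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {row}".rstrip())
    return rows

scores = [72, 88, 94, 119, 85, 91, 77, 84, 75, 79, 83, 80, 87,
          70, 76, 82, 93, 78, 89, 95, 86, 90, 73, 92, 81]
rows = stem_and_leaf(scores)
for line in rows:
    print(line)
```

A useful sanity check is that the total number of leaves equals the number of data points (25 here).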
12. **Histogram:**
- Group data into bins (e.g., 70-79, 80-89, 90-99, 100-109, 110-119).
- Count frequencies:
70-79: 8
80-89: 10
90-99: 6
100-109: 0
110-119: 1
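The bin frequencies can be tallied programmatically and rendered as a simple text histogram. This sketch assumes bins of width 10 starting at 70, as in the grouping above. The variable names are illustrative.

```python
from collections import Counter

scores = [72, 88, 94, 119, 85, 91, 77, 84, 75, 79, 83, 80, 87,
          70, 76, 82, 93, 78, 89, 95, 86, 90, 73, 92, 81]

bin_width = 10
# Map each score to the lower edge of its bin, e.g. 77 -> 70.
counts = Counter((s // bin_width) * bin_width for s in scores)

for start in range(70, 120, bin_width):
    freq = counts.get(start, 0)
    print(f"{start}-{start + bin_width - 1}: {freq:2d} {'#' * freq}")
```

The frequencies should add up to the number of observations (25), which is a quick way to catch a miscounted bin.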
13. **Box Plot:**
- Calculate quartiles:
Sorted data: 70, 72, 73, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 119
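- Median (Q2): with n = 25 values, the median is the 13th value: 84
- Q1: median of the lower half (the 12 values from 70 to 83): (77 + 78)/2 = 77.5
- Q3: median of the upper half (the 12 values from 85 to 119): (90 + 91)/2 = 90.5
- Identify outliers using 1.5*IQR:
IQR = Q3 - Q1 = 90.5 - 77.5 = 13
Lower bound = 77.5 - 1.5*13 = 58
Upper bound = 90.5 + 1.5*13 = 110
- Outlier: 119 (above the upper bound)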
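The box-plot quantities can be recomputed as follows. This sketch assumes the "median of halves" convention, in which the overall median is excluded from both halves when n is odd, together with the 1.5*IQR rule for fences. The function name `quartiles_by_halves` is hypothetical.

```python
import statistics

def quartiles_by_halves(values):
    """Return (Q1, median, Q3), excluding the median from both halves if n is odd."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return (statistics.median(lower),
            statistics.median(data),
            statistics.median(upper))

scores = [72, 88, 94, 119, 85, 91, 77, 84, 75, 79, 83, 80, 87,
          70, 76, 82, 93, 78, 89, 95, 86, 90, 73, 92, 81]

q1, med, q3 = quartiles_by_halves(scores)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [s for s in scores if s < low_fence or s > high_fence]

print(f"Q1={q1}, median={med}, Q3={q3}, IQR={iqr}")
print(f"fences=({low_fence}, {high_fence}), outliers={outliers}")
```

Plotting libraries may use a different quartile convention by default, so the box edges they draw can differ slightly from hand-computed values.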
14. **Labels and Titles:**
- Stem-and-leaf: "Stem-and-Leaf Plot of Test Scores"
- Histogram: "Histogram of Test Scores with Frequency"
- Box plot: "Box Plot of Test Scores with Outlier"
15. **Summary:**
- The stem-and-leaf plot shows distribution and individual values.
- The histogram visualizes frequency distribution.
- The box plot summarizes spread, center, and outliers.
- 119 is a potential outlier in the dataset.