Sales Data Analysis
1. **Problem Statement:** We have sales data for the soap section (in thousands) given as:
$$102.2, 104.2, 100.4, 102.6, 111.7, 121.8, 127.9, 135.3, 137.8, 141.8, 147.9, 151.5, 155.7, 161.4, 162.8, 169.9, 171.7, 178.6, 181.3, 196.8, 111.7, 111.7$$
We need to:
a) Show the box plot (five number summary) of the cleaned data.
b) Determine all measures of central tendency and comment on the best measure.
c) Determine the predicted value of $X$.
2. **Step a: Five Number Summary (Box Plot)**
- First, sort the data:
$$100.4, 102.2, 102.6, 104.2, 111.7, 111.7, 111.7, 121.8, 127.9, 135.3, 137.8, 141.8, 147.9, 151.5, 155.7, 161.4, 162.8, 169.9, 171.7, 178.6, 181.3, 196.8$$
- Minimum ($Q_0$): $100.4$
- Maximum ($Q_4$): $196.8$
- Median ($Q_2$): Since there are 22 data points, the median is the average of the 11th and 12th values:
$$\frac{137.8 + 141.8}{2} = \frac{279.6}{2} = 139.8$$
- Lower quartile ($Q_1$): Median of the first 11 values (the lower half of the sorted data):
Values: $100.4, 102.2, 102.6, 104.2, 111.7, 111.7, 111.7, 121.8, 127.9, 135.3, 137.8$
With 11 values, the median is the 6th value: $111.7$
- Upper quartile ($Q_3$): Median of the last 11 values (the upper half of the sorted data):
Values: $141.8, 147.9, 151.5, 155.7, 161.4, 162.8, 169.9, 171.7, 178.6, 181.3, 196.8$
With 11 values, the median is the 6th value: $162.8$
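As a check on the sorting, median, and quartiles above, here is a minimal Python sketch using only the standard library. The helper `quartiles_median_of_halves` is a hypothetical name introduced for illustration; it assumes the same median-of-halves convention as the text, splitting the 22 ordered values into a lower and an upper half of 11 and taking the median of each.

```python
import statistics

# Soap-section sales (in thousands), as given in the problem statement
sales = [102.2, 104.2, 100.4, 102.6, 111.7, 121.8, 127.9, 135.3, 137.8, 141.8,
         147.9, 151.5, 155.7, 161.4, 162.8, 169.9, 171.7, 178.6, 181.3, 196.8,
         111.7, 111.7]


def quartiles_median_of_halves(values):
    """Hypothetical helper: return (Q1, Q2, Q3) by the median-of-halves rule.

    With an even count the ordered data splits into two equal halves.
    With an odd count the middle value is left out of both halves
    (one common convention, assumed here).
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + n % 2:]
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)


ordered = sorted(sales)
q1, q2, q3 = quartiles_median_of_halves(sales)

print(len(ordered), ordered[0], ordered[-1])   # 22 100.4 196.8
print(q1, round(q2, 1), q3)                    # 111.7 139.8 162.8

# For comparison: the standard library's interpolating quartiles
# (default method="exclusive") follow a different convention.
print(statistics.quantiles(sales, n=4))
```

Python's `statistics.quantiles` interpolates between ordered positions instead; with its default `method="exclusive"` it places $Q_3$ a quarter of the way from the 17th to the 18th ordered value, so for these data it reports a slightly higher $Q_3$ than 162.8. Both are established conventions, which is why quartiles can differ between textbooks and software.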
**Five number summary:**
$$\text{Min} = 100.4, Q_1 = 111.7, Q_2 = 139.8, Q_3 = 162.8, \text{Max} = 196.8$$
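A box plot also needs a rule for where the whiskers end. The sketch below assumes Tukey's common 1.5 × IQR fence rule (the problem statement does not name one) and takes the quartiles from the five-number summary above.

```python
# Five-number summary from the text (sales in thousands)
minimum, q1, q2, q3, maximum = 100.4, 111.7, 139.8, 162.8, 196.8

sales = [102.2, 104.2, 100.4, 102.6, 111.7, 121.8, 127.9, 135.3, 137.8, 141.8,
         147.9, 151.5, 155.7, 161.4, 162.8, 169.9, 171.7, 178.6, 181.3, 196.8,
         111.7, 111.7]

# Assumed whisker rule: points more than 1.5 * IQR beyond a quartile are outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in sales if x < lower_fence or x > upper_fence]

# Whiskers extend to the most extreme data points that lie inside the fences.
lower_whisker = min(x for x in sales if x >= lower_fence)
upper_whisker = max(x for x in sales if x <= upper_fence)

print(f"IQR = {iqr:.1f}, fences = ({lower_fence:.2f}, {upper_fence:.2f})")
print("outliers:", outliers)
print("whiskers:", lower_whisker, upper_whisker)
```

For these data no value falls outside the fences (about 35.05 and 239.45), so the whiskers run from the minimum, 100.4, to the maximum, 196.8.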
3. **Step b: Measures of Central Tendency**
- **Mean:**
$$\text{Mean} = \frac{\sum X_i}{n}$$
Sum of the values:
$$\sum X_i = 102.2 + 104.2 + 100.4 + 102.6 + 111.7 + 121.8 + 127.9 + 135.3 + 137.8 + 141.8 + 147.9 + 151.5 + 155.7 + 161.4 + 162.8 + 169.9 + 171.7 + 178.6 + 181.3 + 196.8 + 111.7 + 111.7 = 3086.7$$
Number of data points: $n = 22$
$$\text{Mean} = \frac{3086.7}{22} \approx 140.3$$
- **Median:** Already found as $139.8$
- **Mode:** The value that appears most frequently is $111.7$ (appears 3 times)
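The three measures can be recomputed with Python's standard `statistics` module as an arithmetic check. This is a minimal sketch; `math.fsum` is used for the total to reduce floating-point rounding error.

```python
import math
import statistics

# Soap-section sales (in thousands), as given in the problem statement
sales = [102.2, 104.2, 100.4, 102.6, 111.7, 121.8, 127.9, 135.3, 137.8, 141.8,
         147.9, 151.5, 155.7, 161.4, 162.8, 169.9, 171.7, 178.6, 181.3, 196.8,
         111.7, 111.7]

total = math.fsum(sales)              # sum of all 22 values
mean = total / len(sales)
median = statistics.median(sales)     # averages the two middle values for even n
modes = statistics.multimode(sales)   # every value tied for the highest count

print(f"sum = {total:.1f}, n = {len(sales)}, mean = {mean:.2f}")
print(f"median = {median:.1f}, mode(s) = {modes}, count = {sales.count(modes[0])}")
```

Run on the values as listed, this gives a total of 3086.7 and a mean of about 140.3, slightly above the median of 139.8; the only mode is 111.7.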
**Comment:**
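- The mean uses every value, so it is pulled toward extreme values such as the maximum, 196.8.
- The median depends only on the middle of the ordered data, so it is robust to extreme values.
- The mode is the most frequent sales value, but here it (111.7) coincides with the lower quartile, so it does not describe the centre of the data.
Because the data contain a cluster of repeated low values (111.7 three times) and a high maximum (196.8), the **median** is the better measure of central tendency here: it is not affected by how large the extreme values are.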
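The robustness argument can be made concrete with a hypothetical change to the data: the sketch raises the maximum (196.8) by 100, as if a single period had been unusually large. The perturbed value is invented purely for illustration and is not part of the sales data.

```python
import statistics

# Soap-section sales (in thousands), as given in the problem statement
sales = [102.2, 104.2, 100.4, 102.6, 111.7, 121.8, 127.9, 135.3, 137.8, 141.8,
         147.9, 151.5, 155.7, 161.4, 162.8, 169.9, 171.7, 178.6, 181.3, 196.8,
         111.7, 111.7]

# Hypothetical perturbation (illustration only): raise the maximum by 100.
perturbed = sorted(sales)
perturbed[-1] += 100

for label, data in (("original", sales), ("perturbed", perturbed)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label:>9}: mean = {mean:.2f}, median = {median:.1f}")
```

Only the mean moves, by 100/22 ≈ 4.5 (in thousands); the median stays at 139.8 because the middle of the ordered data is unchanged.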
4. **Step c: Predicted Value of $X$**
- The predicted value of $X$ for future sales is estimated by the measure of central tendency that best represents the data.
- Since the median is the most robust measure for these data, the predicted value is:
$$\boxed{139.8}$$
---
**Summary:**
- Five number summary: Min=100.4, Q1=111.7, Median=139.8, Q3=162.8, Max=196.8
- Mean≈140.3, Median=139.8, Mode=111.7
- Best measure: Median
- Predicted value of $X$: 139.8