Cluster Variance
1. **Problem Statement:** We want to calculate the variance $v_c^2$ of each cluster $c$ using the formula:
$$ v_c^2 = \frac{1}{n_c - 1} \sum_{i=1}^{n_c} \left( d_i - \overline{d} \right)^2 $$
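where $n_c$ is the number of data points in cluster $c$, $d_i$ are the data points, and $\overline{d}$ is the mean of the data points in the cluster. For two-dimensional points, $\left( d_i - \overline{d} \right)^2$ is read as the squared Euclidean distance from $d_i$ to the cluster mean, that is, the sum of the squared deviations in $x$ and $y$.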
2. **Dataset and Clustering:** Given points: $(1,1), (3,1), (1,3), (3,3), (7,7), (9,9)$.
Assume two clusters are formed:
- Cluster 1: $(1,1), (3,1), (1,3), (3,3)$
- Cluster 2: $(7,7), (9,9)$
3. **Calculate Mean of Each Cluster:**
- For Cluster 1:
\[ \overline{x} = \frac{1+3+1+3}{4} = 2, \quad \overline{y} = \frac{1+1+3+3}{4} = 2 \]
- For Cluster 2:
\[ \overline{x} = \frac{7+9}{2} = 8, \quad \overline{y} = \frac{7+9}{2} = 8 \]
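The two means above can be reproduced with a short Python sketch. The names `clusters` and `centroid` are illustrative and not part of the original problem; the cluster assignment is the one assumed in step 2.

```python
# Cluster assignment as assumed in step 2 (labels 1 and 2 are illustrative keys).
clusters = {
    1: [(1, 1), (3, 1), (1, 3), (3, 3)],
    2: [(7, 7), (9, 9)],
}

def centroid(points):
    """Return the per-coordinate mean (x_bar, y_bar) of a list of 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return (mean_x, mean_y)

centroids = {label: centroid(pts) for label, pts in clusters.items()}
print(centroids)  # {1: (2.0, 2.0), 2: (8.0, 8.0)}
```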
4. **Calculate Variance for Each Cluster:**
The squared deviation of each point from the mean is computed per dimension and then summed over $x$ and $y$, so for Cluster 1:
\[ v_1^2 = \frac{1}{4-1} \sum_{i=1}^4 \left((x_i - 2)^2 + (y_i - 2)^2\right) \]
Calculations for each point in Cluster 1:
- $(1,1): (1-2)^2 + (1-2)^2 = 1 + 1 = 2$
- $(3,1): (3-2)^2 + (1-2)^2 = 1 + 1 = 2$
- $(1,3): (1-2)^2 + (3-2)^2 = 1 + 1 = 2$
- $(3,3): (3-2)^2 + (3-2)^2 = 1 + 1 = 2$
Sum $= 2 + 2 + 2 + 2 = 8$
\[ v_1^2 = \frac{8}{3} \approx 2.6667 \]
For Cluster 2:
\[ v_2^2 = \frac{1}{2-1} \sum_{i=1}^2 \left((x_i - 8)^2 + (y_i - 8)^2\right) \]
Calculations:
- $(7,7): (7-8)^2 + (7-8)^2 = 1 + 1 = 2$
- $(9,9): (9-8)^2 + (9-8)^2 = 1 + 1 = 2$
Sum $= 2 + 2 = 4$
\[ v_2^2 = \frac{4}{1} = 4 \]
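The hand calculation for both clusters can be checked with a minimal Python sketch of the formula from step 1. The function name `cluster_variance` is hypothetical, and the squared term is taken as the sum of the squared $x$ and $y$ deviations, as in the working above.

```python
def cluster_variance(points):
    """Sum of squared distances to the cluster mean, divided by n_c - 1."""
    n = len(points)
    if n < 2:
        # The n_c - 1 divisor is zero for a single-point cluster.
        raise ValueError("need at least two points for the n - 1 divisor")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Squared deviation of each point: (x_i - x_bar)^2 + (y_i - y_bar)^2
    sum_sq = sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in points)
    return sum_sq / (n - 1)

v1 = cluster_variance([(1, 1), (3, 1), (1, 3), (3, 3)])
v2 = cluster_variance([(7, 7), (9, 9)])
print(round(v1, 4), v2)  # 2.6667 4.0
```

The guard for fewer than two points is an added assumption: the formula is undefined when $n_c = 1$, so the sketch raises an error rather than returning a value.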
5. **Final Answer:**
- Variance of Cluster 1: $v_1^2 = \frac{8}{3} \approx 2.67$
- Variance of Cluster 2: $v_2^2 = 4$
This completes the cluster variance computation for the given dataset and cluster assignment.
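As a cross-check on the "per dimension, then summed" reading, the same numbers should come out of adding the sample variances of the $x$ and $y$ coordinates. The sketch below uses `statistics.variance` from the Python standard library, which uses the $n - 1$ divisor; the helper name `summed_variance` is illustrative.

```python
import math
from statistics import variance  # sample variance, divisor n - 1

def summed_variance(points):
    """Add the sample variance of the x coordinates to that of the y coordinates."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return variance(xs) + variance(ys)

cluster_1 = [(1, 1), (3, 1), (1, 3), (3, 3)]
cluster_2 = [(7, 7), (9, 9)]

# Compare against the hand-computed values 8/3 and 4.
print(math.isclose(summed_variance(cluster_1), 8 / 3))  # True
print(math.isclose(summed_variance(cluster_2), 4))      # True
```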