MEDTRACE Algorithms 8764F0
1. **Problem a: Time Complexity Analysis of CleanAndNormalize**
The pseudo-code has nested loops: the outer loop runs $n$ times, and the inner loop runs from $i$ to $n$, averaging about $\frac{n+1}{2}$ iterations.
The total iterations for merging are approximately $$\sum_{i=1}^n (n - i + 1) = \frac{n(n+1)}{2} = O(n^2).$$
The sorting step uses an efficient comparison sort such as merge sort, with complexity $O(n \log n)$.
Total complexity is $$O(n^2) + O(n \log n) = O(n^2).$$
For $n = 10^6$, this is $n^2 = 10^{12}$ operations, which is infeasible for real-time processing on limited hardware.
**Recommendation:** Merge duplicates with a hash table in $O(n)$ expected time, then sort in $O(n \log n)$, for better scalability.
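The recommended hash-then-sort pipeline can be sketched as follows. This is a minimal illustration, not the actual CleanAndNormalize routine: the `(key, value)` record format and the mean-based merge rule are assumptions for the example.

```python
from collections import defaultdict

def clean_and_normalize(records):
    """Merge duplicate records by key in O(n) expected time via hashing,
    then sort the merged output in O(n log n).
    `records` is a list of (key, value) pairs -- an assumed format."""
    merged = defaultdict(list)
    for key, value in records:        # single pass: O(n) expected with hashing
        merged[key].append(value)
    # Normalize each group to its mean value (one possible merge rule).
    normalized = [(key, sum(vals) / len(vals)) for key, vals in merged.items()]
    return sorted(normalized)         # O(n log n) now dominates the total cost
```

The sort, not the merge, is now the bottleneck, so the whole pipeline runs in $O(n \log n)$ instead of $O(n^2)$.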
2. **Problem b: Solve recurrence $T(n) = 4T(n/2) + n \log n$**
Using Master Theorem:
- $a=4$, $b=2$, $f(n) = n \log n$
- Compare $n^{\log_b a} = n^{\log_2 4} = n^2$ with $f(n) = n \log n$
Since $f(n) = n \log n = O(n^{2-\epsilon})$ for any $0 < \epsilon < 1$, $f(n)$ is polynomially smaller than $n^{\log_b a} = n^2$ and case 1 applies:
$$T(n) = \Theta(n^2).$$
**Implication:** The divide-and-conquer approach with 4 recursive calls is too slow for real-time geographic partitioning.
**Recommendation:** Use an iterative redesign or $O(n \log n)$ spatial methods such as k-d trees or geohashing for practical performance.
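To make the recommendation concrete, here is a hedged sketch of a geohash-style alternative: instead of four recursive calls per level, each point maps to a grid cell in $O(1)$, so partitioning all $n$ points is a single $O(n)$ pass. The `cell` resolution parameter and the `(lat, lon)` input format are assumptions.

```python
from collections import defaultdict

def grid_partition(points, cell=0.5):
    """Geohash-style bucketing: each (lat, lon) point maps to a grid cell
    in O(1), so partitioning n points costs O(n) total -- no recursion.
    `cell` (in degrees) is an assumed resolution parameter."""
    buckets = defaultdict(list)
    for lat, lon in points:
        key = (int(lat // cell), int(lon // cell))  # integer cell index
        buckets[key].append((lat, lon))
    return buckets
```

Nearby points land in the same bucket, giving the coarse geographic partition the recurrence above computed in $\Theta(n^2)$ time.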
3. **Problem c: Two outbreak-classification algorithms**
- Divide-and-Conquer: Recursively split the region, classify each half, and merge the results.
  - Fits large spatial datasets; $O(n \log n)$ complexity.
  - Good for offline batch processing.
- Greedy/Heuristic: Scan reports in time order and cluster spatially close cases.
  - Low latency; $O(n)$ complexity.
  - Suitable for real-time streaming.
**Comparison:** Greedy is better for real-time use; Divide-and-Conquer gives better accuracy offline.
**Recommendation:** A hybrid approach combining both.
4. **Problem d: DP vs Branch-and-Bound for resource allocation**
- DP: Models the problem as a knapsack; complexity $O(m \cdot s)$, where $m$ = number of kits and $s$ = number of stations.
  - Pseudo-polynomial, feasible for small/moderate inputs.
- Branch-and-Bound: Exponential worst case, but prunes the search space with bounding heuristics.
  - Performance depends on the data; real-time bounds are not guaranteed.
**Recommendation:** DP is preferred for predictable, small-scale instances; B&B with good heuristics for large, dynamic cases.
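A minimal sketch of the knapsack DP, under an assumed model where each station has a kit demand and a benefit score and the capacity is the total kit supply; the `(kits_needed, benefit)` format is hypothetical.

```python
def allocate_kits(total_kits, stations):
    """0/1 knapsack DP: `stations` is a list of (kits_needed, benefit).
    dp[c] = best total benefit achievable with c kits. The table has
    O(m * s) entries, matching the pseudo-polynomial bound above."""
    dp = [0] * (total_kits + 1)
    for need, benefit in stations:
        # Iterate capacities in reverse so each station is used at most once.
        for c in range(total_kits, need - 1, -1):
            dp[c] = max(dp[c], dp[c - need] + benefit)
    return dp[total_kits]
```

Because the table size depends on the numeric value of $m$, not just the input length, the bound is pseudo-polynomial, exactly as noted above.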
5. **Problem e: Graph-based risk propagation**
- Graph nodes: counties/facilities; edges: travel routes.
- Candidate algorithms:
  1. BFS/DFS: $O(V+E)$, simple but ignores edge weights.
  2. SIR model: $O(V^2)$ per simulation step, realistic but computationally heavy.
  3. Dijkstra’s algorithm: $O(E + V \log V)$, computes fastest spread routes.
**Chosen:** Dijkstra variant with edge weights as transmission risk.
**Recommendation:** Parallel Dijkstra on distributed servers for MEDTRACE.
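The chosen Dijkstra variant can be sketched as below (single-machine version; the parallel deployment is beyond a short example). The binary-heap implementation runs in $O((V+E)\log V)$, close to the Fibonacci-heap bound quoted above; the adjacency-list format mapping each node to `(neighbor, transmission_time)` pairs is an assumption.

```python
import heapq

def fastest_spread(graph, source):
    """Dijkstra over a risk-propagation network. `graph` maps each node
    to a list of (neighbor, transmission_time) pairs -- an assumed
    adjacency-list format. Returns earliest arrival time per node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry, skip
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Edge weights here encode transmission risk or time, so shorter distances correspond to faster or more likely spread paths.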
6. **Problem f: NP-hardness and approximation**
- Tasks like resource allocation, shortest-path covering, outbreak source detection are NP-hard.
- Polynomial-time approximation: Greedy set cover achieves an $H_n \le \ln n + 1$ (i.e., $O(\log n)$) approximation ratio.
- Randomized algorithms: Randomized rounding for near-optimal solutions.
**Evaluation:** Approximation + randomization balance speed and accuracy.
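The greedy set-cover heuristic mentioned above is short enough to sketch directly; the list-of-sets input format is an assumption for the example.

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: repeatedly pick the subset covering the most
    still-uncovered elements. Runs in polynomial time and achieves the
    H_n <= ln(n) + 1 approximation ratio. `subsets` is a list of sets."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda s: len(s & uncovered))
        if not (best & uncovered):
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= best
    return chosen
```

For outbreak monitoring, the universe could be the set of at-risk counties and each subset the counties reachable by one sensor or team, an interpretation assumed here for illustration.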
7. **Problem g: String processing algorithm**
- Chosen: Rabin–Karp with parallel pre-processing.
- Handles pattern matching in heterogeneous streams efficiently.
- Complexity: Average $O(n + m)$; worst case $O(nm)$, which is rare in practice.
- Parallelization: Splits data, parallel hash computation, near-linear throughput scaling.
- Randomization reduces hash collisions, enables fuzzy matching.
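A sequential Rabin–Karp core is sketched below (the parallel pre-processing would split `text` into overlapping chunks and run this per chunk). The `base` and `mod` hash parameters are assumptions; choosing `mod` as a random prime is what reduces adversarial collisions.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_003):
    """Rabin-Karp rolling-hash search: average O(n + m); worst case O(nm)
    when collisions force repeated verification. Returns match indices.
    `base` and `mod` are assumed hash parameters."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(base, m - 1, mod)             # weight of the window's leading char
    p_hash = t_hash = 0
    for i in range(m):                    # initial hashes of pattern and window
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        if p_hash == t_hash and text[i:i + m] == pattern:  # verify on hash hit
            matches.append(i)
        if i < n - m:                     # roll the window forward one char
            t_hash = ((t_hash - ord(text[i]) * h) * base + ord(text[i + m])) % mod
    return matches
```

Because each chunk's rolling hash is independent, the per-chunk calls parallelize with near-linear throughput scaling, as noted above.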
**Final answers:**
- a) $O(n^2)$, unacceptable for $n=10^6$, recommend hashing + sorting.
- b) $T(n) = \Theta(n^2)$, iterative redesign recommended.
- c) Divide-and-Conquer $O(n \log n)$, Greedy $O(n)$, hybrid best.
- d) DP $O(m \cdot s)$ feasible for small instances; B&B exponential in the worst case but effective with heuristics.
- e) Dijkstra $O(E + V \log V)$ best for dynamic network.
- f) NP-hard tasks, use approximation and randomized algorithms.
- g) Rabin–Karp parallel, $O(n + m)$ average, parallelization improves throughput.