MEDTRACE Algorithms 8764F0
1. **Problem a: Time Complexity Analysis of CleanAndNormalize**
The pseudo-code has nested loops: the outer loop runs $n$ times, and the inner loop runs from $i$ to $n$, averaging about $\frac{n+1}{2}$ iterations.
The total iterations for merging are approximately $$\sum_{i=1}^n (n - i + 1) = \frac{n(n+1)}{2} = O(n^2).$$
The sorting step uses an efficient comparison sort such as merge sort, with complexity $O(n \log n)$.
Total complexity is $$O(n^2) + O(n \log n) = O(n^2).$$
For $n = 10^6$, this is $n^2 = 10^{12}$ operations, which is infeasible for real-time processing on limited hardware.
**Recommendation:** Merge duplicates with a hash table in $O(n)$ expected time, then sort in $O(n \log n)$, for better scalability.
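The recommended hash-then-sort pipeline can be sketched as follows. This is a minimal illustration, not the actual CleanAndNormalize routine: the `(key, value)` record format and the mean-based merge rule are assumptions for the example.

```python
from collections import defaultdict

def clean_and_normalize(records):
    """Merge duplicate records by key in O(n) expected time via hashing,
    then sort the merged output in O(n log n).
    `records` is a list of (key, value) pairs -- an assumed format."""
    merged = defaultdict(list)
    for key, value in records:        # single pass: O(n) expected with hashing
        merged[key].append(value)
    # Normalize each group to its mean value (one possible merge rule).
    normalized = [(key, sum(vals) / len(vals)) for key, vals in merged.items()]
    return sorted(normalized)         # O(n log n) now dominates the total cost
```

The sort, not the merge, is now the bottleneck, so the whole pipeline runs in $O(n \log n)$ instead of $O(n^2)$.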
2. **Problem b: Solve recurrence $T(n) = 4T(n/2) + n \log n$**
Using Master Theorem:
- $a=4$, $b=2$, $f(n) = n \log n$
- Compare $n^{\log_b a} = n^{\log_2 4} = n^2$ with $f(n) = n \log n$
Since $f(n) = n \log n = O(n^{2-\epsilon})$ for any $0 < \epsilon < 1$, $f(n)$ is polynomially smaller than $n^{\log_b a} = n^2$ and case 1 applies:
$$T(n) = \Theta(n^2).$$
**Implication:** The divide-and-conquer approach with 4 recursive calls is too slow for real-time geographic partitioning.
**Recommendation:** Use an iterative redesign or $O(n \log n)$ spatial methods such as k-d trees or geohashing for practical performance.
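To make the recommendation concrete, here is a hedged sketch of a geohash-style alternative: instead of four recursive calls per level, each point maps to a grid cell in $O(1)$, so partitioning all $n$ points is a single $O(n)$ pass. The `cell` resolution parameter and the `(lat, lon)` input format are assumptions.

```python
from collections import defaultdict

def grid_partition(points, cell=0.5):
    """Geohash-style bucketing: each (lat, lon) point maps to a grid cell
    in O(1), so partitioning n points costs O(n) total -- no recursion.
    `cell` (in degrees) is an assumed resolution parameter."""
    buckets = defaultdict(list)
    for lat, lon in points:
        key = (int(lat // cell), int(lon // cell))  # integer cell index
        buckets[key].append((lat, lon))
    return buckets
```

Nearby points land in the same bucket, giving the coarse geographic partition the recurrence above computed in $\Theta(n^2)$ time.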
3. **Problem c: Two outbreak-classification algorithms**
- Divide-and-Conquer: Recursively split the region, classify each half, and merge the results.
  - Fits large spatial datasets; $O(n \log n)$ complexity.
  - Good for offline batch processing.
- Greedy/Heuristic: Scan reports in time order and cluster spatially close cases.
  - Low latency; $O(n)$ complexity.
  - Suitable for real-time streaming.
**Comparison:** Greedy is better for real-time use; Divide-and-Conquer gives better accuracy offline.
**Recommendation:** A hybrid approach combining both.
4. **Problem d: DP vs Branch-and-Bound for resource allocation**
- DP: Models the problem as a knapsack; complexity $O(m \cdot s)$, where $m$ = number of kits and $s$ = number of stations.
  - Pseudo-polynomial, feasible for small/moderate inputs.
- Branch-and-Bound: Exponential worst case, but prunes the search space with bounding heuristics.
  - Performance depends on the data; real-time bounds are not guaranteed.
**Recommendation:** DP is preferred for predictable, small-scale instances; B&B with good heuristics for large, dynamic cases.
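A minimal sketch of the knapsack DP, under an assumed model where each station has a kit demand and a benefit score and the capacity is the total kit supply; the `(kits_needed, benefit)` format is hypothetical.

```python
def allocate_kits(total_kits, stations):
    """0/1 knapsack DP: `stations` is a list of (kits_needed, benefit).
    dp[c] = best total benefit achievable with c kits. The table has
    O(m * s) entries, matching the pseudo-polynomial bound above."""
    dp = [0] * (total_kits + 1)
    for need, benefit in stations:
        # Iterate capacities in reverse so each station is used at most once.
        for c in range(total_kits, need - 1, -1):
            dp[c] = max(dp[c], dp[c - need] + benefit)
    return dp[total_kits]
```

Because the table size depends on the numeric value of $m$, not just the input length, the bound is pseudo-polynomial, exactly as noted above.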
5. **Problem e: Graph-based risk propagation**
- Graph nodes: counties/facilities; edges: travel routes.
- Candidate algorithms:
  1. BFS/DFS: $O(V+E)$, simple but ignores edge weights.
  2. SIR model: $O(V^2)$ per simulation step, realistic but computationally heavy.
  3. Dijkstra’s algorithm: $O(E + V \log V)$, computes fastest spread routes.
**Chosen:** Dijkstra variant with edge weights as transmission risk.
**Recommendation:** Parallel Dijkstra on distributed servers for MEDTRACE.
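The chosen Dijkstra variant can be sketched as below (single-machine version; the parallel deployment is beyond a short example). The binary-heap implementation runs in $O((V+E)\log V)$, close to the Fibonacci-heap bound quoted above; the adjacency-list format mapping each node to `(neighbor, transmission_time)` pairs is an assumption.

```python
import heapq

def fastest_spread(graph, source):
    """Dijkstra over a risk-propagation network. `graph` maps each node
    to a list of (neighbor, transmission_time) pairs -- an assumed
    adjacency-list format. Returns earliest arrival time per node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry, skip
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Edge weights here encode transmission risk or time, so shorter distances correspond to faster or more likely spread paths.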
6. **Problem f: NP-hardness and approximation**
- Tasks like resource allocation, shortest-path covering, outbreak source detection are NP-hard.
- Polynomial-time approximation: Greedy set cover achieves an $H_n \le \ln n + 1$ (i.e., $O(\log n)$) approximation ratio.
- Randomized algorithms: Randomized rounding for near-optimal solutions.
**Evaluation:** Approximation + randomization balance speed and accuracy.
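The greedy set-cover heuristic mentioned above is short enough to sketch directly; the list-of-sets input format is an assumption for the example.

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: repeatedly pick the subset covering the most
    still-uncovered elements. Runs in polynomial time and achieves the
    H_n <= ln(n) + 1 approximation ratio. `subsets` is a list of sets."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda s: len(s & uncovered))
        if not (best & uncovered):
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= best
    return chosen
```

For outbreak monitoring, the universe could be the set of at-risk counties and each subset the counties reachable by one sensor or team, an interpretation assumed here for illustration.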
7. **Problem g: String processing algorithm**
- Chosen: Rabin–Karp with parallel pre-processing.
- Handles pattern matching in heterogeneous streams efficiently.
- Complexity: Average $O(n + m)$; worst case $O(nm)$, which is rare in practice.
- Parallelization: Splits data, parallel hash computation, near-linear throughput scaling.
- Randomization reduces hash collisions, enables fuzzy matching.
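A sequential Rabin–Karp core is sketched below (the parallel pre-processing would split `text` into overlapping chunks and run this per chunk). The `base` and `mod` hash parameters are assumptions; choosing `mod` as a random prime is what reduces adversarial collisions.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_003):
    """Rabin-Karp rolling-hash search: average O(n + m); worst case O(nm)
    when collisions force repeated verification. Returns match indices.
    `base` and `mod` are assumed hash parameters."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(base, m - 1, mod)             # weight of the window's leading char
    p_hash = t_hash = 0
    for i in range(m):                    # initial hashes of pattern and window
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        if p_hash == t_hash and text[i:i + m] == pattern:  # verify on hash hit
            matches.append(i)
        if i < n - m:                     # roll the window forward one char
            t_hash = ((t_hash - ord(text[i]) * h) * base + ord(text[i + m])) % mod
    return matches
```

Because each chunk's rolling hash is independent, the per-chunk calls parallelize with near-linear throughput scaling, as noted above.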
**Final answers:**
- a) $O(n^2)$, unacceptable for $n=10^6$, recommend hashing + sorting.
- b) $T(n) = \Theta(n^2)$, iterative redesign recommended.
- c) Divide-and-Conquer $O(n \log n)$, Greedy $O(n)$, hybrid best.
- d) DP $O(m \cdot s)$ feasible for small instances; B&B exponential in the worst case but effective with heuristics.
- e) Dijkstra $O(E + V \log V)$ best for dynamic network.
- f) NP-hard tasks, use approximation and randomized algorithms.
- g) Rabin–Karp parallel, $O(n + m)$ average, parallelization improves throughput.