HMM Joint Probability, Viterbi Decoding, and Tanh Activation
1. **Problem 1: Compute the joint probability $p(x,z)$ for the sequences $x=[6,3,1,2,4]$ and $z=[L,F,F,L,L]$.** The HMM has a fair state $F$ (each face emitted with probability $\frac{1}{6}$) and a loaded state $L$ (face 6 with probability $\frac{1}{2}$, every other face with $\frac{1}{10}$); each state is retained with probability $0.6$ and switched with probability $0.4$, and the initial distribution is uniform.
2. The joint probability factorizes as:
$$p(x,z) = P(z_1) P(x_1|z_1) \prod_{t=2}^5 P(z_t|z_{t-1}) P(x_t|z_t)$$
3. Step-by-step calculation:
- $t=1$: $P(z_1=L) = \frac{1}{2}$, $P(x_1=6|L) = \frac{1}{2}$, product so far: $\frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$
- $t=2$: $z_2=F$, multiply by $P(L\to F)=0.4=\frac{2}{5}$ and $P(3|F)=\frac{1}{6}$, product: $\frac{1}{4} \times \frac{2}{5} \times \frac{1}{6} = \frac{1}{60}$
- $t=3$: $z_3=F$, multiply by $P(F\to F)=0.6=\frac{3}{5}$ and $P(1|F)=\frac{1}{6}$, product: $\frac{1}{60} \times \frac{3}{5} \times \frac{1}{6} = \frac{1}{600}$
- $t=4$: $z_4=L$, multiply by $P(F\to L)=0.4=\frac{2}{5}$ and $P(2|L)=\frac{1}{10}$, product: $\frac{1}{600} \times \frac{2}{5} \times \frac{1}{10} = \frac{1}{15000}$
- $t=5$: $z_5=L$, multiply by $P(L\to L)=0.6=\frac{3}{5}$ and $P(4|L)=\frac{1}{10}$, product: $\frac{1}{15000} \times \frac{3}{5} \times \frac{1}{10} = \frac{3}{750000} = \frac{1}{250000}$
4. Final exact joint probability:
$$\boxed{p(x,z) = \frac{1}{250000}}$$
5. Decimal form:
$$\boxed{p(x,z) = 0.000004}$$
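The factorization above can be cross-checked numerically. The sketch below uses exact rational arithmetic with the parameters stated in Problem 1 (the dictionary keys and function names are illustrative, not from the original code):

```python
from fractions import Fraction as F

# HMM parameters from Problem 1 (dishonest-casino style):
# fair die "F" emits each face with probability 1/6; loaded die "L"
# emits 6 with probability 1/2 and every other face with 1/10.
init = {"F": F(1, 2), "L": F(1, 2)}                      # uniform start
trans = {("F", "F"): F(3, 5), ("F", "L"): F(2, 5),
         ("L", "L"): F(3, 5), ("L", "F"): F(2, 5)}

def emit(state, face):
    if state == "F":
        return F(1, 6)
    return F(1, 2) if face == 6 else F(1, 10)

def joint(x, z):
    """p(x,z) = P(z1) P(x1|z1) * prod_t P(z_t|z_{t-1}) P(x_t|z_t)."""
    p = init[z[0]] * emit(z[0], x[0])
    for t in range(1, len(x)):
        p *= trans[(z[t - 1], z[t])] * emit(z[t], x[t])
    return p

print(joint([6, 3, 1, 2, 4], ["L", "F", "F", "L", "L"]))  # 1/250000
```

Using `Fraction` keeps every intermediate product exact, so the output matches the hand calculation term by term.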
---
1. **Problem 2: Use the Viterbi algorithm to find the most probable hidden state sequence for observations $x=[4,6,3]$.**
2. Initialization ($t=1$):
- $V_1(F) = P(z_1=F) P(4|F) = \frac{1}{2} \times \frac{1}{6} = \frac{1}{12} \approx 0.0833333$
- $V_1(L) = P(z_1=L) P(4|L) = \frac{1}{2} \times \frac{1}{10} = \frac{1}{20} = 0.05$
3. Recursion ($t=2$, $x_2=6$):
- For state $F$:
- from $F$: $V_1(F) \times a_{FF} \times b_F(6) = \frac{1}{12} \times 0.6 \times \frac{1}{6} = \frac{1}{120}$
- from $L$: $V_1(L) \times a_{LF} \times b_F(6) = \frac{1}{20} \times 0.4 \times \frac{1}{6} = \frac{1}{300}$
- max is $\frac{1}{120}$ from $F$, so $V_2(F) = \frac{1}{120}$, $\text{Ptr}(F,2) = F$
- For state $L$:
- from $F$: $\frac{1}{12} \times 0.4 \times \frac{1}{2} = \frac{1}{60} \approx 0.0166667$
- from $L$: $\frac{1}{20} \times 0.6 \times \frac{1}{2} = \frac{3}{200} = 0.015$
- max is $\frac{1}{60}$ from $F$, so $V_2(L) = \frac{1}{60}$, $\text{Ptr}(L,2) = F$
4. Recursion ($t=3$, $x_3=3$):
- For state $F$:
- from $F$: $\frac{1}{120} \times 0.6 \times \frac{1}{6} = \frac{1}{1200}$
- from $L$: $\frac{1}{60} \times 0.4 \times \frac{1}{6} = \frac{1}{900}$
- max is $\frac{1}{900}$ from $L$, so $V_3(F) = \frac{1}{900}$, $\text{Ptr}(F,3) = L$
- For state $L$:
- from $F$: $\frac{1}{120} \times 0.4 \times \frac{1}{10} = \frac{1}{3000}$
- from $L$: $\frac{1}{60} \times 0.6 \times \frac{1}{10} = \frac{1}{1000}$
- max is $\frac{1}{1000}$ from $L$, so $V_3(L) = \frac{1}{1000}$, $\text{Ptr}(L,3) = L$
5. Viterbi matrix summary:
- $V_1(F) = \frac{1}{12} \approx 0.0833333$, $V_1(L) = \frac{1}{20} = 0.05$
- $V_2(F) = \frac{1}{120} \approx 0.00833333$, $V_2(L) = \frac{1}{60} \approx 0.0166667$
- $V_3(F) = \frac{1}{900} \approx 0.00111111$, $V_3(L) = \frac{1}{1000} = 0.001$
6. Backpointers:
- $\text{Ptr}(F,2) = F$, $\text{Ptr}(L,2) = F$
- $\text{Ptr}(F,3) = L$, $\text{Ptr}(L,3) = L$
7. Final best probability:
$$\max(V_3(F), V_3(L)) = \max\left(\frac{1}{900}, \frac{1}{1000}\right) = \frac{1}{900}$$
8. Backtracking the best path:
   - At $t=3$: the best state is $F$ (since $V_3(F) > V_3(L)$)
   - $\text{Ptr}(F,3) = L$, so the state at $t=2$ is $L$
   - $\text{Ptr}(L,2) = F$, so the state at $t=1$ is $F$
9. Optimal hidden state sequence:
$$\boxed{z^* = [F, L, F]}$$
10. Probability of this path:
$$\boxed{p(x,z^*) = \frac{1}{900} \approx 0.00111111}$$
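The recursion and backtracking steps above can be sketched as a small Viterbi implementation (same assumed casino HMM as in Problem 1; all names here are illustrative):

```python
from fractions import Fraction as F

# Assumed casino HMM: uniform start, stay 0.6 / switch 0.4,
# fair die uniform, loaded die favors 6.
init = {"F": F(1, 2), "L": F(1, 2)}
trans = {("F", "F"): F(3, 5), ("F", "L"): F(2, 5),
         ("L", "L"): F(3, 5), ("L", "F"): F(2, 5)}

def emit(state, face):
    if state == "F":
        return F(1, 6)
    return F(1, 2) if face == 6 else F(1, 10)

def viterbi(x, states=("F", "L")):
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: init[s] * emit(s, x[0]) for s in states}]
    ptr = [{}]
    for t in range(1, len(x)):
        V.append({})
        ptr.append({})
        for s in states:
            prev = max(states, key=lambda r: V[t - 1][r] * trans[(r, s)])
            V[t][s] = V[t - 1][prev] * trans[(prev, s)] * emit(s, x[t])
            ptr[t][s] = prev
    # backtrack from the best final state
    best = max(states, key=lambda s: V[-1][s])
    path = [best]
    for t in range(len(x) - 1, 0, -1):
        path.append(ptr[t][path[-1]])
    return path[::-1], V[-1][best]

path, p = viterbi([4, 6, 3])
print(path, p)  # ['F', 'L', 'F'] 1/900
```

The intermediate `V` values reproduce the matrix in step 5, and the returned path matches $z^* = [F, L, F]$.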
---
1. **Problem 3: Implement a feed-forward neural network using tanh activation and its gradient in MATLAB.**
2. The tanh activation function is:
$$y = \tanh(x)$$
3. Its gradient with respect to input $x$ is:
$$g = 1 - \tanh^2(x)$$
4. The feed-forward network substitutes these two functions for the sigmoid activation and its gradient.
5. The network architecture includes:
- Input layer size $D$
- One hidden layer with 10 units
- Output layer with $C$ classes
6. Training uses mini-batch SGD with cross-entropy loss and softmax output.
7. The MATLAB code provided includes:
- `activation_tanh.m` for tanh activation
- `activation_tanh_gradient.m` for gradient
- `feedforward_network_tanh.m` for training and evaluation
8. The script prints the training accuracy after training; the exact value depends on the random initialization and mini-batch order.
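As an illustrative cross-check of the MATLAB helpers described above, here is a NumPy sketch of the tanh activation, its gradient, and the forward pass of a one-hidden-layer softmax network (function names and shapes are assumptions, not the original MATLAB code):

```python
import numpy as np

def activation_tanh(x):
    # counterpart of activation_tanh.m: y = tanh(x)
    return np.tanh(x)

def activation_tanh_gradient(x):
    # counterpart of activation_tanh_gradient.m: d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

def forward(X, W1, b1, W2, b2):
    # one hidden layer with tanh units, softmax output over C classes
    H = activation_tanh(X @ W1 + b1)
    logits = H @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    return e / e.sum(axis=1, keepdims=True)

# sanity check: analytic gradient vs. central finite difference
x = np.linspace(-3, 3, 7)
h = 1e-6
numeric = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)
assert np.allclose(activation_tanh_gradient(x), numeric, atol=1e-8)
```

The finite-difference check confirms the identity $g = 1 - \tanh^2(x)$ used in backpropagation; training itself (mini-batch SGD with cross-entropy) is omitted here.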
---
**Summary:**
- Joint probability $p(x,z) = \frac{1}{250000} = 0.000004$
- Viterbi best path $z^* = [F, L, F]$ with probability $\frac{1}{900} \approx 0.00111111$
- The feed-forward network uses the tanh activation and its gradient, as implemented in the MATLAB code.