HMM Joint Probability, Viterbi Decoding, and Tanh Activation
1. **Problem 1: Compute the joint probability $p(x,z)$ for the sequences $x=[6,3,1,2,4]$ and $z=[L,F,F,L,L]$.** The HMM has a fair state $F$ (each face emitted with probability $\frac{1}{6}$) and a loaded state $L$ (face 6 with probability $\frac{1}{2}$, every other face with $\frac{1}{10}$); each state is retained with probability $0.6$ and switched with probability $0.4$, and the initial distribution is uniform.
2. The joint probability factorizes as:
$$p(x,z) = P(z_1) P(x_1|z_1) \prod_{t=2}^5 P(z_t|z_{t-1}) P(x_t|z_t)$$
3. Step-by-step calculation:
- $t=1$: $P(z_1=L) = \frac{1}{2}$, $P(x_1=6|L) = \frac{1}{2}$, product so far: $\frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$
- $t=2$: $z_2=F$, multiply by $P(L\to F)=0.4=\frac{2}{5}$ and $P(3|F)=\frac{1}{6}$, product: $\frac{1}{4} \times \frac{2}{5} \times \frac{1}{6} = \frac{1}{60}$
- $t=3$: $z_3=F$, multiply by $P(F\to F)=0.6=\frac{3}{5}$ and $P(1|F)=\frac{1}{6}$, product: $\frac{1}{60} \times \frac{3}{5} \times \frac{1}{6} = \frac{1}{600}$
- $t=4$: $z_4=L$, multiply by $P(F\to L)=0.4=\frac{2}{5}$ and $P(2|L)=\frac{1}{10}$, product: $\frac{1}{600} \times \frac{2}{5} \times \frac{1}{10} = \frac{1}{15000}$
- $t=5$: $z_5=L$, multiply by $P(L\to L)=0.6=\frac{3}{5}$ and $P(4|L)=\frac{1}{10}$, product: $\frac{1}{15000} \times \frac{3}{5} \times \frac{1}{10} = \frac{3}{750000} = \frac{1}{250000}$
4. Final exact joint probability:
$$\boxed{p(x,z) = \frac{1}{250000}}$$
5. Decimal form:
$$\boxed{p(x,z) = 0.000004}$$
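The factorization above can be cross-checked numerically. The sketch below uses exact rational arithmetic with the parameters stated in Problem 1 (the dictionary keys and function names are illustrative, not from the original code):

```python
from fractions import Fraction as F

# HMM parameters from Problem 1 (dishonest-casino style):
# fair die "F" emits each face with probability 1/6; loaded die "L"
# emits 6 with probability 1/2 and every other face with 1/10.
init = {"F": F(1, 2), "L": F(1, 2)}                      # uniform start
trans = {("F", "F"): F(3, 5), ("F", "L"): F(2, 5),
         ("L", "L"): F(3, 5), ("L", "F"): F(2, 5)}

def emit(state, face):
    if state == "F":
        return F(1, 6)
    return F(1, 2) if face == 6 else F(1, 10)

def joint(x, z):
    """p(x,z) = P(z1) P(x1|z1) * prod_t P(z_t|z_{t-1}) P(x_t|z_t)."""
    p = init[z[0]] * emit(z[0], x[0])
    for t in range(1, len(x)):
        p *= trans[(z[t - 1], z[t])] * emit(z[t], x[t])
    return p

print(joint([6, 3, 1, 2, 4], ["L", "F", "F", "L", "L"]))  # 1/250000
```

Using `Fraction` keeps every intermediate product exact, so the output matches the hand calculation term by term.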
---
1. **Problem 2: Use the Viterbi algorithm to find the most probable hidden state sequence for observations $x=[4,6,3]$.**
2. Initialization ($t=1$):
- $V_1(F) = P(z_1=F) P(4|F) = \frac{1}{2} \times \frac{1}{6} = \frac{1}{12} \approx 0.0833333$
- $V_1(L) = P(z_1=L) P(4|L) = \frac{1}{2} \times \frac{1}{10} = \frac{1}{20} = 0.05$
3. Recursion ($t=2$, $x_2=6$):
- For state $F$:
- from $F$: $V_1(F) \times a_{FF} \times b_F(6) = \frac{1}{12} \times 0.6 \times \frac{1}{6} = \frac{1}{120}$
- from $L$: $V_1(L) \times a_{LF} \times b_F(6) = \frac{1}{20} \times 0.4 \times \frac{1}{6} = \frac{1}{300}$
- max is $\frac{1}{120}$ from $F$, so $V_2(F) = \frac{1}{120}$, $\text{Ptr}(F,2) = F$
- For state $L$:
- from $F$: $\frac{1}{12} \times 0.4 \times \frac{1}{2} = \frac{1}{60} \approx 0.0166667$
- from $L$: $\frac{1}{20} \times 0.6 \times \frac{1}{2} = \frac{3}{200} = 0.015$
- max is $\frac{1}{60}$ from $F$, so $V_2(L) = \frac{1}{60}$, $\text{Ptr}(L,2) = F$
4. Recursion ($t=3$, $x_3=3$):
- For state $F$:
- from $F$: $\frac{1}{120} \times 0.6 \times \frac{1}{6} = \frac{1}{1200}$
- from $L$: $\frac{1}{60} \times 0.4 \times \frac{1}{6} = \frac{1}{900}$
- max is $\frac{1}{900}$ from $L$, so $V_3(F) = \frac{1}{900}$, $\text{Ptr}(F,3) = L$
- For state $L$:
- from $F$: $\frac{1}{120} \times 0.4 \times \frac{1}{10} = \frac{1}{3000}$
- from $L$: $\frac{1}{60} \times 0.6 \times \frac{1}{10} = \frac{1}{1000}$
- max is $\frac{1}{1000}$ from $L$, so $V_3(L) = \frac{1}{1000}$, $\text{Ptr}(L,3) = L$
5. Viterbi matrix summary:
- $V_1(F) = \frac{1}{12} \approx 0.0833333$, $V_1(L) = \frac{1}{20} = 0.05$
- $V_2(F) = \frac{1}{120} \approx 0.00833333$, $V_2(L) = \frac{1}{60} \approx 0.0166667$
- $V_3(F) = \frac{1}{900} \approx 0.00111111$, $V_3(L) = \frac{1}{1000} = 0.001$
6. Backpointers:
- $\text{Ptr}(F,2) = F$, $\text{Ptr}(L,2) = F$
- $\text{Ptr}(F,3) = L$, $\text{Ptr}(L,3) = L$
7. Final best probability:
$$\max(V_3(F), V_3(L)) = \max\left(\frac{1}{900}, \frac{1}{1000}\right) = \frac{1}{900}$$
8. Backtracking the best path:
   - At $t=3$: the best state is $F$ (since $V_3(F) > V_3(L)$)
   - $\text{Ptr}(F,3) = L$, so the state at $t=2$ is $L$
   - $\text{Ptr}(L,2) = F$, so the state at $t=1$ is $F$
9. Optimal hidden state sequence:
$$\boxed{z^* = [F, L, F]}$$
10. Probability of this path:
$$\boxed{p(x,z^*) = \frac{1}{900} \approx 0.00111111}$$
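The recursion and backtracking steps above can be sketched as a small Viterbi implementation (same assumed casino HMM as in Problem 1; all names here are illustrative):

```python
from fractions import Fraction as F

# Assumed casino HMM: uniform start, stay 0.6 / switch 0.4,
# fair die uniform, loaded die favors 6.
init = {"F": F(1, 2), "L": F(1, 2)}
trans = {("F", "F"): F(3, 5), ("F", "L"): F(2, 5),
         ("L", "L"): F(3, 5), ("L", "F"): F(2, 5)}

def emit(state, face):
    if state == "F":
        return F(1, 6)
    return F(1, 2) if face == 6 else F(1, 10)

def viterbi(x, states=("F", "L")):
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: init[s] * emit(s, x[0]) for s in states}]
    ptr = [{}]
    for t in range(1, len(x)):
        V.append({})
        ptr.append({})
        for s in states:
            prev = max(states, key=lambda r: V[t - 1][r] * trans[(r, s)])
            V[t][s] = V[t - 1][prev] * trans[(prev, s)] * emit(s, x[t])
            ptr[t][s] = prev
    # backtrack from the best final state
    best = max(states, key=lambda s: V[-1][s])
    path = [best]
    for t in range(len(x) - 1, 0, -1):
        path.append(ptr[t][path[-1]])
    return path[::-1], V[-1][best]

path, p = viterbi([4, 6, 3])
print(path, p)  # ['F', 'L', 'F'] 1/900
```

The intermediate `V` values reproduce the matrix in step 5, and the returned path matches $z^* = [F, L, F]$.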
---
1. **Problem 3: Implement a feed-forward neural network using tanh activation and its gradient in MATLAB.**
2. The tanh activation function is:
$$y = \tanh(x)$$
3. Its gradient with respect to input $x$ is:
$$g = 1 - \tanh^2(x)$$
4. The feed-forward network substitutes these two functions for the sigmoid activation and its gradient.
5. The network architecture includes:
- Input layer size $D$
- One hidden layer with 10 units
- Output layer with $C$ classes
6. Training uses mini-batch SGD with cross-entropy loss and softmax output.
7. The MATLAB code provided includes:
- `activation_tanh.m` for tanh activation
- `activation_tanh_gradient.m` for gradient
- `feedforward_network_tanh.m` for training and evaluation
8. The script prints the training accuracy after training; the exact value depends on the random initialization and mini-batch order.
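As an illustrative cross-check of the MATLAB helpers described above, here is a NumPy sketch of the tanh activation, its gradient, and the forward pass of a one-hidden-layer softmax network (function names and shapes are assumptions, not the original MATLAB code):

```python
import numpy as np

def activation_tanh(x):
    # counterpart of activation_tanh.m: y = tanh(x)
    return np.tanh(x)

def activation_tanh_gradient(x):
    # counterpart of activation_tanh_gradient.m: d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

def forward(X, W1, b1, W2, b2):
    # one hidden layer with tanh units, softmax output over C classes
    H = activation_tanh(X @ W1 + b1)
    logits = H @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    return e / e.sum(axis=1, keepdims=True)

# sanity check: analytic gradient vs. central finite difference
x = np.linspace(-3, 3, 7)
h = 1e-6
numeric = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)
assert np.allclose(activation_tanh_gradient(x), numeric, atol=1e-8)
```

The finite-difference check confirms the identity $g = 1 - \tanh^2(x)$ used in backpropagation; training itself (mini-batch SGD with cross-entropy) is omitted here.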
---
**Summary:**
- Joint probability $p(x,z) = \frac{1}{250000} = 0.000004$
- Viterbi best path $z^* = [F, L, F]$ with probability $\frac{1}{900} \approx 0.00111111$
- The feed-forward network uses the tanh activation and its gradient, as implemented in the MATLAB code.