Consider our formulation of HMMs as before, i.e. $V = \{v_1, \dots, v_M\}$ is the output alphabet, $O = (O_1, \dots, O_T)$ is the output sequence, $s_t$ is the state at time $t$, $A$ is the matrix of transition probabilities $a_{ij} = P(s_t = j \mid s_{t-1} = i)$, and $B$ is the matrix of state-symbol probabilities for observing a given output symbol, $b_{jk} = P(O_t = v_k \mid s_t = j)$.
Let $X$ be a random variable ranging over sequences of joint events, where $x_{ijk}$ denotes the joint event consisting of a transition from state $i$ to state $j$ and the observation of symbol $v_k$ from state $j$. The observed output is then completely specified by a sequence $X = (x_1, \dots, x_T)$ of such events. Let $p_{ijk} = P(x_{ijk}) = a_{ij} b_{jk}$.
Then the parameter of interest, $\theta$, is exactly the set $\{p_{ijk}\}$, since it represents the totality of transition and state-symbol probabilities.
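To make the setup concrete, here is a minimal Python sketch (the arrays `A`, `B`, `p` and their dimensions are illustrative assumptions, not part of the original formulation) that assembles the $p_{ijk}$ from given $A$ and $B$ and checks that, for each source state $i$, they form a distribution over the pairs $(j, k)$:

```python
import numpy as np

# Hypothetical small HMM: 2 states, alphabet of 3 symbols.
A = np.array([[0.7, 0.3],        # a_ij = P(s_t = j | s_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],   # b_jk = P(O_t = v_k | s_t = j)
              [0.1, 0.3, 0.6]])

# p_ijk = P(x_ijk) = a_ij * b_jk: transition i -> j, then emit v_k from j.
p = A[:, :, None] * B[None, :, :]        # shape (i, j, k)

# For each i: sum_{j,k} p_ijk = sum_j a_ij * sum_k b_jk = 1.
assert np.allclose(p.sum(axis=(1, 2)), 1.0)
```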
We wish to find a new estimate $\theta'$ that maximizes the left-hand side of the inequality in the EM theorem, namely
$$\sum_X P(X \mid O, \theta)\, \log P(O, X \mid \theta').$$
To do this, we first find a stationary point of the LHS subject to the constraints
$$\sum_{j,k} p'_{ijk} = 1 \quad \text{for each state } i$$
(these hold because $\sum_{j,k} p_{ijk} = \sum_j a_{ij} \sum_k b_{jk} = 1$). Introducing Lagrange multipliers $\lambda_i$, the function to maximize with respect to the variables $p'_{ijk}$ is thus
$$F \;=\; \sum_X P(X \mid O, \theta)\, \log P(O, X \mid \theta') \;-\; \sum_i \lambda_i \Big( \sum_{j,k} p'_{ijk} - 1 \Big). \tag{17}$$
Since $X$ completely specifies $O$ (recall that $x_{ijk}$ also specifies that symbol $v_k$ is observed in state $j$),
$$P(O, X \mid \theta') = P(X \mid \theta').$$
If $c_{ijk}$ denotes the number of times $x_{ijk}$ occurs in $X$, then
$$P(X \mid \theta') = \prod_{i,j,k} \big(p'_{ijk}\big)^{c_{ijk}}, \qquad \text{so} \qquad \log P(O, X \mid \theta') = \sum_{i,j,k} c_{ijk} \log p'_{ijk}.$$
We can also satisfy ourselves that
$$\frac{\partial \log P(O, X \mid \theta')}{\partial p'_{ijk}} = \frac{c_{ijk}}{p'_{ijk}}.$$
Substituting in (17) and setting the partial derivatives to zero,
$$\frac{\partial F}{\partial p'_{ijk}} \;=\; \sum_X P(X \mid O, \theta)\, \frac{c_{ijk}}{p'_{ijk}} \;-\; \lambda_i \;=\; 0, \qquad \text{so} \qquad p'_{ijk} = \frac{1}{\lambda_i} \sum_X P(X \mid O, \theta)\, c_{ijk}.$$
If we now write $\bar{c}_{ijk} = \sum_X P(X \mid O, \theta)\, c_{ijk}$ for the expected number of occurrences of $x_{ijk}$, the constraint $\sum_{j,k} p'_{ijk} = 1$ forces $\lambda_i = \sum_{j,k} \bar{c}_{ijk}$, and hence
$$p'_{ijk} \;=\; \frac{\bar{c}_{ijk}}{\sum_{j,k} \bar{c}_{ijk}}.$$
The RHS above is just the reestimated probability of $x_{ijk}$, because prior to normalization the sum $\bar{c}_{ijk}$ yields the expected frequency of $x_{ijk}$ computed using the current values of the $p_{ijk}$.
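As a sanity check on this formula, the reestimate can be computed literally: enumerate every hidden sequence $X$, weight its counts $c_{ijk}$ by $P(X \mid O, \theta)$, and normalize. The sketch below does exactly that (the function name `em_step` and the fixed start state `s0` are our assumptions; the exponential enumeration is purely expository, since in practice the same expectations are computed efficiently with the forward-backward recursions):

```python
import itertools
import numpy as np

def em_step(p, O, s0=0):
    """One reestimation p -> p' by brute-force enumeration of all
    hidden state sequences, mirroring the derivation above."""
    n_states, T = p.shape[0], len(O)
    cbar = np.zeros_like(p)      # expected counts cbar_ijk
    likelihood = 0.0             # P(O | theta) = sum_X P(O, X | theta)
    for states in itertools.product(range(n_states), repeat=T):
        path = (s0,) + states
        # P(O, X | theta) = prod_t p_{s_{t-1}, s_t, O_t}
        prob = np.prod([p[path[t], path[t + 1], O[t]] for t in range(T)])
        likelihood += prob
        for t in range(T):
            cbar[path[t], path[t + 1], O[t]] += prob
    cbar /= likelihood           # weights become P(X | O, theta)
    # p'_ijk = cbar_ijk / sum_{j,k} cbar_ijk
    return cbar / cbar.sum(axis=(1, 2), keepdims=True), likelihood
```

The division by `likelihood` cancels in the final normalization; it is kept only so that the value $P(O \mid \theta)$ itself can be returned alongside the new parameters.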
We can now proceed to calculate each $p'_{ijk}$ as above. The set of all such $p'_{ijk}$ constitutes our new estimate $\theta'$, which we can use in place of $\theta$ during the next iteration.
The EM theorem tells us that if we continue in this fashion, then after each iteration
$$P(O \mid \theta') \;\ge\; P(O \mid \theta),$$
i.e. the likelihood of the observed output never decreases.
This is an EM algorithm: it is iterative, with an expectation step (a.k.a. estimation step) and a maximization step (a.k.a. modification step) in each iteration.
Note that the algorithm is not guaranteed to converge to the global maximum of the likelihood; it may only reach a local maximum. It has nonetheless been found to yield good results in practice.
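Under the same illustrative assumptions as the sketches above, a short driver loop makes this behavior visible numerically: the reported likelihoods climb monotonically toward a (possibly only local) maximum.

```python
O = [0, 2, 1, 1, 2]     # arbitrary observation sequence; index k stands for v_{k+1}
theta = p               # start from the p_ijk built earlier
prev = 0.0
for it in range(10):
    theta, lik = em_step(theta, O)   # lik = P(O | theta) before the update
    assert lik >= prev - 1e-12       # EM guarantee: likelihood never decreases
    prev = lik
    print(f"iteration {it}: P(O | theta) = {lik:.6f}")
```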