Read Section 6.12 of Mitchell.
We want to estimate the means of k equiprobable Gaussian distributions with identical variance, given a set of m observations, each of which was generated by first choosing one of the k Gaussians and then using the chosen Gaussian to generate the observation.
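As a minimal sketch of this generative process, the following Python draws observations in exactly the two stages described. The function name `sample_mixture`, the example means, the value of sigma, and the seed are assumptions chosen for illustration; they do not come from Mitchell.

```python
import random

def sample_mixture(means, sigma, m, seed=0):
    """Draw m observations from k equiprobable Gaussians sharing one sigma."""
    rng = random.Random(seed)
    xs, zs = [], []
    for _ in range(m):
        z = rng.randrange(len(means))   # choose one of the k Gaussians uniformly
        x = rng.gauss(means[z], sigma)  # generate the observation from that Gaussian
        xs.append(x)
        zs.append(z)
    return xs, zs

# Hypothetical example: k = 3 Gaussians, m = 1000 observations.
xs, zs = sample_mixture(means=[-4.0, 0.0, 5.0], sigma=1.0, m=1000)
```

Only `xs` would be available to the learner; `zs` records the hidden choice and is kept here just so the sample can be inspected.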
Note that we can consider this as a problem of determining the most
likely state-symbol probabilities, B, in an HMM of k states where
the distribution over transitions is uniform, i.e. $a_{ij} = 1/k$ for all $i, j$,
and the parameters of interest, $b_j(x)$, are the
state-symbol probabilities, which are continuous and Gaussian.
An extension of the HMM to handle continuous $b_j$ is left for homework; we will now develop this problem along the lines of Mitchell, though not exactly. The main differences are for the sake of consistency with the notation we have already introduced.
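Since the variances of the k Gaussians are known to be equal, the
parameter of interest $\theta = \langle \mu_1, \ldots, \mu_k \rangle$ consists
exactly of the k means. Let the observed variable be $X = \{x_1, \ldots, x_m\}$
and the hidden variable be $Z = \{z_1, \ldots, z_m\}$, where $z_i \in \{1, \ldots, k\}$
denotes which of the k Gaussians was used to generate a given $x_i$.
Let $Y = \{y_1, \ldots, y_m\}$, where $y_i = (x_i, z_i)$.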
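From the EM theorem, we know that if
$$\sum_{Z} P(Z \mid X, \theta) \log P(X, Z \mid \theta') > \sum_{Z} P(Z \mid X, \theta) \log P(X, Z \mid \theta)$$
then $P(X \mid \theta') > P(X \mid \theta)$.
The quantity we must maximize with respect to $\theta'$ is
$$\sum_{Z} P(Z \mid X, \theta) \log P(X, Z \mid \theta').$$
Since Y completely determines X and $P(Y \mid X, \theta) = 0$ for all Y
inconsistent with X, the above is equivalent to
$$Q(\theta') = \sum_{Y} P(Y \mid X, \theta) \log P(Y \mid \theta') = E\left[\log P(Y \mid \theta') \mid X, \theta\right].$$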
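Now, writing $\mu'_j$ for the means in $\theta'$ and $\sigma^2$ for the common variance,
$$P(Y \mid \theta') = \prod_{i=1}^{m} P(y_i \mid \theta') = \prod_{i=1}^{m} \frac{1}{k} \cdot \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu'_{z_i})^2}{2\sigma^2}\right).$$
So
$$Q(\theta') = \sum_{i=1}^{m} \left[ \log \frac{1}{k\sqrt{2\pi\sigma^2}} - \frac{1}{2\sigma^2} \sum_{j=1}^{k} P(z_i = j \mid x_i, \theta)\,(x_i - \mu'_j)^2 \right].$$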
We want to maximize Q with respect to $\theta'$, so we differentiate $Q$
with respect to each of the $\mu'_j$ and set the result to zero.
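Carrying out this differentiation gives the standard update in which each new mean is the average of the observations, weighted by the posterior probability (under the current means) that the corresponding Gaussian generated them. The Python below is a minimal sketch of iterating that update, not a definitive implementation. The function names, the synthetic data, the starting guess, and the choice of sigma = 1 are all assumptions made for illustration.

```python
import math
import random

def responsibilities(x, means, sigma):
    """Posterior probability that x came from each Gaussian, given current means.
    The 1/k prior and the Gaussian normalizer cancel in the ratio."""
    logs = [-(x - mu) ** 2 / (2.0 * sigma ** 2) for mu in means]
    top = max(logs)                        # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]

def em_step(xs, means, sigma):
    """One EM iteration: each new mean is a responsibility-weighted average of xs."""
    k = len(means)
    num = [0.0] * k
    den = [0.0] * k
    for x in xs:
        r = responsibilities(x, means, sigma)
        for j in range(k):
            num[j] += r[j] * x
            den[j] += r[j]
    # Keep the old mean if a component received no weight at all.
    return [num[j] / den[j] if den[j] > 0 else means[j] for j in range(k)]

def log_likelihood(xs, means, sigma):
    """log P(X | means) for k equiprobable Gaussians with common sigma."""
    const = -math.log(len(means)) - 0.5 * math.log(2.0 * math.pi * sigma ** 2)
    total = 0.0
    for x in xs:
        logs = [-(x - mu) ** 2 / (2.0 * sigma ** 2) for mu in means]
        top = max(logs)
        total += const + top + math.log(sum(math.exp(l - top) for l in logs))
    return total

# Hypothetical synthetic data: three equiprobable Gaussians, sigma = 1.
rng = random.Random(1)
true_means = [-4.0, 0.0, 5.0]
xs = [rng.gauss(rng.choice(true_means), 1.0) for _ in range(600)]

means = [-1.0, 0.5, 1.0]                   # arbitrary starting guess
history = [log_likelihood(xs, means, 1.0)]
for _ in range(50):
    means = em_step(xs, means, 1.0)
    history.append(log_likelihood(xs, means, 1.0))
```

Tracking `history` gives a simple check of the EM theorem in practice: the log-likelihood of the observed data should never decrease from one iteration to the next.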