Read Section 6.12 of Mitchell.
We want to estimate the means of k equiprobable Gaussian distributions with identical variance, given a set of m observations, each of which was generated by first choosing one of the k Gaussians and then drawing the observation from the chosen Gaussian.
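The generative process just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_mixture`, the example means, and the standard deviation `sigma` are hypothetical choices, not values from the notes.

```python
import random

def sample_mixture(means, sigma, m, seed=0):
    """Draw m observations from k equiprobable Gaussians sharing std sigma."""
    rng = random.Random(seed)
    xs, zs = [], []
    for _ in range(m):
        z = rng.randrange(len(means))   # choose one of the k Gaussians uniformly
        x = rng.gauss(means[z], sigma)  # generate the observation from that Gaussian
        xs.append(x)
        zs.append(z)
    return xs, zs

# Illustrative values (assumed): k = 2 Gaussians, unit variance, m = 1000.
xs, zs = sample_mixture(means=[-2.0, 3.0], sigma=1.0, m=1000)
```

In the estimation problem only `xs` is observed; the choices in `zs` are hidden.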
Note that we can consider this as a problem of determining the most likely state-symbol probabilities, B, in an HMM of k states where the distribution over transitions is uniform, i.e. $a_{ij} = 1/k$ for all i and j, and the parameters of interest, $\theta$, are the state-symbol probabilities, which are continuous and Gaussian.
Extending the HMM to handle continuous $b_j$ is left for homework. We now develop this problem along the lines of Mitchell, though not exactly; the main differences are for the sake of consistency with the notation we have already introduced.
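Since the variances of the k Gaussians are known to be equal, the parameter of interest, $\theta = \langle \mu_1, \ldots, \mu_k \rangle$, consists exactly of the k means. Let the observed variable be $X = \{x_1, \ldots, x_m\}$ and the hidden variable be $Z = \{z_1, \ldots, z_m\}$, where $z_i$ denotes which of the k Gaussians was used to generate a given $x_i$. Let $Y = \{y_1, \ldots, y_m\}$, where $y_i = (x_i, z_i)$.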
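For reference, the density of a single complete observation can be written out explicitly. The sketch below writes $\mu_1, \ldots, \mu_k$ for the k means collected in $\theta$, $z_i$ for the index of the Gaussian that generated $x_i$, and $\sigma^2$ for the shared variance; the symbol $\sigma^2$ is an assumption, since the notes do not name it here.

```latex
p(y_i \mid \theta) = P(z_i)\, p(x_i \mid z_i, \theta)
  = \frac{1}{k} \cdot \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left( -\frac{(x_i - \mu_{z_i})^2}{2\sigma^2} \right)

\log p(Y \mid \theta) = \sum_{i=1}^{m} \log p(y_i \mid \theta)
  = m \log \frac{1}{k\sqrt{2\pi\sigma^2}}
    - \frac{1}{2\sigma^2} \sum_{i=1}^{m} (x_i - \mu_{z_i})^2
```

Only the last sum depends on the means, which is why the shared variance drops out of the maximization.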
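From the EM theorem, we know that a new estimate $\theta'$ satisfies $P(X \mid \theta') > P(X \mid \theta)$ whenever

$$\sum_Z P(Z \mid X, \theta) \log P(X, Z \mid \theta') > \sum_Z P(Z \mid X, \theta) \log P(X, Z \mid \theta),$$

where the sums range over the possible values of the hidden variable Z.
The quantity we must maximize with respect to $\theta'$ is therefore

$$Q(\theta' \mid \theta) = \sum_Z P(Z \mid X, \theta) \log P(X, Z \mid \theta').$$

Since Y completely determines X and $P(Y \mid X, \theta) = 0$ for all Y inconsistent with X, the above is equivalent to

$$Q(\theta' \mid \theta) = \sum_Y P(Y \mid X, \theta) \log P(Y \mid \theta') = E[\log P(Y \mid \theta') \mid X, \theta].$$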
We want to maximize Q with respect to $\theta'$, so we differentiate $Q(\theta' \mid \theta)$ with respect to each of the $\mu_j'$ and set the result to zero.
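Carrying out this differentiation for equal-variance Gaussians gives the update derived in Mitchell, Section 6.12: each new mean is the average of the observations, weighted by the posterior probability that each observation came from that Gaussian. A minimal Python sketch of the resulting iteration follows. The function name `em_means`, the explicit `init_means` argument, the fixed iteration count, and the synthetic data are all illustrative assumptions, not part of the notes.

```python
import math
import random

def em_means(xs, init_means, sigma, iters=50):
    """Estimate the means of equiprobable Gaussians with shared std sigma."""
    mus = list(init_means)
    k = len(mus)
    for _ in range(iters):
        # E step: w[j] = P(z_i = j | x_i, current means) for each observation.
        # Equal priors and equal variances cancel in the normalization.
        ws = []
        for x in xs:
            logs = [-(x - mu) ** 2 / (2 * sigma ** 2) for mu in mus]
            top = max(logs)  # subtract the max exponent for numerical stability
            ps = [math.exp(l - top) for l in logs]
            total = sum(ps)
            ws.append([p / total for p in ps])
        # M step: each mean becomes the posterior-weighted average of the data.
        new_mus = []
        for j in range(k):
            denom = sum(w[j] for w in ws)
            numer = sum(w[j] * x for w, x in zip(ws, xs))
            new_mus.append(numer / denom if denom > 0 else mus[j])
        mus = new_mus
    return sorted(mus)

# Synthetic data (assumed): two equiprobable unit-variance Gaussians.
rng = random.Random(1)
data = [rng.gauss(rng.choice([-2.0, 3.0]), 1.0) for _ in range(500)]
estimates = em_means(data, init_means=[0.0, 1.0], sigma=1.0)
```

The initial means must differ from one another; if they start out identical, the posterior weights are identical too and the estimates never separate.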