next up previous
Next: Decision Trees Up: Bayesian Learning 2: EM Previous: Derivation of the Baum-Welch

Estimating the means of k-Gaussians

Read Section 6.12 of Mitchell.

We want to estimate the means of k equiprobable Gaussian distributions with identical variance given a set of m observations each of which was generated by first choosing one of the k Gaussians and then using the chosen Gaussian to generate the observation.

Note that we can consider this as a problem of determining the most likely state-symbol probabilities, B, in a HMM of k states where the distribution over transitions is uniform, i.e. $\forall i,j:
a_{ij} = 1/k$ and the parameters of interest, $\theta$, are the state-symbol probabilities which are continuous and Gaussian.

An extension of the HMM model to handle the case for continuous bj is left for homework, but we will now proceed to develop this problem along, but not quite, the lines of Mitchell. The main differences will be for the sake of consistency with the notation we have already introduced.

Since the variances of the k Gaussians are known to be equal, the parameter of interest $\theta = \{\mu_1, \cdots, \mu_k \}$ consists exactly of the k means. Let the observed variable $X = \langle x_1,
\cdots, x_m\rangle$ and $Z = \langle z_1, \cdots, z_m \rangle$ where $z_i \in \{1, \cdots, k\}$ denotes which of the k Gaussians was used to generate a given xi. Let $Y = \langle y_1, \cdots, y_m \rangle$ where yi = (xi, zi).

From the EM theorem, we know

\begin{displaymath}\sum_Y P(Y\vert X) \log P'(Y,X) > \sum_Y P(Y\vert X) \log P(Y,X)
\Rightarrow
P'(X) > P(X)
\end{displaymath}

where, as usual, P and P' stand for $P_\theta$ and $P_{\theta'}$ respectively.

The quantity we must maximize with respect to $\theta'$ is

\begin{displaymath}Q = \sum_Y P(Y\vert X) \log P'(Y,X)
\end{displaymath}

Since Y completely determines X and P(Y|X) = 0 for all Y inconsistent with X, the above is equivalent to

\begin{eqnarray*}Q &=& \sum_Z P(Y\vert X) \log P'(Y)\\
&=& \frac{1}{P(X)} \sum...
...Y,X) \log P'(Y)\\
&=& \frac{1}{P(X)} \sum_Z P(Y) \log P'(Y)\\
\end{eqnarray*}


Now $P'(Y) = \prod_{i=1}^m P'(z_i,x_i)$. So

\begin{eqnarray*}Q &=& \frac{1}{P(X)} \sum_Z P(Y) \log \prod_{i=1}^m P'(x_i,z_i)...
... &=& \frac{1}{P(X)} \sum_Z P(Y) \sum_{i=1}^m \log P'(x_i,z_i)\\
\end{eqnarray*}


We can group the Y in the above sum by unique (xi,zi):

\begin{eqnarray*}Q &=& \frac{1}{P(X)} \sum_{j=1}^k \sum_{i=1}^m \log P'(x_i,z_i=...
...{P(X)} \sum_{j=1}^k \sum_{i=1}^m P(X,z_i=j)\log P'(x_i,z_i=j)\\
\end{eqnarray*}


We want to maximize Q with respect to $\theta'$, so we differentiate with respect to each of the $\mu_j'$ and set to zero.

\begin{eqnarray*}\frac{\partial Q}{\partial \mu_j'} &=& \frac{1}{P(X)} \frac{\pa...
..._i=j) \left[ C -\frac{(x_i - \mu_j')^2}
{2\sigma^2} \right] \\
\end{eqnarray*}


Where $C = -\log k\sqrt{2\pi\sigma^2}$. Dropping terms independent of $\mu_j'$,

\begin{eqnarray*}\frac{\partial Q}{\partial \mu_j'} &=& \frac{1}{P(X)} \frac{\pa...
...frac{1}{\sigma^2P(X)} \sum_{i=1}^m P(X,z_i=j) (x_i - \mu_j') \\
\end{eqnarray*}


Setting $\partial Q/\partial \mu_j' = 0$ and solving for $\mu_j'$,

\begin{eqnarray*}\frac{1}{\sigma^2P(X)} \sum_{i=1}^m P(X,z_i=j) (x_i - \mu_j')= ...
...um_{i=1}^m x_i P(z_i=j\vert X)}{\sum_{i=1}^m P(z_i=j\vert X)}\\
\end{eqnarray*}


Thus the reestimated $\mu_j'$ which will be the $\mu_j$ for the next iteration is just the weighted sample mean using the current value of $\mu_j$.


next up previous
Next: Decision Trees Up: Bayesian Learning 2: EM Previous: Derivation of the Baum-Welch
Anand Venkataraman
1999-09-16