For $N$ independent samples $X_1, \dots, X_N$ of a random variable $X$ with true mean $\mu$, the mean and variance are:

$$
\begin{aligned}
\bar X & = \frac{1}{N} \sum_{i=1}^{N} X_i \\
\operatorname{Var}(X) & = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2
\end{aligned}
$$
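As a small illustration of the two definitions above, here is a minimal Python sketch using only the standard library. The helper names `sample_mean` and `variance_about` and the sample values are made up for this example; note that the variance is taken about a known centre `mu`, as in the formula, not about the sample mean.

```python
from statistics import fmean

def sample_mean(xs):
    """Arithmetic mean of the samples (X-bar)."""
    return fmean(xs)

def variance_about(xs, mu):
    """Mean squared deviation of xs about a given centre mu (1/N normalisation)."""
    return fmean((x - mu) ** 2 for x in xs)

xs = [2.0, 4.0, 4.0, 6.0]   # illustrative samples, not from the original text
mu = 4.0                    # assumed true mean for this toy example

print(sample_mean(xs))         # 4.0
print(variance_about(xs, mu))  # 2.0
```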

If the random variable follows the normal distribution $\mathcal{N}(\mu, \sigma^2)$, then:

$$
\begin{aligned}
\mathbb{E}[X] & = \mu \\
\mathbb{E}[\operatorname{Var}(X)] & = \sigma^2
\end{aligned}
$$

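The expected variance of the sample mean follows from $\mathbb{E}[X_i^2] = \sigma^2 + \mu^2$ and, because the samples are independent, $\mathbb{E}[X_i X_j] = \mu^2$ for $i \neq j$: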

$$
\begin{aligned}
\mathbb{E}[\operatorname{Var}(\bar X)] & = \mathbb{E}[\bar X^2] - \mathbb{E}[\bar X]^2 \\
& = \frac{1}{N^2} \mathbb{E}\left[\left(\sum_{i=1}^{N} X_i\right)^2\right] - \mu^2 \\
& = \frac{N \mathbb{E}[X_i^2] + \sum_{i \neq j} \mathbb{E}[X_i X_j]}{N^2} - \mu^2 \\
& = \frac{N (\sigma^2 + \mu^2) + N(N-1) \mu^2}{N^2} - \mu^2 \\
& = \frac{\sigma^2}{N}
\end{aligned}
$$
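The result $\sigma^2 / N$ can be checked numerically. The following is a rough Monte Carlo sketch, not a proof: the function names, the parameter values ($\mu = 1$, $\sigma = 2$, $N = 5$), the number of trials, and the seed are all arbitrary choices made for this illustration.

```python
import random
from statistics import fmean

def mean_of_sample(rng, mu, sigma, n):
    """Draw n independent normal samples and return their mean."""
    return fmean(rng.gauss(mu, sigma) for _ in range(n))

def simulate_var_of_mean(mu=1.0, sigma=2.0, n=5, trials=20000, seed=0):
    """Estimate E[(X-bar - mu)^2] by averaging over many repeated samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = [mean_of_sample(rng, mu, sigma, n) for _ in range(trials)]
    return fmean((m - mu) ** 2 for m in means)

estimate = simulate_var_of_mean()
theory = 2.0 ** 2 / 5          # sigma^2 / N
print(estimate, theory)        # the two numbers should be close
```

Increasing `trials` should tighten the agreement, since the simulation error shrinks roughly with the square root of the number of trials.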

If $\bar X$ is used in place of $\mu$ when computing the variance, then:

$$
\begin{aligned}
\mathbb{E}\left[\frac{1}{N} \sum_{i=1}^{N} (X_i - \bar X)^2\right] & = \mathbb{E}\left[\frac{1}{N} \sum_{i=1}^{N} (X_i - \mu + \mu - \bar X)^2\right] \\
& = \mathbb{E}\left[\frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2\right] + \mathbb{E}\left[\frac{2}{N} \sum_{i=1}^{N} (X_i - \mu)(\mu - \bar X)\right] + \mathbb{E}\left[\frac{1}{N} \sum_{i=1}^{N} (\mu - \bar X)^2\right] \\
& = \mathbb{E}[\operatorname{Var}(X)] + \mathbb{E}[2(\bar X - \mu)(\mu - \bar X)] + \mathbb{E}[(\mu - \bar X)^2] \\
& = \mathbb{E}[\operatorname{Var}(X)] - \mathbb{E}[\operatorname{Var}(\bar X)] \\
& = \frac{N-1}{N} \sigma^2
\end{aligned}
$$

Therefore:

$$
\sigma^2 = \mathbb{E}[\operatorname{Var}(X)] = \mathbb{E}\left[\frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar X)^2\right]
$$
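The factor $\frac{N-1}{N}$ and its correction can also be seen in simulation. This is a hedged sketch under assumed settings ($\mu = 0$, $\sigma = 2$, $N = 5$, fixed seed); the function name `average_estimates` is invented for this example. It averages the $\frac{1}{N}$ and $\frac{1}{N-1}$ estimators, both centred on the sample mean, over many repeated samples.

```python
import random
from statistics import fmean

def average_estimates(sigma=2.0, n=5, trials=20000, seed=0):
    """Return the average 1/N and 1/(N-1) variance estimates over many samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        x_bar = fmean(xs)
        ss = sum((x - x_bar) ** 2 for x in xs)  # squared deviations about X-bar
        biased_total += ss / n                  # 1/N normalisation
        unbiased_total += ss / (n - 1)          # 1/(N-1) normalisation
    return biased_total / trials, unbiased_total / trials

biased, unbiased = average_estimates()
# Theory for sigma = 2, N = 5: (N-1)/N * sigma^2 = 3.2 and sigma^2 = 4.0
print(biased, unbiased)
```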

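In other words, suppose we want the maximum-likelihood values of $\mu$ and $\sigma^2$ for an observed dataset $D$, that is, the values that maximize $p(D \mid \mu, \sigma^2)$. Then: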

$$
p(D \mid \mu, \sigma^2) = \prod_{i=1}^{N}\mathcal{N}(X_i \mid \mu, \sigma^2)
$$

Taking the logarithm gives:

$$
\begin{aligned}
\ln p(D \mid \mu, \sigma^2) & = \sum_{i=1}^{N}\ln\mathcal{N}(X_i \mid \mu, \sigma^2) \\
& = \sum_{i=1}^{N}\ln\left[\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{1}{2\sigma^2}(X_i-\mu)^2\right\}\right] \\
& = \sum_{i=1}^{N}\ln\frac{1}{\sqrt{2\pi}} + \sum_{i=1}^{N}\ln\frac{1}{\sqrt{\sigma^2}} - \sum_{i=1}^{N}\frac{1}{2\sigma^2}(X_i-\mu)^2 \\
& = - \frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(X_i-\mu)^2
\end{aligned}
$$
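The closed form in the last line can be compared against a direct sum of log densities. The sketch below is illustrative only: the function names, the data, and the parameter values are assumptions made for this example, and `NormalDist` from Python's `statistics` module supplies the Gaussian density.

```python
import math
from statistics import NormalDist

def log_likelihood(xs, mu, var):
    """Closed-form Gaussian log-likelihood, matching the final line of the derivation."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * n * math.log(var)
            - sum((x - mu) ** 2 for x in xs) / (2 * var))

def log_likelihood_direct(xs, mu, var):
    """Sum of log densities, matching the first line of the derivation."""
    dist = NormalDist(mu, math.sqrt(var))
    return sum(math.log(dist.pdf(x)) for x in xs)

xs = [2.0, 4.0, 4.0, 6.0]  # illustrative data
print(log_likelihood(xs, 3.5, 2.5))
print(log_likelihood_direct(xs, 3.5, 2.5))  # should agree up to rounding
```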

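To maximize this, differentiate with respect to $\mu$ and set the derivative to zero, which gives $\mu_{ML} = \bar D$; likewise, differentiating with respect to $\sigma$ gives $\sigma_{ML}^2 = \operatorname{Var}(D)$, where the variance is now taken about $\bar D$, that is, $\frac{1}{N}\sum_{i=1}^{N}(X_i - \bar D)^2$. The maximum likelihood is therefore $p(D \mid \bar D, \operatorname{Var}(D))$. In this case, however, the estimate overfits the data $D$; put differently, it is a biased estimator, with $\mathbb{E}[\sigma_{ML}^2] = \frac{N-1}{N}\sigma^2$ (see the case above where $\bar X$ is used in place of $\mu$ when computing the variance). Correcting for the bias gives: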

$$
\mathcal{N}\left(X_i \mid \bar D, \frac{N}{N-1}\operatorname{Var}(D)\right)
$$

as the bias-corrected maximum-likelihood estimate of the distribution of the data $D$.
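To make the last step concrete, here is a minimal sketch that fits a Gaussian to a small dataset. The names `fit_gaussian` and `log_likelihood` and the data are hypothetical, introduced only for this illustration. It returns the maximum-likelihood mean, the $\frac{1}{N}$ variance about that mean, and the variance rescaled by $\frac{N}{N-1}$, and then checks that the unrescaled pair scores at least as high on the log-likelihood as nearby alternatives.

```python
import math
from statistics import fmean

def log_likelihood(xs, mu, var):
    """Gaussian log-likelihood of xs for mean mu and variance var."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * var)
            - sum((x - mu) ** 2 for x in xs) / (2 * var))

def fit_gaussian(xs):
    """Return (mu_ml, var_ml, var_corrected) for the samples xs (needs len(xs) >= 2)."""
    n = len(xs)
    mu_ml = fmean(xs)                              # sample mean
    var_ml = fmean((x - mu_ml) ** 2 for x in xs)   # 1/N variance about the sample mean
    var_corrected = var_ml * n / (n - 1)           # rescaled by N/(N-1)
    return mu_ml, var_ml, var_corrected

xs = [2.0, 4.0, 4.0, 6.0]  # illustrative data
mu_ml, var_ml, var_corrected = fit_gaussian(xs)
print(mu_ml, var_ml, var_corrected)

# The ML pair maximizes the likelihood of this particular dataset,
# so the corrected variance gives a log-likelihood that is no higher.
print(log_likelihood(xs, mu_ml, var_ml) >= log_likelihood(xs, mu_ml, var_corrected))  # True
```

The design point is that the rescaled variance trades a slightly lower likelihood on the observed data for an estimator whose expectation equals $\sigma^2$.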