In probability theory and statistics, a random vector X = (X1, ..., Xn) follows a multivariate normal distribution, also sometimes called a multivariate Gaussian distribution (in honor of Carl Friedrich Gauss, who was not the first to write about the normal distribution), if it satisfies the following equivalent conditions:
The following is not quite equivalent to the conditions above, since it fails to allow for a singular covariance matrix:
<math>f_X(x_1,\ldots,x_n)\, dx_1\ldots dx_n= \frac{1}{(2\pi)^{n/2}|{\mathbf\Sigma}|^{1/2}} \exp\left(-\frac{1}{2}({\mathbf x}-{\mathbf\mu})^T{\mathbf\Sigma}^{-1}({\mathbf x}-{\mathbf\mu}) \right)dx_1\ldots dx_n</math>
where <math>\left|{\mathbf\Sigma}\right|</math> is the determinant of <math>{\mathbf\Sigma}</math>. Note how the equation above reduces to that of the univariate normal distribution if <math>{\mathbf\Sigma}</math> is a <math>1\times 1</math> matrix (i.e. a real number).
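As an illustration, the density above can be evaluated numerically. The following is a minimal sketch, assuming NumPy is available and that the covariance matrix is symmetric and nonsingular; the function names `mvn_pdf` and `normal_pdf` are hypothetical and chosen only for this example. It also checks the reduction to the univariate case for a <math>1\times 1</math> matrix.

```python
import math
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x; assumes sigma is symmetric positive definite."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    n = mu.shape[0]

    # log|sigma| is numerically safer than the determinant itself.
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        raise ValueError("sigma must be nonsingular for the density to exist")

    diff = x - mu
    # Quadratic form (x - mu)^T sigma^{-1} (x - mu), without forming the inverse.
    quad = diff @ np.linalg.solve(sigma, diff)
    log_norm = -0.5 * (n * math.log(2.0 * math.pi) + logdet)
    return math.exp(log_norm - 0.5 * quad)

def normal_pdf(x, mean, var):
    """Univariate normal density, for comparison."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# With a 1x1 covariance matrix the two formulas agree.
p_multi = mvn_pdf([0.3], [1.0], [[4.0]])
p_uni = normal_pdf(0.3, 1.0, 4.0)
```

Passing a singular matrix raises an error here, which mirrors the remark that the density formulation does not cover the singular case.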
The vector μ in these conditions is the expected value of X and the matrix <math>{\mathbf\Sigma}={\mathbf A}{\mathbf A}^T</math> is the covariance matrix of the components Xi. It is important to realize that the covariance matrix must be allowed to be singular. That case arises frequently in statistics; for example, in the distribution of the vector of residuals in ordinary linear regression problems. Note also that the Xi are in general not independent; they can be seen as the result of applying the linear transformation A to a collection of independent standard Gaussian variables Z and then adding μ.
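The construction just described can be simulated directly. The sketch below assumes NumPy; the matrix `A`, the vector `mu`, the seed and the sample size are arbitrary illustrative values, not taken from the text. Because `A` has more rows than columns, the resulting covariance matrix is singular, which is exactly the case the density formula cannot handle.

```python
import numpy as np

# Hypothetical example values: a 3x2 matrix A maps m = 2 independent
# standard normal variables to n = 3 correlated components.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
mu = np.array([1.0, -1.0, 0.5])

sigma = A @ A.T                                   # covariance matrix, 3x3
rank = np.linalg.matrix_rank(sigma, tol=1e-9)     # at most 2, so sigma is singular

rng = np.random.default_rng(0)                    # fixed seed for reproducibility
Z = rng.standard_normal((2, 200_000))             # independent N(0, 1) entries
X = A @ Z + mu[:, None]                           # each column is one draw of X

sample_mean = X.mean(axis=1)                      # should be close to mu
sample_cov = np.cov(X)                            # rows are variables; close to sigma
```

The off-diagonal entries of `sample_cov` are nonzero, illustrating that the components are not independent even though the entries of `Z` are.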
If <math>{\mathbf y}={\mathbf B}{\mathbf x}</math> is a linear transformation of <math>{\mathbf x}</math>, where <math>{\mathbf x}</math> has <math>p</math> components and <math>{\mathbf B}</math> is an <math>m\times p</math> matrix of rank <math>m</math> with <math>m\leq p</math>, then <math>{\mathbf y}</math> has a multivariate normal distribution with mean <math>{\mathbf B}{\mathbf\mu}</math> and covariance matrix <math>{\mathbf B}{\mathbf\Sigma}{\mathbf B}^T</math>.
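A small numerical example may make the transformation rule concrete. This is a sketch assuming NumPy; the helper name `transform_mvn` and all numbers are hypothetical. Here the two rows of `B` form the combinations <math>x_1+x_2</math> and <math>x_2-x_3</math>.

```python
import numpy as np

def transform_mvn(mu, sigma, B):
    """Parameters of y = B x when x ~ N(mu, sigma)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    B = np.asarray(B, dtype=float)
    return B @ mu, B @ sigma @ B.T

mu = [1.0, 2.0, 3.0]
sigma = [[2.0, 1.0, 0.0],
         [1.0, 3.0, 1.0],
         [0.0, 1.0, 4.0]]
B = [[1.0, 1.0, 0.0],     # y_1 = x_1 + x_2
     [0.0, 1.0, -1.0]]    # y_2 = x_2 - x_3

mean_y, cov_y = transform_mvn(mu, sigma, B)
# mean_y is [3, -1]; cov_y is [[7, 3], [3, 5]]
```

The diagonal of `cov_y` matches the familiar rule for the variance of a sum or difference, for instance 2 + 3 + 2·1 = 7 for the first component.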
Corollary: any subset of the <math>x_i</math> has a marginal distribution that is also multivariate normal. To see this, consider the following example: to extract the subset <math>(x_1,x_2,x_4)^T</math>, use
<math>{\mathbf B}= \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & \ldots & 0\\ 0 & 1 & 0 & 0 & 0 & \ldots & 0\\ 0 & 0 & 0 & 1 & 0 & \ldots & 0 \end{bmatrix}</math> which extracts the desired elements directly.
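The selection argument can be checked numerically. The sketch below assumes NumPy and uses arbitrary illustrative values for the dimension, mean and covariance. It builds the selection matrix for <math>(x_1,x_2,x_4)^T</math> and confirms that the transformed parameters are simply the corresponding entries of the mean and the corresponding rows and columns of the covariance matrix.

```python
import numpy as np

p = 6
keep = [0, 1, 3]                        # zero-based positions of x_1, x_2, x_4

# Hypothetical parameters: any mean and any valid covariance would do.
mu = np.arange(1.0, p + 1.0)
sigma = np.diag(np.arange(1.0, p + 1.0)) + 0.5   # positive definite

# Selection matrix: row i has a single 1 in column keep[i].
B = np.zeros((len(keep), p))
B[np.arange(len(keep)), keep] = 1.0

marg_mean = B @ mu                      # mean of the marginal
marg_cov = B @ sigma @ B.T              # covariance of the marginal

# The same result by direct indexing, without forming B.
direct_mean = mu[keep]
direct_cov = sigma[np.ix_(keep, keep)]
```

In practice the indexing form is usually preferable, since it avoids two matrix products.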
Suppose <math>{\mathbf x}</math> is partitioned into <math>{\mathbf x}_1</math> and <math>{\mathbf x}_2</math>, so that <math>{\mathbf x}=({\mathbf x}_1^T,{\mathbf x}_2^T)^T</math> (note that vectors are column vectors by default). Say <math>{\mathbf x}_1</math> has <math>q</math> elements, so <math>{\mathbf x}_2</math> has <math>p-q</math> elements.
Then if <math>{\mathbf\mu}</math> and <math>{\mathbf\Sigma}</math> are partitioned as follows
<math>{\mathbf\mu}=\begin{bmatrix} {\mathbf\mu}_1\\ {\mathbf\mu}_2 \end{bmatrix} \qquad {\mathbf\Sigma}= \begin{bmatrix} {\mathbf\Sigma}_{11} & {\mathbf\Sigma}_{12} \\ {\mathbf\Sigma}_{21} & {\mathbf\Sigma}_{22} \end{bmatrix}</math>
then the distribution of <math>{\mathbf x}_1</math> conditional on <math>{\mathbf x}_2={\mathbf a}</math> is multivariate normal with mean
<math>{\mathbf\mu}_1+{\mathbf\Sigma}_{12}{\mathbf\Sigma}_{22}^{-1}\left({\mathbf a}-{\mathbf\mu}_2\right)</math>
and covariance matrix
<math>{\mathbf\Sigma}_{11}- {\mathbf\Sigma}_{12} {\mathbf\Sigma}_{22}^{-1} {\mathbf\Sigma}_{21}.</math>
This matrix is the Schur complement of <math>{\mathbf\Sigma}_{22}</math> in <math>{\mathbf\Sigma}</math>.
Note that knowing the value of <math>{\mathbf x}_2</math> to be <math>{\mathbf a}</math> alters the variance; perhaps more surprisingly, the mean is shifted by <math>{\mathbf\Sigma}_{12}{\mathbf\Sigma}_{22}^{-1}\left({\mathbf a}-{\mathbf\mu}_2\right)</math>; compare this with the situation of not knowing the value of <math>{\mathbf x}_2</math>, in which case <math>{\mathbf x}_1</math> would have distribution <math>N_q\left({\mathbf\mu}_1,{\mathbf\Sigma}_{11}\right)</math>.
The matrix <math>{\mathbf\Sigma}_{12}{\mathbf\Sigma}_{22}^{-1}</math> is known as the matrix of regression coefficients.
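The conditional mean, the conditional covariance and the regression coefficients can be computed together. The following is a sketch assuming NumPy and a nonsingular <math>{\mathbf\Sigma}_{22}</math>; the function name `condition_mvn` and the bivariate numbers are hypothetical.

```python
import numpy as np

def condition_mvn(mu, sigma, q, a):
    """Distribution of x_1 (first q components) given x_2 = a.

    Returns (conditional mean, conditional covariance, regression coefficients).
    Assumes sigma is symmetric and its lower-right block is nonsingular.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    a = np.asarray(a, dtype=float)

    mu1, mu2 = mu[:q], mu[q:]
    s11, s12 = sigma[:q, :q], sigma[:q, q:]
    s21, s22 = sigma[q:, :q], sigma[q:, q:]

    # Regression coefficients Sigma_12 Sigma_22^{-1}, computed with a linear
    # solve (using the symmetry of Sigma_22) instead of an explicit inverse.
    coef = np.linalg.solve(s22, s12.T).T

    cond_mean = mu1 + coef @ (a - mu2)
    cond_cov = s11 - coef @ s21          # Schur complement of Sigma_22
    return cond_mean, cond_cov, coef

# Bivariate example: q = 1, observe x_2 = 3.
cond_mean, cond_cov, coef = condition_mvn(
    mu=[0.0, 1.0],
    sigma=[[4.0, 2.0],
           [2.0, 2.0]],
    q=1,
    a=[3.0],
)
# coef is [[1.0]], cond_mean is [2.0], cond_cov is [[2.0]]
```

In this example the unconditional variance of <math>x_1</math> is 4 and the conditional variance is 2, while the mean moves from 0 to 2 because the observed value lies above <math>\mu_2</math>.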
The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is perhaps surprisingly subtle and elegant. See estimation of covariance matrices.
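For orientation, the estimator that the derivation arrives at is, for N independent observations with unknown mean, the sample covariance computed with divisor N rather than N − 1. The sketch below assumes NumPy; the function name `mle_mean_cov` and the tiny data set are hypothetical, and the code only evaluates the estimator, it does not reproduce the derivation.

```python
import numpy as np

def mle_mean_cov(samples):
    """Maximum-likelihood estimates from an (N, p) array of observations."""
    samples = np.asarray(samples, dtype=float)
    n_obs = samples.shape[0]
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Average of the outer products (x_i - mean)(x_i - mean)^T, divisor N.
    cov = centered.T @ centered / n_obs
    return mean, cov

data = np.array([[0.0, 0.0],
                 [2.0, 0.0],
                 [0.0, 2.0],
                 [2.0, 2.0]])
mean_hat, cov_hat = mle_mean_cov(data)
# mean_hat is [1, 1]; cov_hat is the 2x2 identity matrix
```

The same matrix is returned by `np.cov(data, rowvar=False, bias=True)`; with the default `bias=False` NumPy divides by N − 1 instead, giving the unbiased estimate.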