What is MVN (Multivariate Normal Distribution)?
Tagged: linear algebra
- This topic has 0 replies, 1 voice, and was last updated 3 months ago by
Wolf.
1. The Definition
A random vector $\mathbf{x} \in \mathbb{R}^k$ follows an MVN distribution, denoted as $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, if its joint probability density function (PDF) is:
$$f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^k |\boldsymbol{\Sigma}|}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$
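To make the formula concrete, here is a minimal NumPy sketch that evaluates the density at a single point. The function name `mvn_pdf` and the example numbers are illustrative assumptions, not part of the definition, and the code assumes $\boldsymbol{\Sigma}$ is invertible.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Evaluate the MVN density at one point x (assumes Sigma is positive definite)."""
    k = mu.shape[0]
    diff = x - mu
    # Solve Sigma @ y = diff rather than forming the explicit inverse.
    maha_sq = diff @ np.linalg.solve(Sigma, diff)
    norm_const = np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))
    return np.exp(-0.5 * maha_sq) / norm_const

# Hypothetical 2-D example values.
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# At x = mu the exponent is zero, so the density equals the normalizing constant.
peak = mvn_pdf(mu, mu, Sigma)
```

For real work in high dimensions, a log-density built on a Cholesky factor is usually more stable numerically than computing the determinant directly.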
2. The Core Parameters
While a 1D Gaussian uses a scalar mean ($\mu$) and variance ($\sigma^2$), the MVN uses:
* Mean Vector ($\boldsymbol{\mu}$): A $k \times 1$ vector representing the expected value (the “peak”) in $k$-dimensional space.
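* Covariance Matrix ($\boldsymbol{\Sigma}$): A $k \times k$ symmetric, positive semi-definite matrix that defines the “shape” of the data spread. For the PDF above to exist, $\boldsymbol{\Sigma}$ must be strictly positive definite, so that $\boldsymbol{\Sigma}^{-1}$ exists and $|\boldsymbol{\Sigma}| > 0$.
  * Diagonal elements ($\Sigma_{ii}$): The variance of each individual feature.
  * Off-diagonal elements ($\Sigma_{ij}$): The covariance between features $i$ and $j$. Dividing by the two standard deviations gives the correlation, $\rho_{ij} = \Sigma_{ij} / \sqrt{\Sigma_{ii}\Sigma_{jj}}$.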
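As a quick illustration of how the entries of $\boldsymbol{\Sigma}$ are read, the sketch below pulls the per-feature standard deviations off the diagonal, rescales the matrix into a correlation matrix, and checks positive definiteness through the eigenvalues. The matrix values are made up for the example.

```python
import numpy as np

# Hypothetical 2x2 covariance matrix.
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

# Diagonal entries are variances; their square roots are standard deviations.
std = np.sqrt(np.diag(Sigma))

# Correlation: rho_ij = Sigma_ij / (std_i * std_j).
corr = Sigma / np.outer(std, std)

# A symmetric matrix is positive definite iff all eigenvalues are > 0.
eigvals = np.linalg.eigvalsh(Sigma)
is_positive_definite = bool(np.all(eigvals > 0))
```

Here the covariance of 1.2 corresponds to a correlation of 0.6, because the two standard deviations are 2 and 1.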
3. Geometric Interpretation: The Mahalanobis Distance
The term in the exponent, $D_M^2 = (\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})$, is the Squared Mahalanobis Distance.
* In standard Euclidean distance, we assume all directions are weighted equally.
* In MVN, the “isocontours” (surfaces of equal probability density) are ellipsoids centered at $\boldsymbol{\mu}$, with axes along the eigenvectors of $\boldsymbol{\Sigma}$.
* The Mahalanobis distance measures how many “standard deviations” $\mathbf{x}$ is from $\boldsymbol{\mu}$, effectively “whitening” the data by accounting for the correlation and scale defined by $\boldsymbol{\Sigma}$.
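The link between the Mahalanobis distance and whitening can be checked numerically. This sketch (values are illustrative) computes $D_M^2$ directly, then again as a plain squared Euclidean norm after whitening with a Cholesky factor $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^T$. Cholesky is one common choice of whitening transform, not the only one.

```python
import numpy as np

# Hypothetical mean, covariance, and query point.
mu = np.array([1.0, 2.0])
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
x = np.array([3.0, 2.5])

diff = x - mu

# Direct form: (x - mu)^T Sigma^{-1} (x - mu).
d2_direct = diff @ np.linalg.solve(Sigma, diff)

# Whitened form: with Sigma = L L^T, z = L^{-1} (x - mu) has identity covariance,
# so the squared Euclidean norm of z equals the squared Mahalanobis distance.
L = np.linalg.cholesky(Sigma)
z = np.linalg.solve(L, diff)
d2_whitened = z @ z
```

Both quantities agree up to floating-point error, which is exactly the sense in which the Mahalanobis distance is a Euclidean distance in whitened coordinates.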
4. Critical Properties for AI Research
In machine learning, we love the MVN because it is analytically tractable (many of the integrals that come up have closed-form solutions you can derive with pen and paper):
| Property | Definition | AI Application |
| --- | --- | --- |
| Linearity | If $\mathbf{x} \sim \mathcal{N}$, then $\mathbf{Ax} + \mathbf{b} \sim \mathcal{N}$. | Reparameterization Trick in VAEs. |
| Marginalization | If you “ignore” a subset of variables, the remainder is still MVN. | Simplifying Bayesian Networks. |
| Conditioning | $P(\mathbf{x}_1 \mid \mathbf{x}_2)$ is *always* another MVN. | The core of Gaussian Process Regression. |
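The conditioning property has a standard closed form. If $\mathbf{x}$ is split into blocks $\mathbf{x}_1, \mathbf{x}_2$, then $\mathbf{x}_1 \mid \mathbf{x}_2$ is MVN with mean $\boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)$ and covariance $\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$. A minimal sketch follows; the helper name `condition_mvn` and its index-list interface are assumptions made for this example.

```python
import numpy as np

def condition_mvn(mu, Sigma, idx1, idx2, x2):
    """Return mean and covariance of x[idx1] given x[idx2] = x2 (hypothetical helper)."""
    mu1, mu2 = mu[idx1], mu[idx2]
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    # K = S12 @ inv(S22), computed with a solve; valid because S22 is symmetric.
    K = np.linalg.solve(S22, S12.T).T
    mu_cond = mu1 + K @ (x2 - mu2)
    Sigma_cond = S11 - K @ S12.T
    return mu_cond, Sigma_cond

# Two unit-variance variables with correlation 0.8; observe the second one at 1.0.
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
mu_cond, Sigma_cond = condition_mvn(mu, Sigma, [0], [1], np.array([1.0]))
```

In this example the conditional mean moves to 0.8 and the conditional variance shrinks to 0.36: observing a correlated variable both shifts and tightens the belief. Marginalization is simpler still, since it only requires reading off the matching sub-blocks of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$.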
5. Why it Matters in Deep Learning
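- The Reparameterization Trick: To backpropagate through a sampling layer, we sample $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and transform it: $\mathbf{z} = \boldsymbol{\mu} + \boldsymbol{\sigma} \odot \boldsymbol{\epsilon}$, where $\boldsymbol{\sigma}$ is the vector of per-dimension standard deviations (a diagonal covariance). All the randomness sits in $\boldsymbol{\epsilon}$, so $\mathbf{z}$ is a deterministic, differentiable function of $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$.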
- Maximum Entropy: Among all distributions on $\mathbb{R}^k$ with a given mean and covariance, the MVN has the largest differential entropy. It encodes no assumptions beyond those first two moments, which makes it a safe “prior.”
- Latent Spaces: We often force neural networks to map complex data (images/text) into a simple MVN latent space to make generation easier.
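The reparameterization step from the list above can be sketched in plain NumPy. This version assumes a diagonal covariance, so $\boldsymbol{\sigma}$ is a vector of standard deviations, and the parameter values are made up. NumPy has no automatic differentiation, so the sketch only demonstrates the sampling transform; in an autodiff framework the same expression would let gradients reach $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters, as an encoder network might output them.
mu = np.array([1.0, -2.0])
sigma = np.array([0.5, 2.0])  # per-dimension standard deviations (diagonal covariance)

# Draw parameter-free noise, then shift and scale it deterministically.
eps = rng.standard_normal((100_000, 2))
z = mu + sigma * eps  # elementwise product, broadcast over the sample axis

# The samples should reproduce the requested mean and standard deviation.
empirical_mean = z.mean(axis=0)
empirical_std = z.std(axis=0)
```

For a full covariance matrix, the analogous transform is $\mathbf{z} = \boldsymbol{\mu} + \mathbf{L}\boldsymbol{\epsilon}$ with $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^T$, which is the linearity property applied to standard normal noise.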
