What is MVN (Multivariate Normal Distribution)?


    • February 15, 2026 at 11:15 pm #5400

      1. The Definition

      A random vector $\mathbf{x} \in \mathbb{R}^k$ follows an MVN distribution, denoted as $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, if its joint probability density function (PDF) is:

      $$f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^k |\boldsymbol{\Sigma}|}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$
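As a quick sanity check of the formula, here is a minimal sketch that assumes NumPy. The helper names `mvn_logpdf` and `mvn_pdf` are hypothetical, and the code assumes $\boldsymbol{\Sigma}$ is positive definite so that its Cholesky factor exists. It works in log space and uses the Cholesky factor $\mathbf{L}$ (with $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^T$), so it never forms $\boldsymbol{\Sigma}^{-1}$ explicitly.

```python
import numpy as np

def mvn_logpdf(x, mu, Sigma):
    """Log of the MVN density f(x) for one point x in R^k (Sigma must be positive definite)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    k = mu.shape[0]
    # Sigma = L L^T; np.linalg.cholesky raises LinAlgError if Sigma is not positive definite
    L = np.linalg.cholesky(Sigma)
    # Solving L y = x - mu gives y^T y = (x - mu)^T Sigma^{-1} (x - mu)
    y = np.linalg.solve(L, x - mu)
    maha_sq = y @ y
    # |Sigma| = prod(diag(L))^2, so log|Sigma| = 2 * sum(log(diag(L)))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (k * np.log(2.0 * np.pi) + log_det + maha_sq)

def mvn_pdf(x, mu, Sigma):
    """f(x) from the formula above, computed via the log density."""
    return float(np.exp(mvn_logpdf(x, mu, Sigma)))

# 1D check: N(0, 1) evaluated at x = 0 is 1 / sqrt(2 * pi), about 0.3989
print(mvn_pdf([0.0], [0.0], [[1.0]]))
```

With $k = 1$ the formula collapses to the familiar 1D Gaussian. For real work, `scipy.stats.multivariate_normal` provides the same density.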

      2. The Core Parameters

      While a 1D Gaussian uses a scalar mean ($\mu$) and variance ($\sigma^2$), the MVN uses:
      * Mean Vector ($\boldsymbol{\mu}$): A $k \times 1$ vector representing the expected value (the “peak”) in $k$-dimensional space.
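      * Covariance Matrix ($\boldsymbol{\Sigma}$): A $k \times k$ symmetric, positive semi-definite matrix that defines the “shape” of the data spread. The density above additionally needs $\boldsymbol{\Sigma}$ to be positive definite, because it uses $\boldsymbol{\Sigma}^{-1}$ and requires $|\boldsymbol{\Sigma}| > 0$. A singular $\boldsymbol{\Sigma}$ gives a degenerate MVN with no density on $\mathbb{R}^k$.
        * Diagonal elements ($\Sigma_{ii}$): The variance of each individual feature.
        * Off-diagonal elements ($\Sigma_{ij}$): The covariance between features $i$ and $j$. Dividing by the standard deviations gives the correlation, $\rho_{ij} = \Sigma_{ij} / \sqrt{\Sigma_{ii}\Sigma_{jj}}$.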
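To make the diagonal/off-diagonal reading concrete, here is a sketch that assumes NumPy and a small made-up dataset (`X` is hypothetical toy data, not from any real source). It estimates $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ from samples and then rescales the covariance into a correlation matrix.

```python
import numpy as np

# Hypothetical toy data: 5 samples (rows) of k = 2 features (columns)
X = np.array([[1.0, 2.0],
              [2.0, 4.1],
              [3.0, 5.9],
              [4.0, 8.2],
              [5.0, 9.8]])

n = X.shape[0]
mu_hat = X.mean(axis=0)                 # estimated mean vector, one entry per feature
Xc = X - mu_hat                         # center every feature
Sigma_hat = Xc.T @ Xc / (n - 1)         # unbiased k x k sample covariance

variances = np.diag(Sigma_hat)          # diagonal entries: per-feature variance
std = np.sqrt(variances)
corr = Sigma_hat / np.outer(std, std)   # rho_ij = Sigma_ij / sqrt(Sigma_ii * Sigma_jj)

print("mean:", mu_hat)
print("covariance:\n", Sigma_hat)
print("correlation:\n", corr)
```

`np.cov(X, rowvar=False)` and `np.corrcoef(X, rowvar=False)` return the same matrices. The off-diagonal correlation lands close to 1 here because the second feature roughly doubles the first.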


      3. Geometric Interpretation: The Mahalanobis Distance

      The term in the exponent, $D_M^2 = (\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})$, is the Squared Mahalanobis Distance.
      * In standard Euclidean distance, we assume all directions are weighted equally.
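      * In MVN, the “isocontours” (surfaces of equal probability density) are ellipsoids: they are the sets where $D_M^2$ is constant, with axes along the eigenvectors of $\boldsymbol{\Sigma}$ and semi-axis lengths proportional to the square roots of its eigenvalues.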
      * The Mahalanobis distance measures how many “standard deviations” $\mathbf{x}$ is from $\boldsymbol{\mu}$, effectively “whitening” the data by accounting for the correlation and scale defined by $\boldsymbol{\Sigma}$.
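The whitening view can be checked numerically. This is a minimal sketch assuming NumPy; `mahalanobis` is a hypothetical helper, and the correlated $\boldsymbol{\Sigma}$ is an illustrative choice. Two points at the same Euclidean distance from $\boldsymbol{\mu}$ end up at very different Mahalanobis distances, depending on whether they lie along or against the correlation.

```python
import numpy as np

def mahalanobis(x, mu, Sigma):
    """Return (D_M, w) where w = L^{-1}(x - mu) is the whitened offset and Sigma = L L^T."""
    L = np.linalg.cholesky(np.asarray(Sigma, dtype=float))
    w = np.linalg.solve(L, np.asarray(x, dtype=float) - np.asarray(mu, dtype=float))
    # After whitening, the Mahalanobis distance is ordinary Euclidean length
    return float(np.sqrt(w @ w)), w

# Hypothetical example: two strongly correlated features
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])

# Both points are sqrt(2) away from mu in Euclidean terms...
d_along, _ = mahalanobis([1.0, 1.0], mu, Sigma)    # along the correlation direction
d_across, _ = mahalanobis([1.0, -1.0], mu, Sigma)  # against it
# ...but D_M^2 is 0.4/0.36 (about 1.11) versus 3.6/0.36 = 10
print(d_along, d_across)

# Whitening check: L^{-1} Sigma L^{-T} is the identity matrix
L = np.linalg.cholesky(Sigma)
L_inv = np.linalg.inv(L)
print(np.allclose(L_inv @ Sigma @ L_inv.T, np.eye(2)))  # True
```

The Cholesky factor is only one valid whitening matrix; any $\mathbf{W}$ with $\mathbf{W}\boldsymbol{\Sigma}\mathbf{W}^T = \mathbf{I}$ (for example, one built from the eigendecomposition) gives the same $D_M$.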


      4. Critical Properties for AI Research

      In machine learning, we love the MVN because it is analytically tractable (you can solve the integrals with pen and paper):

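      | Property | Definition | AI Application |
      |---|---|---|
      | Linearity | If $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then $\mathbf{Ax} + \mathbf{b} \sim \mathcal{N}(\mathbf{A}\boldsymbol{\mu} + \mathbf{b}, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T)$. | Reparameterization Trick in VAEs. |
      | Marginalization | If you “ignore” (integrate out) a subset of variables, the remainder is still MVN, with parameters given by the matching sub-blocks of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$. | Simplifying Bayesian Networks. |
      | Conditioning | $P(\mathbf{x}_1 \mid \mathbf{x}_2)$ is *always* another MVN. | The core of Gaussian Process Regression. |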
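All three properties reduce to simple matrix operations. The sketch below assumes NumPy; the helper names `affine`, `marginal`, and `condition` are hypothetical. For conditioning it uses the standard partitioned-Gaussian result $\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)$ and $\boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$.

```python
import numpy as np

def affine(mu, Sigma, A, b):
    """If x ~ N(mu, Sigma), then A x + b ~ N(A mu + b, A Sigma A^T)."""
    return A @ mu + b, A @ Sigma @ A.T

def marginal(mu, Sigma, idx):
    """Marginal of the coordinates in idx: keep matching entries of mu and the sub-block of Sigma."""
    idx = np.asarray(idx)
    return mu[idx], Sigma[np.ix_(idx, idx)]

def condition(mu, Sigma, idx1, idx2, x2):
    """Parameters of p(x1 | x2) for jointly Gaussian blocks x1 = x[idx1], x2 = x[idx2]."""
    idx1, idx2 = np.asarray(idx1), np.asarray(idx2)
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    # K = S12 S22^{-1}, via a linear solve instead of an explicit inverse (S22 is symmetric)
    K = np.linalg.solve(S22, S12.T).T
    mu_c = mu[idx1] + K @ (np.asarray(x2, dtype=float) - mu[idx2])
    Sigma_c = S11 - K @ S12.T   # Schur complement: S11 - S12 S22^{-1} S21
    return mu_c, Sigma_c

# Hypothetical 2D example with correlation 0.5
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])

print(affine(mu, Sigma, np.array([[1.0, 1.0]]), np.array([0.0])))  # x1 + x2: mean 0, variance 3
print(marginal(mu, Sigma, [0]))                                    # x1 alone: mean 0, variance 1
print(condition(mu, Sigma, [0], [1], [1.0]))                       # x1 | x2 = 1: mean 0.5, variance 0.75
```

Note that observing $\mathbf{x}_2$ shrinks the variance of $\mathbf{x}_1$ from 1 to 0.75, and the shrinkage does not depend on the observed value. Gaussian Process Regression applies exactly this update, with $\boldsymbol{\Sigma}$ built from a kernel.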

      5. Why it Matters in Deep Learning

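      1. The Reparameterization Trick: To backpropagate through a sampling layer, we sample $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and transform it deterministically: $\mathbf{z} = \boldsymbol{\mu} + \boldsymbol{\sigma} \odot \boldsymbol{\epsilon}$, which gives $\mathbf{z} \sim \mathcal{N}(\boldsymbol{\mu}, \mathrm{diag}(\boldsymbol{\sigma}^2))$. Because all the randomness lives in $\boldsymbol{\epsilon}$, gradients flow to $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$. For a full covariance, use $\mathbf{z} = \boldsymbol{\mu} + \mathbf{L}\boldsymbol{\epsilon}$ with $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^T$ (an application of the Linearity property above).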
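      2. Maximum Entropy: Among all continuous distributions on $\mathbb{R}^k$ with a given mean and covariance, the MVN has the largest differential entropy, $\frac{1}{2}\ln\left((2\pi e)^k |\boldsymbol{\Sigma}|\right)$. It therefore assumes nothing about the data beyond those first two moments, which makes it a safe “prior.”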
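      3. Latent Spaces: We often force neural networks to map complex data (images/text) into a simple MVN latent space, typically the standard normal $\mathcal{N}(\mathbf{0}, \mathbf{I})$, so that new data can be generated by sampling $\mathbf{z}$ from it and decoding.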
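The reparameterization step can be sketched in a few lines. This assumes NumPy and, as a common VAE convention (an assumption here, not something stated above), an encoder that outputs a log-variance rather than $\boldsymbol{\sigma}$ directly. The names `reparam_diag` and `reparam_full` are hypothetical. NumPy has no autodiff, so this only shows the sampling arithmetic; in a real model the same lines would run on framework tensors so gradients reach $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$.

```python
import numpy as np

def reparam_diag(mu, log_var, eps):
    """Diagonal case: z = mu + sigma * eps, with sigma = exp(0.5 * log_var)."""
    sigma = np.exp(0.5 * log_var)
    return mu + sigma * eps

def reparam_full(mu, Sigma, eps):
    """Full-covariance case: z = mu + L eps with Sigma = L L^T (applied row-wise to eps)."""
    L = np.linalg.cholesky(Sigma)
    return mu + eps @ L.T

rng = np.random.default_rng(0)
n, k = 100_000, 2

# eps ~ N(0, I) is drawn independently of the parameters;
# z is then a deterministic function of (mu, sigma, eps)
eps = rng.standard_normal((n, k))

mu = np.array([1.0, -2.0])
log_var = np.log(np.array([0.25, 4.0]))      # hypothetical encoder outputs
z_diag = reparam_diag(mu, log_var, eps)

Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
z_full = reparam_full(mu, Sigma, eps)

print(z_diag.mean(axis=0), z_diag.var(axis=0))   # close to mu and [0.25, 4.0]
print(np.cov(z_full, rowvar=False))              # close to Sigma
```

Passing `eps` in explicitly, rather than sampling inside the helpers, is what makes the trick visible: the parameters enter only through a differentiable shift and scale.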
      • This topic was modified 3 months ago by Wolf.
      • This topic was modified 3 months ago by yRocket.