What is Cholesky Decomposition?

Home / Forums / AI & ML: Learn It Yourself / Linear Algebra / What is Cholesky Decomposition?

  • Author
    Posts
    • February 16, 2026 at 10:10 am #5422

      Understanding Cholesky Decomposition

      Cholesky Decomposition is a way of “factoring” a matrix, similar to how you might factor the number 25 into $5 \times 5$. In linear algebra, it breaks down a specific type of square matrix into the product of a lower triangular matrix and its conjugate transpose.

      It is significantly more efficient than other methods (like LU decomposition) for solving linear equations, provided the matrix meets certain criteria.

      To an AI learner, Cholesky Decomposition is essentially the “square root” of a matrix. Just as $\sqrt{25}=5$, Cholesky breaks down a specific type of matrix into a simpler, triangular matrix that, when multiplied by its own transpose, gives you the original matrix back.
      In Machine Learning, this is the secret sauce behind Gaussian Processes, Kalman Filters, and generating correlated random noise for simulations.


      1. The Mathematical Definition

      If we have a Hermitian, positive-definite matrix $A$, the Cholesky decomposition is defined as:

      $$A = LL^*$$

      Where:
      * $L$: A lower triangular matrix with real, positive diagonal entries.
      * $L^*$: The conjugate transpose of $L$. (For real-valued matrices, this is simply the transpose $L^T$).


      2. Prerequisites (The “Rules”)

      Not every matrix can be decomposed this way. To use Cholesky, the matrix $A$ must be:
      1. Symmetric (or Hermitian): The matrix must be equal to its own transpose ($A = A^T$).
      2. Positive-Definite: For any non-zero vector $x$, the scalar $x^T Ax$ must be greater than zero. Visually, this means the matrix behaves like a “positive number” in higher dimensions.


      3. Why Use It? (The AI Perspective)

      In AI and Machine Learning, we rarely do math just for fun—we do it for speed and stability. Cholesky is popular because:

      • Efficiency: It is roughly twice as fast as LU decomposition for solving $Ax = b$.
      • Numerical Stability: It is less prone to rounding errors, which is crucial when training deep neural networks or processing large datasets.
      • Applications:
        • Monte Carlo Simulations: Generating correlated random variables.
        • Gaussian Processes: A core technique in Bayesian inference and optimization.
        • Linear Least Squares: Used in various regression models.

      4. Simple Example (Real 2×2 Matrix)

      Given a symmetric, positive-definite matrix $A$:
      $$A = \begin{pmatrix} 4 & 12 \cr 12 & 45 \end{pmatrix}$$

      We want to find $L = \begin{pmatrix} l_{11} & 0 \cr l_{21} & l_{22} \end{pmatrix}$ such that $LL^T = A$:

      1. Calculate $l_{11} = \sqrt{a_{11}} = \sqrt{4} = 2$.
      2. Calculate $l_{21} = \frac{a_{21}}{l_{11}} = \frac{12}{2} = 6$.
      3. Calculate $l_{22} = \sqrt{a_{22} – l_{21}^2} = \sqrt{45 – 6^2} = \sqrt{45 – 36} = 3$.

      Resulting in:
      $$L = \begin{pmatrix} 2 & 0 \cr 6 & 3 \end{pmatrix}$$

      • This topic was modified 3 months ago by Wolf.
      • This topic was modified 3 months ago by yRocket.
      • This topic was modified 3 months ago by yRocket.
      • This topic was modified 3 months ago by yRocket.
      • This topic was modified 3 months ago by yRocket.
        February 16, 2026 at 10:26 am #5428

        Calculating the Lower Triangular Matrix (L)

        To find the Lower Triangular Matrix $L$ such that $A = LL^T$, we use the Cholesky-Banachiewicz algorithm. This method calculates entries of $L$ row by row, starting from the top-left.


        1. The Mathematical Formulas

        For a matrix $A$ of size $n \times n$:

        • For Diagonal Elements ($i = j$):
          $$L_{ii} = \sqrt{a_{ii} – \sum_{k=1}^{i-1} L_{ik}^2}$$
        • For Off-Diagonal Elements ($i > j$):
          $$L_{ij} = \frac{1}{L_{jj}} \left( a_{ij} – \sum_{k=1}^{j-1} L_{ik} L_{jk} \right)$$


        2. Step-by-Step Example

        Let’s solve for $A = \begin{pmatrix} 4 & 12 \cr 12 & 45 \end{pmatrix}$. We need to find $L = \begin{pmatrix} L_{11} & 0 \cr L_{21} & L_{22} \end{pmatrix}$.

        Step 1: Calculate $L_{11}$ (The first diagonal)
        $$L_{11} = \sqrt{a_{11}} = \sqrt{4} = 2$$

        Step 2: Calculate $L_{21}$ (The first column, second row)
        $$L_{21} = \frac{a_{21}}{L_{11}} = \frac{12}{2} = 6$$

        Step 3: Calculate $L_{22}$ (The second diagonal)
        $$L_{22} = \sqrt{a_{22} – L_{21}^2} = \sqrt{45 – 6^2} = \sqrt{45 – 36} = \sqrt{9} = 3$$


        3. Final Result

        The resulting Lower Triangular Matrix $L$ is:

        $$L = \begin{pmatrix} 2 & 0 \cr 6 & 3 \end{pmatrix}$$


        4. AI Learner’s Summary Table

        Step Type Goal Intuition for AI
        Diagonal $L_{ii}$ Accounting for the “variance” left in the row.
        Off-Diagonal $L_{ij}$ Determining the “correlation” or weight between dimensions.
        Constraint $a_{ii} > \sum L_{ik}^2$ Ensuring the matrix is Positive-Definite (no imaginary numbers).

        Solving $Ax = b$ using this $L$ is much more efficient in neural network optimizations because you can use simple “Forward Substitution” to find your weights.

          February 16, 2026 at 11:04 am #5431

          How to calculate Lower Triangular Matrix in Cholesky Decomposition?

          In AI fields like Gaussian Processes or Kalman filtering, the Cholesky Decomposition is a cornerstone for handling covariance matrices. It decomposes a symmetric, positive-definite matrix $A$ into $A = LL^T$, where $L$ is a lower triangular matrix.

          1. The Derivation of Cholesky Formulas

          To derive the formulas, we start by setting up the matrix multiplication for $A = LL^T$. Let’s look at the elements of $A$ resulting from the product of $L$ and its transpose $L^T$:

          $$A = \begin{pmatrix} L_{11} & 0 & 0 \cr L_{21} & L_{22} & 0 \cr L_{31} & L_{32} & L_{33} \end{pmatrix} \begin{pmatrix} L_{11} & L_{21} & L_{31} \cr 0 & L_{22} & L_{32} \cr 0 & 0 & L_{33} \end{pmatrix}$$

          By the definition of matrix multiplication, the element $a_{ij}$ is the dot product of the $i$-th row of $L$ and the $j$-th column of $L^T$ (which is the $j$-th row of $L$):
          $$a_{ij} = \sum_{k=1}^{n} L_{ik} (L^T){kj} = \sum_{k=1}^{n} L_{ik} L_{jk}$$

          Since $L$ is lower triangular ($L_{ik} = 0$ for $k > i$), the summation only goes up to the minimum of $i$ and $j$. For $i \ge j$:
          $$a_{ij} = \sum_{k=1}^{j} L_{ik} L_{jk}$$

          A. Deriving Diagonal Elements ($i = j$)

          When $i = j$, the equation becomes:
          $$a_{ii} = L_{i1}^2 + L_{i2}^2 + \dots + L_{ii}^2$$
          Isolating $L_{ii}$, we get:
          $$L_{ii}^2 = a_{ii} – \sum_{k=1}^{i-1} L_{ik}^2 \implies L_{ii} = \sqrt{a_{ii} – \sum_{k=1}^{i-1} L_{ik}^2}$$

          B. Deriving Off-Diagonal Elements ($i > j$)

          When $i > j$, we expand the sum:
          $$a_{ij} = (L_{i1}L_{j1} + L_{i2}L_{j2} + \dots + L_{i,j-1}L_{j,j-1}) + L_{ij}L_{jj}$$
          Isolating $L_{ij}$, we get:
          $$L_{ij} = \frac{1}{L_{jj}} \left( a_{ij} – \sum_{k=1}^{j-1} L_{ik} L_{jk} \right)$$


          2. Practical Calculation Example

          Consider $A = \begin{pmatrix} 4 & 12 \cr 12 & 45 \end{pmatrix}$. We want to find $L = \begin{pmatrix} L_{11} & 0 \cr L_{21} & L_{22} \end{pmatrix}$.

          Step 1: Calculate $L_{11}$
          Using the diagonal formula:
          $$L_{11} = \sqrt{a_{11}} = \sqrt{4} = 2$$

          Step 2: Calculate $L_{21}$
          Using the off-diagonal formula:
          $$L_{21} = \frac{a_{21}}{L_{11}} = \frac{12}{2} = 6$$

          Step 3: Calculate $L_{22}$
          Using the diagonal formula again:
          $$L_{22} = \sqrt{a_{22} – L_{21}^2} = \sqrt{45 – 6^2} = \sqrt{45 – 36} = \sqrt{9} = 3$$

          Result:
          $$L = \begin{pmatrix} 2 & 0 \cr 6 & 3 \end{pmatrix}$$


          3. Summary for AI Learners

          Concept Importance in AI
          Numerical Stability Avoids the errors often found in standard matrix inversion.
          Computational Speed Roughly $O(\frac{1}{3}n^3)$, twice as fast as LU decomposition.
          Sampling To sample from $N(\mu, \Sigma)$, we use $x = \mu + Lz$ where $z \sim N(0, I)$.

          By understanding this derivation, you can see why the matrix $A$ must be Positive-Definite: the value inside the square root ($a_{ii} – \sum L_{ik}^2$) must be positive for $L$ to remain in the real number domain.

          • This reply was modified 3 months ago by Wolf.
          • This reply was modified 3 months ago by Wolf.
          • This reply was modified 3 months ago by Wolf.
          • This reply was modified 3 months ago by Wolf.
          • This reply was modified 3 months ago by Wolf.
        • You must be logged in to reply to this topic.