What is Cholesky Decomposition?
Home / Forums / AI & ML: Learn It Yourself / Linear Algebra / What is Cholesky Decomposition?
- This topic has 2 replies, 1 voice, and was last updated 3 months ago by
Wolf.
-
AuthorPosts
-
Understanding Cholesky Decomposition
Cholesky Decomposition is a way of “factoring” a matrix, similar to how you might factor the number 25 into $5 \times 5$. In linear algebra, it breaks down a specific type of square matrix into the product of a lower triangular matrix and its conjugate transpose.
It is significantly more efficient than other methods (like LU decomposition) for solving linear equations, provided the matrix meets certain criteria.
To an AI learner, Cholesky Decomposition is essentially the “square root” of a matrix. Just as $\sqrt{25}=5$, Cholesky breaks down a specific type of matrix into a simpler, triangular matrix that, when multiplied by its own transpose, gives you the original matrix back.
In Machine Learning, this is the secret sauce behind Gaussian Processes, Kalman Filters, and generating correlated random noise for simulations.
1. The Mathematical Definition
If we have a Hermitian, positive-definite matrix $A$, the Cholesky decomposition is defined as:
$$A = LL^*$$
Where:
* $L$: A lower triangular matrix with real, positive diagonal entries.
* $L^*$: The conjugate transpose of $L$. (For real-valued matrices, this is simply the transpose $L^T$).
2. Prerequisites (The “Rules”)
Not every matrix can be decomposed this way. To use Cholesky, the matrix $A$ must be:
1. Symmetric (or Hermitian): The matrix must be equal to its own transpose ($A = A^T$).
2. Positive-Definite: For any non-zero vector $x$, the scalar $x^T Ax$ must be greater than zero. Visually, this means the matrix behaves like a “positive number” in higher dimensions.
3. Why Use It? (The AI Perspective)
In AI and Machine Learning, we rarely do math just for fun—we do it for speed and stability. Cholesky is popular because:
- Efficiency: It is roughly twice as fast as LU decomposition for solving $Ax = b$.
- Numerical Stability: It is less prone to rounding errors, which is crucial when training deep neural networks or processing large datasets.
- Applications:
- Monte Carlo Simulations: Generating correlated random variables.
- Gaussian Processes: A core technique in Bayesian inference and optimization.
- Linear Least Squares: Used in various regression models.
4. Simple Example (Real 2×2 Matrix)
Given a symmetric, positive-definite matrix $A$:
$$A = \begin{pmatrix} 4 & 12 \cr 12 & 45 \end{pmatrix}$$We want to find $L = \begin{pmatrix} l_{11} & 0 \cr l_{21} & l_{22} \end{pmatrix}$ such that $LL^T = A$:
- Calculate $l_{11} = \sqrt{a_{11}} = \sqrt{4} = 2$.
- Calculate $l_{21} = \frac{a_{21}}{l_{11}} = \frac{12}{2} = 6$.
- Calculate $l_{22} = \sqrt{a_{22} – l_{21}^2} = \sqrt{45 – 6^2} = \sqrt{45 – 36} = 3$.
Resulting in:
$$L = \begin{pmatrix} 2 & 0 \cr 6 & 3 \end{pmatrix}$$-
This topic was modified 3 months ago by
Wolf.
-
This topic was modified 3 months ago by
yRocket.
-
This topic was modified 3 months ago by
yRocket.
-
This topic was modified 3 months ago by
yRocket.
-
This topic was modified 3 months ago by
yRocket.
Calculating the Lower Triangular Matrix (L)
To find the Lower Triangular Matrix $L$ such that $A = LL^T$, we use the Cholesky-Banachiewicz algorithm. This method calculates entries of $L$ row by row, starting from the top-left.
1. The Mathematical Formulas
For a matrix $A$ of size $n \times n$:
- For Diagonal Elements ($i = j$):
$$L_{ii} = \sqrt{a_{ii} – \sum_{k=1}^{i-1} L_{ik}^2}$$ -
For Off-Diagonal Elements ($i > j$):
$$L_{ij} = \frac{1}{L_{jj}} \left( a_{ij} – \sum_{k=1}^{j-1} L_{ik} L_{jk} \right)$$
2. Step-by-Step Example
Let’s solve for $A = \begin{pmatrix} 4 & 12 \cr 12 & 45 \end{pmatrix}$. We need to find $L = \begin{pmatrix} L_{11} & 0 \cr L_{21} & L_{22} \end{pmatrix}$.
Step 1: Calculate $L_{11}$ (The first diagonal)
$$L_{11} = \sqrt{a_{11}} = \sqrt{4} = 2$$Step 2: Calculate $L_{21}$ (The first column, second row)
$$L_{21} = \frac{a_{21}}{L_{11}} = \frac{12}{2} = 6$$Step 3: Calculate $L_{22}$ (The second diagonal)
$$L_{22} = \sqrt{a_{22} – L_{21}^2} = \sqrt{45 – 6^2} = \sqrt{45 – 36} = \sqrt{9} = 3$$
3. Final Result
The resulting Lower Triangular Matrix $L$ is:
$$L = \begin{pmatrix} 2 & 0 \cr 6 & 3 \end{pmatrix}$$
4. AI Learner’s Summary Table
Step Type Goal Intuition for AI Diagonal $L_{ii}$ Accounting for the “variance” left in the row. Off-Diagonal $L_{ij}$ Determining the “correlation” or weight between dimensions. Constraint $a_{ii} > \sum L_{ik}^2$ Ensuring the matrix is Positive-Definite (no imaginary numbers). Solving $Ax = b$ using this $L$ is much more efficient in neural network optimizations because you can use simple “Forward Substitution” to find your weights.
How to calculate Lower Triangular Matrix in Cholesky Decomposition?
In AI fields like Gaussian Processes or Kalman filtering, the Cholesky Decomposition is a cornerstone for handling covariance matrices. It decomposes a symmetric, positive-definite matrix $A$ into $A = LL^T$, where $L$ is a lower triangular matrix.
1. The Derivation of Cholesky Formulas
To derive the formulas, we start by setting up the matrix multiplication for $A = LL^T$. Let’s look at the elements of $A$ resulting from the product of $L$ and its transpose $L^T$:
$$A = \begin{pmatrix} L_{11} & 0 & 0 \cr L_{21} & L_{22} & 0 \cr L_{31} & L_{32} & L_{33} \end{pmatrix} \begin{pmatrix} L_{11} & L_{21} & L_{31} \cr 0 & L_{22} & L_{32} \cr 0 & 0 & L_{33} \end{pmatrix}$$
By the definition of matrix multiplication, the element $a_{ij}$ is the dot product of the $i$-th row of $L$ and the $j$-th column of $L^T$ (which is the $j$-th row of $L$):
$$a_{ij} = \sum_{k=1}^{n} L_{ik} (L^T){kj} = \sum_{k=1}^{n} L_{ik} L_{jk}$$Since $L$ is lower triangular ($L_{ik} = 0$ for $k > i$), the summation only goes up to the minimum of $i$ and $j$. For $i \ge j$:
$$a_{ij} = \sum_{k=1}^{j} L_{ik} L_{jk}$$A. Deriving Diagonal Elements ($i = j$)
When $i = j$, the equation becomes:
$$a_{ii} = L_{i1}^2 + L_{i2}^2 + \dots + L_{ii}^2$$
Isolating $L_{ii}$, we get:
$$L_{ii}^2 = a_{ii} – \sum_{k=1}^{i-1} L_{ik}^2 \implies L_{ii} = \sqrt{a_{ii} – \sum_{k=1}^{i-1} L_{ik}^2}$$B. Deriving Off-Diagonal Elements ($i > j$)
When $i > j$, we expand the sum:
$$a_{ij} = (L_{i1}L_{j1} + L_{i2}L_{j2} + \dots + L_{i,j-1}L_{j,j-1}) + L_{ij}L_{jj}$$
Isolating $L_{ij}$, we get:
$$L_{ij} = \frac{1}{L_{jj}} \left( a_{ij} – \sum_{k=1}^{j-1} L_{ik} L_{jk} \right)$$
2. Practical Calculation Example
Consider $A = \begin{pmatrix} 4 & 12 \cr 12 & 45 \end{pmatrix}$. We want to find $L = \begin{pmatrix} L_{11} & 0 \cr L_{21} & L_{22} \end{pmatrix}$.
Step 1: Calculate $L_{11}$
Using the diagonal formula:
$$L_{11} = \sqrt{a_{11}} = \sqrt{4} = 2$$Step 2: Calculate $L_{21}$
Using the off-diagonal formula:
$$L_{21} = \frac{a_{21}}{L_{11}} = \frac{12}{2} = 6$$Step 3: Calculate $L_{22}$
Using the diagonal formula again:
$$L_{22} = \sqrt{a_{22} – L_{21}^2} = \sqrt{45 – 6^2} = \sqrt{45 – 36} = \sqrt{9} = 3$$Result:
$$L = \begin{pmatrix} 2 & 0 \cr 6 & 3 \end{pmatrix}$$
3. Summary for AI Learners
Concept Importance in AI Numerical Stability Avoids the errors often found in standard matrix inversion. Computational Speed Roughly $O(\frac{1}{3}n^3)$, twice as fast as LU decomposition. Sampling To sample from $N(\mu, \Sigma)$, we use $x = \mu + Lz$ where $z \sim N(0, I)$. By understanding this derivation, you can see why the matrix $A$ must be Positive-Definite: the value inside the square root ($a_{ii} – \sum L_{ik}^2$) must be positive for $L$ to remain in the real number domain.
-
This reply was modified 3 months ago by
Wolf.
-
This reply was modified 3 months ago by
Wolf.
-
This reply was modified 3 months ago by
Wolf.
-
This reply was modified 3 months ago by
Wolf.
-
This reply was modified 3 months ago by
Wolf.
-
AuthorPosts
- You must be logged in to reply to this topic.
