\( \def\bbR{\mathbb{R}} \def\rmI{\mathrm{I}} \def\trace{\mathrm{trace}} \)
I have recently been reading about the Schur complement, a cool mathematical tool that I had originally come across in a previous course, specifically when dealing with conditionals of multivariate Gaussian distributions. I mostly took the formula for granted back then, but this time around I thought I would read a little more about it and apply it to some mathematical puzzles that I recently came across.
Let \(M\) be a \(d \times d\) real-valued square matrix with the block structure \[M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}\] where \(A \in \bbR^{d_{1} \times d_{1}}\), \(B \in \bbR^{d_{1} \times d_{2}}\), \(C \in \bbR^{d_{2} \times d_{1}}\), and \(D \in \bbR^{d_{2} \times d_{2}}\), with \(d_{1} + d_{2} = d\). The matrix \(M\) need not be symmetric; hence the choice of \(B\) and \(C\) for the off-diagonal blocks.
Def. Schur complement: The Schur complements of \(M\) are \begin{align*} M/A &:= D - CA^{-1}B \\ M/D &:= A - BD^{-1}C \end{align*} When \(A\) (respectively \(D\)) is not invertible, the corresponding Schur complement is considered to be undefined*.
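In code, each Schur complement is a one-liner. Here is a minimal numpy sketch (the helper names are my own, and `np.linalg.solve` is used instead of forming the inverses explicitly):

```python
import numpy as np

def schur_complement_A(A, B, C, D):
    """M/A = D - C A^{-1} B (assumes A is invertible)."""
    return D - C @ np.linalg.solve(A, B)

def schur_complement_D(A, B, C, D):
    """M/D = A - B D^{-1} C (assumes D is invertible)."""
    return A - B @ np.linalg.solve(D, C)
```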
Why is the Schur complement a "natural" object to consider? Borrowing inspiration from the note of Jean Gallier, suppose we are interested in solving a linear system defined as \[M \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} c_{1} \\ c_{2}\end{bmatrix}\] where \(x, c_{1} \in \bbR^{d_{1}}\) and \(y, c_{2} \in \bbR^{d_{2}}\). This can be expressed equivalently in terms of the block matrices as \[Ax + By = c_{1}~,\quad Cx + Dy = c_{2}~.\] When given access to the inverse of \(A\), one could multiply the first equation by \(A^{-1}\), solve for \(x\), and substitute that solution into the second equation to obtain \[\underbrace{(D - CA^{-1}B)}_{M/A}\,y = c_{2} - CA^{-1}c_{1}~.\] If instead given access to the inverse of \(D\), one could multiply the second equation by \(D^{-1}\), solve for \(y\), and substitute that solution into the first equation to obtain \[\underbrace{(A - BD^{-1}C)}_{M/D}\,x = c_{1} - BD^{-1}c_{2}~.\] This also indicates that the Schur complement would make an appearance in the inverse of \(M\), since the process above is essentially block Gaussian elimination. In fact, one can derive a block decomposition of \(M\). To see this, we transform \(M\) into a purely block diagonal matrix, which can be achieved by two pairs of row operations: \begin{gather*} \begin{bmatrix} A & B \\ C & D \end{bmatrix} \xrightarrow{\substack{R_{1} = R_{1} \\ R_{2} = R_{2} - CA^{-1} R_{1}}} \begin{bmatrix} A & B \\ 0 & D - CA^{-1}B \end{bmatrix} \xrightarrow{\substack{R_{2} = R_{2} \\ R_{1} = R_{1} - B(M/A)^{-1}R_{2}}} \begin{bmatrix} A & 0 \\ 0 & D - CA^{-1}B \end{bmatrix} \\ \begin{bmatrix} A & B \\ C & D \end{bmatrix} \xrightarrow{\substack{R_{2} = R_{2} \\ R_{1} = R_{1} - BD^{-1} R_{2}}} \begin{bmatrix} A - BD^{-1}C & 0 \\ C & D \end{bmatrix} \xrightarrow{\substack{R_{1} = R_{1} \\ R_{2} = R_{2} - C(M/D)^{-1}R_{1}}} \begin{bmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{bmatrix} \end{gather*} Noting that row operations are equivalent to premultiplication by block elementary matrices, it is possible to directly read off the inverse of \(M\) in two forms, which are stated in the proposition below.
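As a quick numerical sanity check of the elimination above, here is a small numpy sketch on a randomly generated block system (the block sizes and the diagonal shift that keeps \(A\) comfortably invertible are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2 = 2, 3
A = rng.standard_normal((d1, d1)) + 3 * np.eye(d1)  # shifted to keep A invertible
B = rng.standard_normal((d1, d2))
C = rng.standard_normal((d2, d1))
D = rng.standard_normal((d2, d2))
M = np.block([[A, B], [C, D]])
c1, c2 = rng.standard_normal(d1), rng.standard_normal(d2)

# Eliminate x: (M/A) y = c2 - C A^{-1} c1, then back-substitute for x.
M_over_A = D - C @ np.linalg.solve(A, B)
y = np.linalg.solve(M_over_A, c2 - C @ np.linalg.solve(A, c1))
x = np.linalg.solve(A, c1 - B @ y)

# Compare against solving the full system directly.
print(np.allclose(np.concatenate([x, y]),
                  np.linalg.solve(M, np.concatenate([c1, c2]))))  # True
```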
Prop. Inverse of M: Assuming that one of \(A\) and \(D\) is invertible (along with the corresponding Schur complement), we have either \begin{align*} M^{-1} &= \begin{bmatrix} A^{-1} & 0 \\ 0 & (M/A)^{-1} \end{bmatrix} \begin{bmatrix} I & -B(M/A)^{-1}\\ 0 & I \end{bmatrix} \begin{bmatrix} I & 0 \\ -CA^{-1} & I \end{bmatrix}= \begin{bmatrix} A^{-1} + A^{-1}B(M/A)^{-1}CA^{-1} & -A^{-1}B(M/A)^{-1} \\ -(M/A)^{-1}CA^{-1} & (M/A)^{-1} \end{bmatrix} \\ &= \begin{bmatrix} (M/D)^{-1} & 0 \\ 0 & D^{-1} \end{bmatrix} \begin{bmatrix} I & 0 \\ -C(M/D)^{-1} & I \end{bmatrix} \begin{bmatrix} I & -BD^{-1}\\ 0 & I \end{bmatrix}= \begin{bmatrix} (M/D)^{-1} & -(M/D)^{-1}BD^{-1} \\ -D^{-1}C(M/D)^{-1} & D^{-1} + D^{-1}C(M/D)^{-1}BD^{-1} \end{bmatrix} \end{align*} A remark about the Schur complement is that given access to \(A^{-1}\) or \(D^{-1}\), the cost of computing the inverse of \(M\) can be much smaller than directly inverting \(M\), since the only new inverse required is that of a \(d_{2} \times d_{2}\) (or \(d_{1} \times d_{1}\)) matrix.
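The same kind of check works for the block inverse. Below is a sketch that assembles the first expanded form and compares it with `np.linalg.inv` (again on arbitrary random blocks):

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2 = 2, 3
A = rng.standard_normal((d1, d1)) + 3 * np.eye(d1)  # keep A invertible
B = rng.standard_normal((d1, d2))
C = rng.standard_normal((d2, d1))
D = rng.standard_normal((d2, d2))
M = np.block([[A, B], [C, D]])

Ainv = np.linalg.inv(A)
S = np.linalg.inv(D - C @ Ainv @ B)  # (M/A)^{-1}
Minv = np.block([
    [Ainv + Ainv @ B @ S @ C @ Ainv, -Ainv @ B @ S],
    [-S @ C @ Ainv,                   S],
])
print(np.allclose(Minv, np.linalg.inv(M)))  # True
```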
In situations where we are interested in the positive semi-definiteness (PSD) of \(M\), the Schur complement can be handy, thanks to the following theorem. See the note for what PSD means if you're unfamiliar.
Thm. Schur Complement PSD: Let \(M\) be a symmetric matrix in the block form described above, so that \(C = B^{\top}\). If \(A\) is positive definite (PD), then \(M\) is PSD if and only if \(M/A\) is PSD. Likewise, if \(D\) is PD, then \(M\) is PSD if and only if \(M/D\) is PSD.
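Here is a small numerical illustration of the theorem, on a random symmetric PSD matrix built as \(G^{\top}G\) (so the \(A\) block is PD almost surely); both \(M\) and \(M/A\) should then have no negative eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(2)
d1, d2 = 2, 3
G = rng.standard_normal((d1 + d2, d1 + d2))
M = G.T @ G                                  # symmetric PSD (in fact PD almost surely)
A, B = M[:d1, :d1], M[:d1, d1:]
D = M[d1:, d1:]
M_over_A = D - B.T @ np.linalg.solve(A, B)   # M/A with C = B^T

print(np.linalg.eigvalsh(M).min() >= -1e-10)         # True: M is PSD
print(np.linalg.eigvalsh(M_over_A).min() >= -1e-10)  # True: M/A is PSD
```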
Puzzle 1a: Assume that \(M\) is an invertible block matrix as described above. What is the sum of the eigenvalues of \(M^{-1}\) in terms of \(A, B, C, D\)?
A solution: Note that the trace of a matrix is the sum of its eigenvalues. Consequently, the puzzle is asking for the trace of \(M^{-1}\). From the Schur complement formula above, one can see that the trace of \(M^{-1}\) is \[\trace (M/A)^{-1} + \trace A^{-1} + \trace A^{-1}B(M/A)^{-1}CA^{-1} \quad \text{or} \quad \trace (M/D)^{-1} + \trace D^{-1} + \trace D^{-1}C(M/D)^{-1}BD^{-1}~,\] depending on whether \(A^{-1}\) or \(D^{-1}\) is available.
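A quick check of the first expression against numpy, on arbitrary random blocks with \(A\) kept invertible:

```python
import numpy as np

rng = np.random.default_rng(3)
d1, d2 = 2, 3
A = rng.standard_normal((d1, d1)) + 3 * np.eye(d1)
B = rng.standard_normal((d1, d2))
C = rng.standard_normal((d2, d1))
D = rng.standard_normal((d2, d2))
M = np.block([[A, B], [C, D]])

Ainv = np.linalg.inv(A)
S = np.linalg.inv(D - C @ Ainv @ B)  # (M/A)^{-1}
tr_formula = np.trace(S) + np.trace(Ainv) + np.trace(Ainv @ B @ S @ C @ Ainv)
print(np.isclose(tr_formula, np.trace(np.linalg.inv(M))))  # True
```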
Puzzle 1b: Suppose \(M = \begin{bmatrix} z & y & y \\ y & x & 0 \\ y & 0 & x \end{bmatrix}\). Compute the trace of \(M^{-1}\) as a function of \(x, y, z\).
A solution: It is very tempting to compute the eigenvalues of \(M\) directly, especially considering that it is a 3-by-3 matrix. However, this can prove to be computationally intensive, and in fact gives us a lot more information than what we want. Consider the blocks \(A = [z]\), \(B = [y, y]\), \(C = [y, y]^{\top}\), and \(D = x \cdot \rmI_{2}\). Then, \(M/D = A - BD^{-1}C = [z - \frac{2y^{2}}{x}]\). Using the formula above, \[\trace M^{-1} = \left(z - \frac{2y^{2}}{x}\right)^{-1} + \frac{2}{x} + 2\left(\frac{y^{2}}{x^{2}}\right) \cdot \left(z - \frac{2y^{2}}{x}\right)^{-1}~.\]
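Plugging in some arbitrary numbers for \(x, y, z\) and comparing against a direct inverse:

```python
import numpy as np

x, y, z = 2.0, 0.5, 3.0  # arbitrary values (with x != 0 and z != 2*y**2/x)
M = np.array([[z, y, y],
              [y, x, 0.0],
              [y, 0.0, x]])
s = z - 2 * y**2 / x  # the 1-by-1 Schur complement M/D
closed_form = 1 / s + 2 / x + 2 * (y**2 / x**2) / s
print(np.isclose(closed_form, np.trace(np.linalg.inv(M))))  # True
```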
Puzzle 2: Consider three random variables \(X, Y, Z\) such that \(\mathrm{corr}(X, Y) = \rho_{XY}\) and \(\mathrm{corr}(Y, Z) = \rho_{YZ}\). What is the range of possible values for \(\lambda := \mathrm{corr}(X, Z)\)?
At first glance, this might appear to be a purely statistical question, but it is possible to translate it into a question involving matrices. Since correlation is invariant to scaling of the random variables, let us assume their variances to be \(1\). In this case, the covariance matrix of \(X, Y, Z\) is given by \[\Sigma = \begin{bmatrix} 1 & \rho_{XY} & \lambda \\ \rho_{XY} & 1 & \rho_{YZ} \\ \lambda & \rho_{YZ} & 1\end{bmatrix}~.\] While this might look like it's not helping, we can reorder the variables so that \(\Sigma\) is instead the covariance matrix of \(X, Z, Y\), like so \[\Sigma = \begin{bmatrix} 1 & \lambda & \rho_{XY} \\ \lambda & 1 & \rho_{YZ} \\ \rho_{XY} & \rho_{YZ} & 1 \end{bmatrix}~.\] A valid value of \(\lambda\) must ensure that the matrix \(\Sigma\) is PSD, since it is a covariance matrix. Using the Schur complement theorem with \(D = [1]\) (the variance of \(Y\)), and the crucial fact that \([1]\) is PD, we have that \(\Sigma\) is PSD iff \(\Sigma / [1]\) is PSD. The Schur complement \(\Sigma / [1]\) is given by \[\begin{bmatrix} 1 & \lambda \\ \lambda & 1 \end{bmatrix} - \begin{bmatrix} \rho_{XY}^{2} & \rho_{XY}\rho_{YZ} \\ \rho_{XY}\rho_{YZ} & \rho_{YZ}^{2}\end{bmatrix} = \begin{bmatrix} 1 - \rho_{XY}^{2} & \lambda - \rho_{XY}\rho_{YZ} \\ \lambda - \rho_{XY}\rho_{YZ} & 1 - \rho_{YZ}^{2} \end{bmatrix}~.\] This is a convenient 2-by-2 matrix, whose trace is non-negative since \(1 - \rho_{XY}^{2}\) and \(1 - \rho_{YZ}^{2}\) are individually non-negative. For both of its eigenvalues to be non-negative, it remains to enforce the condition that the determinant is non-negative (Why? Check this note out). This condition is \[(1 - \rho_{XY}^{2})(1 - \rho_{YZ}^{2}) - (\lambda - \rho_{XY}\rho_{YZ})^{2} \geq 0 \quad \Leftrightarrow \quad |\lambda - \rho_{XY}\rho_{YZ}| \leq \sqrt{(1 - \rho_{XY}^{2})(1 - \rho_{YZ}^{2})}~.\] In other words, \(\lambda\) can range over the interval \(\big[\rho_{XY}\rho_{YZ} - \sqrt{(1 - \rho_{XY}^{2})(1 - \rho_{YZ}^{2})},\; \rho_{XY}\rho_{YZ} + \sqrt{(1 - \rho_{XY}^{2})(1 - \rho_{YZ}^{2})}\big]\).
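To double-check the bound, here is a small sketch that picks arbitrary values of \(\rho_{XY}\) and \(\rho_{YZ}\), sweeps \(\lambda\) over a grid, and verifies that the correlation matrix is PSD exactly when \(\lambda\) lies within the derived interval:

```python
import numpy as np

rho_xy, rho_yz = 0.6, -0.3   # arbitrary correlations
bound = np.sqrt((1 - rho_xy**2) * (1 - rho_yz**2))

def is_psd(lam):
    # Correlation matrix of (X, Y, Z); PSD iff its smallest eigenvalue is >= 0.
    S = np.array([[1.0, rho_xy, lam],
                  [rho_xy, 1.0, rho_yz],
                  [lam, rho_yz, 1.0]])
    return np.linalg.eigvalsh(S).min() >= -1e-10

for lam in np.linspace(-1, 1, 201):
    assert is_psd(lam) == (abs(lam - rho_xy * rho_yz) <= bound)
print("bound verified on the grid")
```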