Positive semi-definiteness of a 2x2 matrix

\( \def\rmd{\mathrm{d}} \def\bbP{\mathbb{P}} \def\rmI{\mathrm{I}} \def\calN{\mathcal{N}} \)

Here is an interesting mathematical puzzle that I recently came across.

Suppose we have a randomly generated symmetric, \(2 \times 2\) matrix, whose entries are independently drawn from a uniform distribution over \([-1, 1]\). What is the likelihood of this matrix being positive semi-definite?

Let \(M \in \mathbb{R}^{d \times d}\) be a symmetric matrix. The eigenvalues of \(M\) are the roots of the characteristic polynomial, which is succinctly defined by the expression \(\det(M - x \rmI_{d}) = 0~.\) Let \(U\Lambda U^{\top} = M\) be the eigendecomposition of \(M\). By the product property of determinants, and the fact that \(\det(U) = \pm 1\), we have \[\det(U^{\top}MU) = \det(M), \quad \det(M - x\rmI_{d}) = \det(\Lambda - x\rmI_{d})~.\] This proves that the product of eigenvalues is the determinant, and by Vieta's formula, the sum of eigenvalues is the trace.

When \(d = 2\), we only have to care about two eigenvalues. Indeed, it is even possible to exactly compute the eigenvalues in closed form using the characteristic equation.

Let \(M = \begin{bmatrix} a & b \\ b & c \end{bmatrix}\). Then, the eigenvalues \(\lambda_{1}, \lambda_{2}\) of \(A\) are solutions to the quadratic equation defined as \[\det (A - \lambda I) = (\lambda - a)(\lambda - c) - b^{2} = 0~.\] Note that the sum and product of the roots of this quadratic equation are \(a + c\) and \(ac - b^{2}\) respectively, and this which verifies the fact that the trace and determinant of \(A\) are given by the sum and product of the eigenvalues respectively.

Def PSD: A symmetric matrix \(M \in \mathbb{R}^{d \times d}\) is said to be positive semidefinite (abbrev. PSD) if all of \(d\) eigenvalues (which are real) are non-negative. Equivalently, for any vector \(v \in \mathbb{R}^{d}\), \(v^{\top}Mv \geq 0\).

We are interested in \(\bbP_{a,b,c}(\{\lambda_{1} \geq 0\} \cap \{\lambda_{2} \geq 0\})\). The brute force way to compute this is to compute closed form expressions of the roots, and integrate over the intersection of the two regions defined by \(\lambda_{1} \geq 0\) and \(\lambda_{2} \geq 0\) which are inequalities in terms of \(a, b, c\). An alternate way to approach this is by using the fact below.

Fact: Let \(x, y \in \mathbb{R}\). \(x \geq 0\) and \(y \geq 0\) if and only if \(x + y \geq 0\) and \(xy \geq 0\).

Proof: If \(x, y \geq 0\), then \(x + y \geq 0\) and \(xy \geq 0\), which proves one direction of the equivalence. Instead, assume that \(x + y \geq 0\) and \(xy \geq 0\). The case \(xy \geq 0\) implies that \(x, y\) are of the same sign, and therefore \(x + y\) has the same sign as \(x\) and \(y\). Since \(x + y \geq 0\), we have that \(x, y \geq 0\). \(\square\)

The above fact implies that \(\bbP_{a,b,c}(\{\lambda_{1} \geq 0\} \cap \{\lambda_{2} \geq 0\})\) is equal to \(\bbP_{a,b,c}(\{a + c \geq 0\} \cap \{ac - b^{2} \geq 0\})\).

This probability can be expressed as a closed form as shown below. Let \(p(a, b, c)\) denote the density function of the joint distribution of \(a, b, c\). \[\iiint_{\substack{a + c \geq 0 \\ ac - b^{2} \geq 0}} p(a, b, c)~\rmd a~\rmd b~\rmd c = \iiint_{\substack{a + c \geq 0 \\ ac - b^{2} \geq 0 \\ ac \geq 0}} p(a, b, c)~\rmd a~\rmd b~\rmd c + \iiint_{\substack{a + c \geq 0 \\ ac - b^{2} \geq 0 \\ ac \lt 0}} p(a, b, c)~ \rmd a~\rmd b~\rmd c.\] The second integral is zero, since \(ac \lt 0\) and \(b^{2} \leq ac\) are incompatible relations. This leaves us with the evaluation of the first integral, and the fact stated previously comes in handy here as well. Specifically, two of the constraints defined the region of interest: \(\{ac \geq 0\}\) and \(\{a + c \geq 0\}\), can be equivalently be cast as \(a \geq 0\) and \(c \geq 0\). Therefore, \[\iiint_{\substack{a + c \geq 0 \\ ac - b^{2} \geq 0 \\ ac \geq 0}} p(a, b, c)~ \rmd a~\rmd b~\rmd c=\int_{0}^{1} \int_{0}^{1} \int_{-\sqrt{ac}}^{\sqrt{ac}} \frac{1}{8} \rmd b~\rmd a~\rmd c=\frac{1}{4}\int_{0}^{1}\int_{0}^{1}\sqrt{ac}~\rmd a~\rmd c=\frac{1}{4}\left(\int_{0}^{1}\sqrt{x}~\rmd x\right)^{2}=\boxed{\frac{1}{9}}~.\]

Suppose \(a, b, c\) are drawn independently from a uniform distribution over a different range, then how is the likelihood of \(M\) being PSD affected?

To generalise this to bounds of the form \([\ell, u]\) for \(\ell \lt u\), we revisit the region of integration which we denote by \(R\). \[R = \left\{(a, b, c) : \begin{cases} a \geq 0,~a \in [\ell, u] \\ c \geq 0,~c \in [\ell, u] \\ b \in [-\sqrt{ac}, \sqrt{ac}] \cap [\ell, u] \end{cases}\right\}~.\] This allows to make certain observations.

If \(\ell = -u\) for \(u \gt 0\), then the region simplifies to the \([-1, 1]\) setting with a reparameterisation \(\bar{a} = \frac{a}{u}, \bar{b} = \frac{b}{u}, \bar{c} = \frac{c}{u}\)
If \(u \leq 0\), then \(R = \emptyset\). This is expected as the trace of \(M\) would be negative, which implies at least one negative eigenvalue.
If \(\ell \geq 0\), then \(R = \{(a, b, c) : a \in [\ell, u], b \in [\ell, \sqrt{ac}], c \in [\ell, u]\}\) since \(\sqrt{ac} \leq u\), and \(-\sqrt{ac} \leq \ell\)
If \(\ell \leq 0\) and \(u \geq 0\), then we have two more cases.
- Suppose \(|\ell| \leq u\). Then \(R = \{(a, b, c) : a, c \in [0, u], b \in [-\sqrt{ac}, \sqrt{ac}] \cap [-|\ell|, |\ell|]\} \cup \{(a, b, c) : a, c \in [0, u], b \in [-\sqrt{ac}, \sqrt{ac}] \cap [|\ell|, u]\}\).
- Otherwise, \(|\ell| \gt u\). Then \(R = \{(a, b, c) : a, c \in [0, u], b \in [-\sqrt{ac}, \sqrt{ac}] \cap [-u, u]\} \cup \{(a, b, c) : a, c \in [0, u], b \in [-\sqrt{ac}, \sqrt{ac}] \cap [\ell, -u]\}\).

It would also be interesting to think about the case where distribution is non-uniform. A "simple" non-uniform distribution is the standard normal distribution, and so it would be interested to see the effect of this distribution on the likelihood.

Let \(\phi : \mathbb{R} \to (0, \infty)\) denote the probability density function of a standard normal variable, and \(\Phi : \mathbb{R} \to (0, 1)\) denote the cumulative distribution function which satisfies \(\Phi(x) = \int_{-\infty}^{x}\phi(x) \rmd x\). The likelihood of the event we are interested in is \[\bbP_{a,b,c}(\{a + c \geq 0\} \cap \{ac - b^{2} \geq 0\}) = \iiint_{\substack{a + c \geq 0 \\ ac - b^{2} \geq 0 \\ ac \geq 0}} \phi(a) \phi(b) \phi(c)~\rmd a~\rmd b~\rmd c = \int_{0}^{\infty}\int_{0}^{\infty}\int_{-\sqrt{ac}}^{\sqrt{ac}} \phi(b)\phi(a)\phi(c)~\rmd b~\rmd a~\rmd c~ = \underbrace{2\int_{0}^{\infty}\int_{0}^{\infty}(\Phi(\sqrt{ac}) - \Phi(0))~\phi(a)\phi(c)~\rmd a~\rmd c}_{I}~.\] The above integral cannot be computed in closed form owing to the non-linearity from \(\Phi\). One strategy to give a bound on this probability is highlighted below, which uses a truncated linear approximation to \(\Phi\). Note that for any \(t \in \mathbb{R}\), \(\Phi'(t) = \phi(t) \leq \frac{1}{\sqrt{2\pi}}\). Consequently, from first-order Taylor approximation, \(\Phi(\sqrt{ac}) - \Phi(0) = \Phi'(t)(\sqrt{ac}) \leq \frac{\sqrt{ac}}{\sqrt{2\pi}}\) for some \(t \in (0, \sqrt{ac})\). This is a linear extrapolation, but recall that \(\Phi(\sqrt{ac}) - \Phi(0) \leq \frac{1}{2}\), and therefore, a tighter upper bound would be \[\Phi(\sqrt{ac}) - \Phi(0) \leq \min\left\{\frac{\sqrt{ac}}{\sqrt{2\pi}}, \frac{1}{2}\right\}~.\] Hence, when \(a, c \leq \sqrt{\frac{\pi}{2}}\), \(ac \leq \frac{\pi}{2}\). This yields \[\int_{0}^{\infty}\int_{0}^{\infty}(\Phi(\sqrt{ac}) - \Phi(0))\phi(a)\phi(c)~\rmd a~\rmd c \leq \underbrace{\int_{0}^{\sqrt{\frac{\pi}{2}}}\int_{0}^{\sqrt{\frac{\pi}{2}}} \frac{\sqrt{ac}}{\sqrt{2\pi}} \cdot \phi(a)\phi(c)~\rmd a~\rmd c}_{I_{1}} + 2\underbrace{\int_{\sqrt{\frac{\pi}{2}}}^{\infty}\int_{0}^{\sqrt{\frac{\pi}{2}}} \frac{1}{2} \cdot \phi(a)\phi(c)~\rmd a~\rmd c}_{I_{2}} + \underbrace{\int_{\sqrt{\frac{\pi}{2}}}^{\infty}\int_{\sqrt{\frac{\pi}{2}}}^{\infty} \frac{1}{2} \cdot \phi(a)\phi(c)~\rmd a~\rmd c}_{I_{3}}~.\]

Now we compute each of the \(I_{i}\) integrals. \[I_{1} = \int_{0}^{\sqrt{\frac{\pi}{2}}}\int_{0}^{\sqrt{\frac{\pi}{2}}} \frac{\sqrt{ac}}{\sqrt{2\pi}} \cdot \phi(a)\phi(c)~\rmd a~\rmd c = \frac{1}{\sqrt{2\pi}} \cdot \left(\int_{0}^{\sqrt{\frac{\pi}{2}}}\sqrt{x} \phi(x)~\rmd x\right)^{2} \approx 0.0299.\] \[I_{2} = \int_{\sqrt{\frac{\pi}{2}}}^{\infty}\int_{0}^{\sqrt{\frac{\pi}{2}}} \frac{1}{2} \cdot \phi(a)\phi(c)~\rmd a~\rmd c = \frac{1}{2} \cdot \left(\Phi\left(\sqrt{\frac{\pi}{2}}\right) - \frac{1}{2}\right) \cdot \left(1 - \Phi\left(\sqrt{\frac{\pi}{2}}\right)\right) \approx 0.0208~.\] \[I_{3} = \int_{\sqrt{\frac{\pi}{2}}}^{\infty}\int_{\sqrt{\frac{\pi}{2}}}^{\infty} \frac{1}{2} \cdot \phi(a)\phi(c)~\rmd a~\rmd c = \frac{1}{2} \cdot \left(1 - \Phi\left(\sqrt{\frac{\pi}{2}}\right)\right)^{2} \approx 0.0055~.\] The upper bound for the integral is therefore \[I = 2 \int_{0}^{\infty}\int_{0}^{\infty} (\Phi(\sqrt{ac}) - \Phi(0))\phi(a)\phi(c)~\rmd a~\rmd c \leq 0.154 ~.\]

How tight is this upper bound? It is possible to estimate the integral by performing a Monte Carlo estimation of the probability of this event and the double integral. With \(7\) independent runs, and each run using \(10^{6}\) samples, we obtain the following estimates for the probability and the integral.

Quantity	Run 1	Run 2	Run 3	Run 4	Run 5	Run 6	Run 7	Average with std. dev.
Probability of the event:	\(0.1156\)	\(0.1158\)	\(0.1159\)	\(0.1151\)	\(0.1158\)	\(0.1164\)	\(0.1159\)	\(0.1158 \pm 0.0004\)
Integral expression \(I\):	\(0.1162\)	\(0.1159\)	\(0.1159\)	\(0.1162\)	\(0.1157\)	\(0.1159\)	\(0.1162\)	\(0.1159 \pm 0.0002\)

This shows that the upper bound is somewhat tight, and could be improved by considering higher order Taylor series approximations; \(x \mapsto \Phi^{(k)}(x)\) is bounded from above for odd integers \(k\) for \(x \geq 0\).

There are some interesting points to note from this slightly more general analysis performed above.

Suppose the elements \(a, b, c\) are drawn independently from a zero mean Gaussian with variance \(\sigma^{2}\) that is not necessarily \(1\). Then, using a reparameterisation as \(\bar{a} \leftarrow a \cdot \sigma, \bar{b} \leftarrow b \cdot \sigma, \bar{c} \leftarrow c \cdot \sigma\), the likelihood of the matrix being PSD is unaffected, which echoes the case for when the elements \(a, b, c\) are independently drawn uniformly from a interval that is symmetric around \(0\).
The expression for the probability as a double integral in \(I\) only relied on the fact that the cumulative distribution function \(\Phi\) of the standard normal is symmetric around \(0\). Therefore, this closed form expression is still valid for any distribution function that is symmetric around \(0\). Indeed, the upper bound relied on the fact that \(\Phi\) is the CDF of the standard normal, but it is possible to specialise this for specific distributions.