# The Multivariate Gaussian For Programmers

October 21, 2020
Mark Liu

The multivariate gaussian is a crucial building block in machine learning, robotics, and mathematical finance applications. Here is just a small sample of applications.

• Kalman Filters
• Gaussian Mixture Models
• Linear Discriminant Analysis
• Linear Quadratic Regulators
• Black Scholes Option Pricing

However, probability density formula of a multivariate gaussian can be pretty intimidating.
$p(x) = \frac 1 {2 \pi}^{k/2} \det(\Sigma)^{-1/2} e^{-\frac 1 2 (x - \mu)^T \Sigma^{-1} (x - \mu)}$

While this expression may work for a mathematician, it can be a little overwhelming for other audiences. In this post I’ll provide an alternate rigorous, programmatic, definition of the multivariate gaussian in Python. We’ll then be able to use our mental Python interpreter to get intuition about the behavior of any multivariate gaussian.

I will assume familiarity with the following concepts.

• the single dimensional gaussian, aka the normal random variable
• expectation and covariance of random variables
• vector and matrix operations, such as multiplication, and transpose
• basic Python and numpy notation

We’ll also need the concept of simulation. I’ll say that a function f simulates a random variable F if there is no way to determine, given a list of outputs, whether the list was generated by repeated function calls to f or whether it came from independent realizations of F.

Now we’re ready to give a computational definition of an $n$ dimensional multivariate gaussian.

Examine this simple program.

import numpy as np

n = ... # Fill this in with a positive integer dimension, say n = 3

def z():
"""
Return an n-dimensional column vector of independent standard random normals
"""
return np.random.normal(size=(n,1))

def x(mu, M):
"""
Return a random n-dimensional vector according to our special formula

mu -- an n-dimensional column vector
M -- an nxn matrix
"""
return mu + np.matmul(M, z())

Definition. $X$ is an $n$-dimensional multivariate gaussian if and only if its distribution can be simulated by repeatedly calling the above function x(mu, M) with some constant inputs mu and M.

Now let’s break down what x(mu, M) does.

1. Fill an n-dimensional vector with independent standard random normals by calling z().
2. Then multiply the result by matrix M.
3. Then shift the result by mu.

This computational viewpoint is especially useful if M has a geometrical interpretation. For example, suppose we are working in two dimensions $n = 2$. If we graph the two columns of M, we can get a picture of how it warps the plane.

Successive calls to z() will generate a cloud of samples around the origin, with the cloud getting denser as we get closer to the origin. If we draw a circle around this cloud and multiply all points in the plane by M, we can see that we get a cloud in an elliptical shape.

All that remains is to shift this cloud of points by mu and we have a visualization for the two dimensional multivariate gaussian simulated by x(mu, M).

This ellipsoid representation, by the way, is exactly what we use in the self driving car industry to represent the uncertainty estimates for the positions of other cars.

Suppose we know that $X$ is simulated by x(mu, M) (and therefore it is, by our definition, multivariate gaussian) but we want to convert to the popular mean $\mu$ and covariance $\Sigma$ parameterization. It’s a simple matter of computing statistics.

First let’s compute the mean.
$\mu = E[X]$ $= E[$x(mu, M)$]$
$= E[$mu + M z()$] =$mu + M $E[$z()$]$ (factoring out constants)
$=$mu + M $0$ (using $E[$z()$] = 0$)
$=$mu

Unsurprisingly, the mean $\mu$ is mu. We could have guessed that by looking at the illustrations above.

Now let’s compute the covariance. But first a clarification about notation. Consider that np.matmul(z(), z().T) calls z twice and therefore generates two independent versions of z. How do we declare that we want the same sample of z both times? We can use an immediately invoked lambda expression (lambda Z: np.matmul(Z Z.T))(z()). (You might have seen this elsewhere called the IIFE pattern.) Okay we’re ready to compute the covariance.

$\Sigma = E[(X - \mu)(X - \mu)^T]$
$= E[$(lambda X: (X - mu)(X - mu).T)(x(mu, M))$]$
$= E[$(lambda Z: M Z (M Z).T)(z())$]$
$= E[$(lambda Z: M Z Z.T M.T)(z())$]$
$=$ M $E[$ (lambda Z: Z Z.T)(z()) $]$ M.T
$=$ M $I$ M.T (using $E[$ (lambda Z: Z Z.T)(z()) $] = I$ — diagonal terms are variances and cross terms are covariances of standard independent normals)
$=$ M M.T

So the covariance $\Sigma$ is M M.T and we get this result.

Result 1. If $X$ is a gaussian simulated by x(mu, M), it has mean mu and covariance M M.T.

It’s actually more likely you’ll need to go backwards. For example, if you encounter a multivariate gaussian in the wild with the $\mu$, $\Sigma$ parameterization and you need want to visualize it as an ellipsoid using the mu, M parameterization. For that we usually use the eigendecomposition $\Sigma = Q \Lambda Q^T$

This decomposition is extremely important in linear algebra, and I would encourage you to read up on it later. If we start with any covariance matrix $\Sigma$, the eigendecomposition has the following properties.

• $\Lambda$ is a diagonal matrix with positive entries
• $Q$ is a high dimensional rotation matrix (all columns are mutually orthogonal)

Defining $\Lambda^{\frac 1 2}$ as the diagonal matrix whose entries are the square root of $\Lambda$’s entries, we can see from Result 1 that calling x(mu=$\mu$, M=$Q \Lambda^{\frac 1 2}$) will simulate a multivariate gaussian with mean $\mu$ and covariance M M.T $= Q \Lambda^{\frac 1 2} (Q \Lambda^{\frac 1 2})^T = Q \Lambda Q^T = \Sigma$. So we have the following result.

Result 2. A multivariate gaussian $X$ with mean $\mu$ and covariance $\Sigma$ is simulated by x(mu=$\mu$, M=$Q \Lambda^{\frac 1 2}$) where $Q \Lambda Q^T$ is the eigendecomposition of $\Sigma$.

Let’s break down the action of $Q \Lambda^{\frac 1 2}$ into two steps, the action of $\Lambda^{\frac 1 2}$ and then the action of $Q$.

Remember $\Lambda^{\frac 1 2}$ is a diagonal matrix so it acts in an extremely simple way — by stretching the $i^{th}$ axis by the scaling factor given in its $i^{th}$ diagonal entry.
$\Lambda^{\frac 1 2} x = \begin{bmatrix} \sigma_1 && 0 && 0 \\ 0 && \sigma_2 && 0 \\ 0 && 0 && \sigma_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \sigma_1 x_1 \\ \sigma_2 x_2 \\ \sigma_3 x_3 \end{bmatrix}$

The matrix $Q$ is a rotation matrix whose action is also easily interpretable from inspecting its columns using column oriented multiplication.

So using our mental Python interpreter as before, we can visualize the samples x(mu=$\mu$, M=$Q \Lambda^{\frac 1 2}$) by imagining a cloud of samples contained in a sphere, then warping the sphere into an ellipsoid whose axes lengths are given by $\Lambda^{\frac 1 2}$, and then shifting this whole ellipsoid by $\mu$.

If you are fortunate enough to be working at a robotics company, you can do a code search for your gaussian visualization routines and I bet you will find something like the following.

def vis_gaussian(mean, covar):
"""Visualize a 3d gaussian"""
eigen_vals, eigen_vects = np.linalg.eig(covar)
plot_ellipsoid(
center=mean,
major_axes=eigen_vects,
axes_lengths=np.sqrt(eigen_vals))

Now I hope whenever you see $X \sim N(\mu, \Sigma)$ you’ll find it helpful to mentally replace it with $X =$x(mu=$\mu$, M=$Q \Lambda^{\frac 1 2}$) and simulate it with your mental Python interpreter.

© 2020 Biro Inc.