The Multivariate Gaussian For Programmers

October 21, 2020
Mark Liu

The multivariate gaussian is a crucial building block in machine learning, robotics, and mathematical finance. Here is just a small sample of its applications.

  • Kalman Filters
  • Gaussian Mixture Models
  • Linear Discriminant Analysis
  • Linear Quadratic Regulators
  • Black Scholes Option Pricing

However, the probability density formula of a multivariate gaussian can be pretty intimidating.
$$p(x) = \frac{1}{(2\pi)^{k/2} \det(\Sigma)^{1/2}}\, e^{-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)}$$

While this expression may work for a mathematician, it can be a little overwhelming for other audiences. In this post I’ll provide an alternate, rigorous, programmatic definition of the multivariate gaussian in Python. We’ll then be able to use our mental Python interpreter to get intuition about the behavior of any multivariate gaussian.

I will assume familiarity with the following concepts.

  • the single dimensional gaussian, aka the normal random variable
  • expectation and covariance of random variables
  • vector and matrix operations, such as multiplication and transpose
  • basic Python and numpy notation

We’ll also need the concept of simulation. I’ll say that a function f simulates a random variable F if there is no way to determine, given a list of outputs, whether the list was generated by repeated function calls to f or whether it came from independent realizations of F.

Now we’re ready to give a computational definition of an $n$-dimensional multivariate gaussian.

Examine this simple program.

import numpy as np

n = ... # Fill this in with a positive integer dimension, say n = 3

def z():
    """
    Return an n-dimensional column vector of independent standard random normals
    """
    return np.random.normal(size=(n,1))
    
def x(mu, M):
    """
    Return a random n-dimensional vector according to our special formula
    
    mu -- an n-dimensional column vector
    M -- an nxn matrix
    """
    return mu + np.matmul(M, z())

Definition. $X$ is an $n$-dimensional multivariate gaussian if and only if its distribution can be simulated by repeatedly calling the above function x(mu, M) with some constant inputs mu and M.

Now let’s break down what x(mu, M) does.

  1. Fill an n-dimensional vector with independent standard random normals by calling z().
  2. Then multiply the result by matrix M.
  3. Then shift the result by mu.
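For a concrete example (the numbers here are made up purely for illustration), assume the z() and x(mu, M) definitions above with n = 2 and make a few calls.

mu = np.array([[1.0], [2.0]])   # center the cloud at (1, 2)
M = np.array([[2.0, 0.0],
              [1.0, 0.5]])      # stretch and shear the standard normal cloud

for _ in range(3):
    print(x(mu, M).ravel())     # a random 2-vector scattered around (1, 2), different every run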

This computational viewpoint is especially useful if M has a geometrical interpretation. For example, suppose we are working in two dimensions ($n = 2$). If we graph the two columns of M, we can get a picture of how it warps the plane.

[Figure: the action of the matrix M as a warping of the plane]

Successive calls to z() will generate a cloud of samples around the origin, with the cloud getting denser as we get closer to the origin. If we draw a circle around this cloud and multiply all points in the plane by M, we can see that we get a cloud in an elliptical shape.

[Figure: the action of the matrix M as a warping of the cloud of z() samples]

All that remains is to shift this cloud of points by mu and we have a visualization for the two dimensional multivariate gaussian simulated by x(mu, M).

[Figure: shifting the cloud of samples by mu]

This ellipsoid representation, by the way, is exactly what we use in the self-driving car industry to represent the uncertainty estimates for the positions of other cars.

Suppose we know that $X$ is simulated by x(mu, M) (and therefore it is, by our definition, a multivariate gaussian) but we want to convert to the popular mean $\mu$ and covariance $\Sigma$ parameterization. It’s a simple matter of computing statistics.

First let’s compute the mean.
$$\begin{aligned}
\mu = E[X] &= E[\,\texttt{x(mu, M)}\,] \\
&= E[\,\texttt{mu + M z()}\,] \\
&= \texttt{mu} + \texttt{M}\,E[\,\texttt{z()}\,] && \text{(factoring out constants)} \\
&= \texttt{mu} + \texttt{M} \cdot 0 && \text{(using } E[\,\texttt{z()}\,] = 0\text{)} \\
&= \texttt{mu}
\end{aligned}$$

Unsurprisingly, the mean $\mu$ is mu. We could have guessed that by looking at the illustrations above.
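If you’d rather use your actual Python interpreter than your mental one, a quick sanity check is to average a large number of samples and confirm the result lands near mu. This sketch reuses the z() and x(mu, M) definitions above with n = 2 and the same illustrative mu and M as before.

mu = np.array([[1.0], [2.0]])
M = np.array([[2.0, 0.0],
              [1.0, 0.5]])

samples = np.hstack([x(mu, M) for _ in range(100_000)])  # shape (2, 100000), one sample per column
print(samples.mean(axis=1))  # close to [1.0, 2.0], i.e. mu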

Now let’s compute the covariance. But first a clarification about notation. Consider that np.matmul(z(), z().T) calls z twice and therefore generates two independent samples of z. How do we declare that we want the same sample of z both times? We can use an immediately invoked lambda expression: (lambda Z: np.matmul(Z, Z.T))(z()). (You might have seen this elsewhere called the IIFE pattern.) Okay, we’re ready to compute the covariance.

$$\begin{aligned}
\Sigma = E[(X - \mu)(X - \mu)^T] &= E[\,\texttt{(lambda X: (X - mu)(X - mu).T)(x(mu, M))}\,] \\
&= E[\,\texttt{(lambda Z: (M Z)(M Z).T)(z())}\,] \\
&= E[\,\texttt{(lambda Z: M Z Z.T M.T)(z())}\,] \\
&= \texttt{M}\; E[\,\texttt{(lambda Z: Z Z.T)(z())}\,]\; \texttt{M.T} \\
&= \texttt{M}\, I\, \texttt{M.T} \\
&= \texttt{M M.T}
\end{aligned}$$

(In the next-to-last step we used $E[\,\texttt{(lambda Z: Z Z.T)(z())}\,] = I$: the diagonal terms are the variances and the off-diagonal terms are the covariances of independent standard normals.)

So the covariance $\Sigma$ is M M.T and we get this result.

Result 1. If $X$ is a gaussian simulated by x(mu, M), it has mean mu and covariance M M.T.
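The same empirical check works for Result 1: the sample covariance of the draws from the sketch above should land close to M M.T.

print(np.cov(samples))        # sample covariance of the columns, close to M M.T
print(np.matmul(M, M.T))      # [[4.0, 2.0], [2.0, 1.25]] for the illustrative M above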

It’s actually more likely you’ll need to go backwards: you encounter a multivariate gaussian in the wild with the $\mu$, $\Sigma$ parameterization and you want to visualize it as an ellipsoid using the mu, M parameterization. For that we usually use the eigendecomposition $\Sigma = Q \Lambda Q^T$.

This decomposition is extremely important in linear algebra, and I would encourage you to read up on it later. If we start with any covariance matrix $\Sigma$, the eigendecomposition has the following properties.

  • $\Lambda$ is a diagonal matrix with positive entries
  • $Q$ is a high-dimensional rotation matrix (its columns are mutually orthogonal unit vectors)

Defining $\Lambda^{\frac{1}{2}}$ as the diagonal matrix whose entries are the square roots of $\Lambda$’s entries, we can see from Result 1 that calling x(mu=$\mu$, M=$Q \Lambda^{\frac{1}{2}}$) will simulate a multivariate gaussian with mean $\mu$ and covariance M M.T $= Q \Lambda^{\frac{1}{2}} (Q \Lambda^{\frac{1}{2}})^T = Q \Lambda Q^T = \Sigma$. So we have the following result.

Result 2. A multivariate gaussian $X$ with mean $\mu$ and covariance $\Sigma$ is simulated by x(mu=$\mu$, M=$Q \Lambda^{\frac{1}{2}}$), where $Q \Lambda Q^T$ is the eigendecomposition of $\Sigma$.
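Here is a minimal numpy sketch of Result 2. I’m using np.linalg.eigh, which is intended for symmetric matrices like a covariance matrix and returns the diagonal of $\Lambda$ along with $Q$’s columns; the particular $\Sigma$ here is made up for illustration.

sigma = np.array([[4.0, 2.0],
                  [2.0, 1.25]])                     # an example covariance matrix

eigen_vals, Q = np.linalg.eigh(sigma)               # sigma == Q @ diag(eigen_vals) @ Q.T
M = np.matmul(Q, np.diag(np.sqrt(eigen_vals)))      # M = Q Lambda^(1/2)

print(np.allclose(np.matmul(M, M.T), sigma))        # True, so x(mu, M) simulates N(mu, sigma)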

Let’s break down the action of $Q \Lambda^{\frac{1}{2}}$ into two steps: the action of $\Lambda^{\frac{1}{2}}$ and then the action of $Q$.

Remember $\Lambda^{\frac{1}{2}}$ is a diagonal matrix, so it acts in an extremely simple way: it stretches the $i^{th}$ axis by the scaling factor given in its $i^{th}$ diagonal entry.

$$\Lambda^{\frac{1}{2}} x = \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \sigma_1 x_1 \\ \sigma_2 x_2 \\ \sigma_3 x_3 \end{bmatrix}$$
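You can watch the stretching directly in numpy (the scaling factors are made up for illustration):

sqrt_lambda = np.diag([2.0, 0.5, 3.0])            # sigma_1, sigma_2, sigma_3 on the diagonal
x_vec = np.array([[1.0], [1.0], [1.0]])
print(np.matmul(sqrt_lambda, x_vec).ravel())      # [2.0, 0.5, 3.0]: each axis scaled by its own factor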

The matrix $Q$ is a rotation matrix whose action is also easily interpretable from inspecting its columns using column-oriented multiplication.
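To spell out the column-oriented view: if $q_1, q_2, q_3$ are the columns of $Q$, then

$$Q x = \begin{bmatrix} q_1 & q_2 & q_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_1 q_1 + x_2 q_2 + x_3 q_3$$

so $Q$ sends each standard basis vector to the corresponding column, and you can read the directions of the ellipsoid’s axes straight off of $Q$’s columns.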

So using our mental Python interpreter as before, we can visualize the samples from x(mu=$\mu$, M=$Q \Lambda^{\frac{1}{2}}$) by imagining a cloud of samples contained in a sphere, warping the sphere into an ellipsoid whose axis lengths are given by $\Lambda^{\frac{1}{2}}$, rotating that ellipsoid so its axes line up with the columns of $Q$, and then shifting the whole ellipsoid by $\mu$.

If you are fortunate enough to be working at a robotics company, you can do a code search for your gaussian visualization routines and I bet you will find something like the following.

def vis_gaussian(mean, covar):
    """Visualize a 3d gaussian"""
    eigen_vals, eigen_vects = np.linalg.eig(covar)
    plot_ellipsoid(
        center=mean, 
        major_axes=eigen_vects, 
        axes_lengths=np.sqrt(eigen_vals))

Now I hope that whenever you see $X \sim N(\mu, \Sigma)$ you’ll find it helpful to mentally replace it with $X =$ x(mu=$\mu$, M=$Q \Lambda^{\frac{1}{2}}$) and simulate it with your mental Python interpreter.

