Let’s use the random vector $x$ to represent an uncertain state, and the random vector $z$ to represent an uncertain measurement. Even before making any actual measurements, we should have a prior idea of the likelihoods of different values of the combined vector $(x, z)$. These are subjective assessments of the following sort.
- The value of $x$ is probably close to the known vector $\hat{x}$
- The value of $z$ will probably turn out to be close to the known vector $\hat{z}$
- The value of $z - \hat{z}$ is probably close to $L(x - \hat{x})$, where $L$ is a known matrix
To encode these prior subjective beliefs numerically, we can say that $(x, z)$ is distributed as a Gaussian random variable.

$$\begin{bmatrix} x \\ z \end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix} \hat{x} \\ \hat{z} \end{bmatrix},\; \begin{bmatrix} \Sigma_{xx} & \Sigma_{xz} \\ \Sigma_{xz}^T & \Sigma_{zz} \end{bmatrix} \right)$$
- $\Sigma_{xx}$ describes how close we believe $x$ is to $\hat{x}$.
- $\Sigma_{zz}$ describes how close we believe $z$ will be to $\hat{z}$.
- $\Sigma_{xz}$ describes how correlated we think $x$ and $z$ are.
The Kalman Filter can be viewed as a principled way to choose $(\hat{x}, \hat{z}, \Sigma_{xx}, \Sigma_{zz}, \Sigma_{xz})$. There are also other ways to choose these priors, but suppose for now that we have chosen them sensibly.
Now that we have a prior $p(x, z)$, we can incorporate an actual measurement $z$ into the state by simply taking the posterior estimate $p(x \mid z)$. We will find in the next section that the posterior is distributed as a Gaussian with the following parameters.

$$\begin{aligned} \mu &= \hat{x} + \Sigma_{xz} \Sigma_{zz}^{-1} (z - \hat{z}) \\ \Sigma_{x \mid z} &= \Sigma_{xx} - \Sigma_{xz} \Sigma_{zz}^{-1} \Sigma_{xz}^T \end{aligned}$$
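The posterior update can be sketched directly in NumPy. This is an illustration of the equations above, not a library API; the function and argument names are my own.

```python
import numpy as np

def bayes_update(x_hat, z_hat, S_xx, S_zz, S_xz, z):
    """Condition a joint Gaussian prior over (x, z) on an observed z.

    Returns the posterior mean and covariance of x given z.
    Names mirror the symbols in the text (S_xz is Sigma_xz, etc.).
    """
    gain = S_xz @ np.linalg.inv(S_zz)      # Sigma_xz Sigma_zz^{-1}
    mu = x_hat + gain @ (z - z_hat)        # posterior mean
    S_post = S_xx - gain @ S_xz.T          # posterior covariance
    return mu, S_post

# A tiny 1-D example: prior x ~ N(0, 1), z ~ N(0, 2), cross covariance 1.
mu, S_post = bayes_update(np.array([0.0]), np.array([0.0]),
                          np.array([[1.0]]), np.array([[2.0]]),
                          np.array([[1.0]]), np.array([2.0]))
```

In the 1-D example the gain is $1/2$, so observing $z = 2$ pulls the state estimate from $0$ to $1$ and shrinks its variance from $1$ to $1/2$.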
I will call these the Bayes Inference equations.
Deriving the Bayes Inference Equations
In this section I’ll derive the Bayes inference equations.
If you’re willing to take these equations on faith for now, feel free to skip ahead and come back to this section later.
We have the proportionality relationship $p(x \mid z) \propto p(x, z)$. This means we only have to evaluate the right hand side in order to know the distribution $p(x \mid z)$.
Remember the Gaussian density, where $\eta$ represents a normalization constant that we don’t care about.

$$p(x, z) = \eta \exp\!\left( -\frac{1}{2} \begin{bmatrix} x - \hat{x} \\ z - \hat{z} \end{bmatrix}^T \begin{bmatrix} \Sigma_{xx} & \Sigma_{xz} \\ \Sigma_{xz}^T & \Sigma_{zz} \end{bmatrix}^{-1} \begin{bmatrix} x - \hat{x} \\ z - \hat{z} \end{bmatrix} \right)$$
It will be convenient to use the inverse covariance matrix $\Lambda = \Sigma^{-1}$, also known as the information matrix.

$$\Lambda = \begin{bmatrix} \Lambda_{xx} & \Lambda_{xz} \\ \Lambda_{xz}^T & \Lambda_{zz} \end{bmatrix} = \begin{bmatrix} \Sigma_{xx} & \Sigma_{xz} \\ \Sigma_{xz}^T & \Sigma_{zz} \end{bmatrix}^{-1}$$
We can substitute the information matrix and expand.

$$p(x, z) = \eta \exp\!\left( -\frac{1}{2} \left[ (x - \hat{x})^T \Lambda_{xx} (x - \hat{x}) + 2 (x - \hat{x})^T \Lambda_{xz} (z - \hat{z}) + (z - \hat{z})^T \Lambda_{zz} (z - \hat{z}) \right] \right)$$
Then substitute the observed value of $z$ and expand more. We can collect any terms that are not multiplied by $x$ into a constant $c$.

$$p(x \mid z) \propto \eta \exp\!\left( -\frac{1}{2} \left[ x^T \Lambda_{xx} x - 2 x^T \left( \Lambda_{xx} \hat{x} - \Lambda_{xz} (z - \hat{z}) \right) + c \right] \right)$$
The $\eta$ and the $e^{-c/2}$ both drop out as scaling constants.
Complete the square by rewriting the exponent around $\mu = \hat{x} - \Lambda_{xx}^{-1} \Lambda_{xz} (z - \hat{z})$.

$$p(x \mid z) \propto \exp\!\left( -\frac{1}{2} (x - \mu)^T \Lambda_{xx} (x - \mu) \right)$$

Note that this is the probability density of a Gaussian with mean $\mu$ and covariance $\Lambda_{xx}^{-1}$.
This formula is written in terms of the information matrix, but in many cases it is more convenient to write it in terms of the covariance matrix. To accomplish this, we can use the block-matrix inversion formula

$$\begin{bmatrix} \Sigma_{xx} & \Sigma_{xz} \\ \Sigma_{xz}^T & \Sigma_{zz} \end{bmatrix}^{-1} = \begin{bmatrix} S^{-1} & -S^{-1} \Sigma_{xz} \Sigma_{zz}^{-1} \\ -\Sigma_{zz}^{-1} \Sigma_{xz}^T S^{-1} & \Sigma_{zz}^{-1} + \Sigma_{zz}^{-1} \Sigma_{xz}^T S^{-1} \Sigma_{xz} \Sigma_{zz}^{-1} \end{bmatrix}$$

where $S$ is the Schur complement $S = \Sigma_{xx} - \Sigma_{xz} \Sigma_{zz}^{-1} \Sigma_{xz}^T$.
We see that $\Lambda_{xx} = S^{-1}$ and $\Lambda_{xz} = -S^{-1} \Sigma_{xz} \Sigma_{zz}^{-1}$, so $\Lambda_{xx}^{-1} = S$ and $-\Lambda_{xx}^{-1} \Lambda_{xz} = \Sigma_{xz} \Sigma_{zz}^{-1}$. Therefore we can write the distribution of $x \mid z$ in terms of the covariance matrix.

$$\begin{aligned} \mu &= \hat{x} + \Sigma_{xz} \Sigma_{zz}^{-1} (z - \hat{z}) \\ \Sigma_{x \mid z} &= \Sigma_{xx} - \Sigma_{xz} \Sigma_{zz}^{-1} \Sigma_{xz}^T \end{aligned}$$
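The two block-inversion identities used here are easy to sanity-check numerically. The sketch below builds a random symmetric positive-definite joint covariance (dimensions chosen arbitrarily) and verifies both identities with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2  # illustrative dimensions of x and z
A = rng.standard_normal((n + m, n + m))
Sigma = A @ A.T + (n + m) * np.eye(n + m)  # random SPD joint covariance

Sxx, Sxz = Sigma[:n, :n], Sigma[:n, n:]
Szx, Szz = Sigma[n:, :n], Sigma[n:, n:]

Lam = np.linalg.inv(Sigma)           # information matrix
Lxx, Lxz = Lam[:n, :n], Lam[:n, n:]

# Lambda_xx^{-1} equals the Schur complement Sigma_xx - Sigma_xz Sigma_zz^{-1} Sigma_zx
schur = Sxx - Sxz @ np.linalg.inv(Szz) @ Szx
assert np.allclose(np.linalg.inv(Lxx), schur)

# -Lambda_xx^{-1} Lambda_xz equals Sigma_xz Sigma_zz^{-1}
assert np.allclose(-np.linalg.inv(Lxx) @ Lxz, Sxz @ np.linalg.inv(Szz))
```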
Deriving the Kalman Filter
In the first section, I mentioned that the Kalman Filter can be seen as a principled way to establish the priors $(\hat{x}, \hat{z}, \Sigma_{xx}, \Sigma_{zz}, \Sigma_{xz})$.
Remember we wanted these priors so that, given an actual measurement $z$, we could apply the Bayes inference equations.
The Kalman Filter sets up the priors by supposing that the state variable $x$ and the measurement variable $z$ are both caused by a single prior variable $x_0 \sim \mathcal{N}(\hat{x}_0, P_0)$, via a state-update matrix $F$ and a measurement matrix $H$.
With $w \sim \mathcal{N}(0, Q)$ as independent process noise, we assume our state arises from $x_0$ as follows.

$$x = F x_0 + w$$
With $v \sim \mathcal{N}(0, R)$ as independent measurement error, we assume our measurement arises from $x$ (and ultimately from $x_0$) as follows.

$$z = H x + v = H (F x_0 + w) + v$$
These two equations are enough to generate the list $(\hat{x}, \hat{z}, \Sigma_{xx}, \Sigma_{zz}, \Sigma_{xz})$ via straightforward computations. See the next section for those derivations in detail.
We will end up with the following.

$$\begin{aligned} \hat{x} &= F \hat{x}_0 \\ \hat{z} &= H \hat{x} \\ \Sigma_{xx} &= F P_0 F^T + Q \\ \Sigma_{zz} &= H \Sigma_{xx} H^T + R \\ \Sigma_{xz} &= \Sigma_{xx} H^T \end{aligned}$$
That’s it! Now plug those values into the Bayes update rule and you have a Kalman Filter!
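One full predict-and-update cycle, assembled exactly in the order described above, might look like this in NumPy. It is a sketch (no numerical niceties such as the Joseph-form covariance update), and the function name and signature are my own.

```python
import numpy as np

def kalman_step(x0_hat, P0, F, Q, H, R, z):
    """One Kalman Filter cycle: build the priors, then apply Bayes inference.

    x0_hat, P0 : mean and covariance of the prior variable x0
    F, Q       : state-update matrix and process-noise covariance
    H, R       : measurement matrix and measurement-noise covariance
    z          : the actual measurement
    """
    # Priors generated by the model x = F x0 + w, z = H x + v
    x_hat = F @ x0_hat                  # predicted mean
    Sxx = F @ P0 @ F.T + Q              # predicted covariance
    z_hat = H @ x_hat
    Szz = H @ Sxx @ H.T + R             # innovation covariance
    Sxz = Sxx @ H.T

    # Bayes inference equations
    K = Sxz @ np.linalg.inv(Szz)        # optimal Kalman gain
    x_post = x_hat + K @ (z - z_hat)
    P_post = Sxx - K @ Sxz.T
    return x_post, P_post

# 1-D example: static state, unit prior variance, unit measurement noise.
x_post, P_post = kalman_step(np.array([0.0]), np.array([[1.0]]),
                             np.array([[1.0]]), np.array([[0.0]]),
                             np.array([[1.0]]), np.array([[1.0]]),
                             np.array([2.0]))
```

In the 1-D example the gain works out to $1/2$, so the measurement $z = 2$ moves the estimate to $1$ with posterior variance $1/2$, matching the earlier Bayes-update example.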
A note on terminology, for comparison to the Wikipedia article on the Kalman Filter:
- $\hat{x} = F \hat{x}_0$ is called the predicted mean
- $\Sigma_{xx} = F P_0 F^T + Q$ is called the predicted covariance
- $\Sigma_{zz} = H \Sigma_{xx} H^T + R$ is called the innovation, or pre-fit residual, covariance
- $K = \Sigma_{xz} \Sigma_{zz}^{-1}$ is called the optimal Kalman Gain
Deriving the Kalman Filter In Detail
In this section, I’ll show these equalities.
Here are the means.

$$\begin{aligned} \hat{x} &= \mathbb{E}[x] = \mathbb{E}[F x_0 + w] = F \, \mathbb{E}[x_0] + \mathbb{E}[w] = F \hat{x}_0 \\ \hat{z} &= \mathbb{E}[z] = \mathbb{E}[H x + v] = H \, \mathbb{E}[x] + \mathbb{E}[v] = H \hat{x} \end{aligned}$$
Here are the covariances and cross covariance. It will be convenient to define the delta operator $\delta$, which means $\delta a = a - \mathbb{E}[a]$. Also, $\delta w = w$ for zero-mean variables like $w$ and $v$. Starting with $\Sigma_{xx}$:

$$\begin{aligned} \Sigma_{xx} &= \mathbb{E}\!\left[ \delta x \, \delta x^T \right] = \mathbb{E}\!\left[ (F \, \delta x_0 + w)(F \, \delta x_0 + w)^T \right] \\ &= F \, \mathbb{E}\!\left[ \delta x_0 \, \delta x_0^T \right] F^T + F \, \mathbb{E}\!\left[ \delta x_0 \, w^T \right] + \mathbb{E}\!\left[ w \, \delta x_0^T \right] F^T + \mathbb{E}\!\left[ w \, w^T \right] \end{aligned}$$
Use independence to distribute the expectation in the second and third terms: $\mathbb{E}[\delta x_0 \, w^T] = \mathbb{E}[\delta x_0] \, \mathbb{E}[w]^T = 0$, since $w$ is zero-mean and independent of $x_0$. Therefore

$$\Sigma_{xx} = F P_0 F^T + Q$$
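This covariance-propagation identity is also easy to check by Monte Carlo. The sketch below draws samples of $x = F x_0 + w$ (with matrices I made up for illustration) and compares the empirical covariance to $F P_0 F^T + Q$.

```python
import numpy as np

rng = np.random.default_rng(1)
F = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative state-update matrix
P0 = np.array([[0.5, 0.1], [0.1, 0.4]])  # prior covariance of x0
Q = np.diag([0.05, 0.02])                # process-noise covariance

N = 200_000
x0 = rng.multivariate_normal([0.0, 0.0], P0, size=N)
w = rng.multivariate_normal([0.0, 0.0], Q, size=N)  # independent of x0
x = x0 @ F.T + w                          # samples of x = F x0 + w

empirical = np.cov(x, rowvar=False)
# Sampling error shrinks like 1/sqrt(N), so a loose tolerance suffices.
assert np.allclose(empirical, F @ P0 @ F.T + Q, atol=0.02)
```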