Intro
Let’s use the random vector $x$ to represent an uncertain state, and the random vector $z$ to represent an uncertain measurement. Even before making any actual measurements, we should have a prior idea of the likelihoods of different values of the combined vector $\begin{bmatrix} x \\ z \end{bmatrix}$. These are subjective assessments of the following sort.
- The value of $x$ is probably close to the known vector $a$.
- The value of $z$ will probably turn out to be close to the known vector $b$.
- The value of $z$ is probably close to $Fx$, where $F$ is a known matrix.
To encode these prior subjective beliefs numerically, we can say that $\begin{bmatrix} x \\ z \end{bmatrix}$ is distributed as a Gaussian random variable.
$$\begin{bmatrix} x \\ z \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \mu_x \\ \mu_z \end{bmatrix}, \begin{bmatrix} \Sigma_{xx} & \Sigma_{xz} \\ \Sigma_{zx} & \Sigma_{zz} \end{bmatrix} \right)$$
- $\Sigma_{xx}$ describes how close we believe $x$ is to $\mu_x$.
- $\Sigma_{zz}$ describes how close we believe $z$ will be to $\mu_z$.
- $\Sigma_{xz} = \Sigma_{zx}^T$ describes how correlated we think $z$ and $x$ are.
The Kalman Filter can be viewed as a principled way to choose $\mu_x, \mu_z, \Sigma_{xx}, \Sigma_{xz}, \Sigma_{zx}, \Sigma_{zz}$. There are also other ways to choose these priors, but suppose for now that we have chosen them sensibly.
Now that we have a prior $p(x, z)$, we can incorporate any measurement $z_0$ into the state by simply taking the posterior estimate $p(x \mid z = z_0)$. We will find in the next section that the posterior $x \mid z = z_0$ is distributed as a Gaussian with the following parameters.
$$\mu_{x \mid z=z_0} = \mu_x + \Sigma_{xz}\Sigma_{zz}^{-1}(z_0 - \mu_z)$$
$$\Sigma_{x \mid z=z_0} = \Sigma_{xx} - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx}$$
I will call these equations the Bayes Inference equations.
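To make these equations concrete, here is a minimal numpy sketch. The prior numbers are made up purely for illustration; the two lines at the bottom are exactly the Bayes Inference equations above.

```python
import numpy as np

# Made-up prior over the joint vector [x; z]: x is 2-dimensional, z is 1-dimensional.
mu_x = np.array([1.0, 0.0])
mu_z = np.array([0.5])
Sigma_xx = np.array([[1.0, 0.2],
                     [0.2, 0.5]])
Sigma_xz = np.array([[0.3],
                     [0.1]])
Sigma_zx = Sigma_xz.T
Sigma_zz = np.array([[0.4]])

# An actual measurement arrives.
z0 = np.array([0.9])

# Bayes Inference equations: condition the joint Gaussian on z = z0.
gain = Sigma_xz @ np.linalg.inv(Sigma_zz)   # Sigma_xz Sigma_zz^{-1}
mu_post = mu_x + gain @ (z0 - mu_z)         # posterior mean of x given z = z0
Sigma_post = Sigma_xx - gain @ Sigma_zx     # posterior covariance of x given z = z0

print(mu_post)     # pulled toward the evidence z0
print(Sigma_post)  # "smaller" than Sigma_xx: conditioning reduces uncertainty
```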
Deriving the Bayes Inference Equations
In this section I’ll derive the Bayes inference equations.
$$\mu_{x \mid z=z_0} = \mu_x + \Sigma_{xz}\Sigma_{zz}^{-1}(z_0 - \mu_z)$$
$$\Sigma_{x \mid z=z_0} = \Sigma_{xx} - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx}$$
Feel free to come back to this section later if you’re willing to take these equations on faith for now.
We have the proportionality relationship $p(x \mid z = z_0) = \frac{p(x, z_0)}{p(z_0)} \propto p(x, z_0)$. This means we only have to evaluate the right hand side $p(x, z_0)$ in order to know the distribution $p(x \mid z = z_0)$.
Remember the Gaussian density, where $K$ is a normalizing constant that we don’t care about.
$$p(x,z) = K \exp\left( -\tfrac{1}{2} \begin{bmatrix} x - \mu_x \\ z - \mu_z \end{bmatrix}^T \begin{bmatrix} \Sigma_{xx} & \Sigma_{xz} \\ \Sigma_{zx} & \Sigma_{zz} \end{bmatrix}^{-1} \begin{bmatrix} x - \mu_x \\ z - \mu_z \end{bmatrix} \right)$$
It will be convenient to use the inverse covariance matrix, also known as the information matrix.
$$\begin{bmatrix} \Lambda_{xx} & \Lambda_{xz} \\ \Lambda_{zx} & \Lambda_{zz} \end{bmatrix} \equiv \begin{bmatrix} \Sigma_{xx} & \Sigma_{xz} \\ \Sigma_{zx} & \Sigma_{zz} \end{bmatrix}^{-1}$$
We can substitute the information matrix and expand, absorbing the constant term that involves neither $x$ nor $z$ into $K$.

$$p(x,z) = K \exp\left( -\tfrac{1}{2} \begin{bmatrix} x \\ z \end{bmatrix}^T \begin{bmatrix} \Lambda_{xx} & \Lambda_{xz} \\ \Lambda_{zx} & \Lambda_{zz} \end{bmatrix} \begin{bmatrix} x \\ z \end{bmatrix} + \begin{bmatrix} x \\ z \end{bmatrix}^T \begin{bmatrix} \Lambda_{xx} & \Lambda_{xz} \\ \Lambda_{zx} & \Lambda_{zz} \end{bmatrix} \begin{bmatrix} \mu_x \\ \mu_z \end{bmatrix} \right)$$
Then substitute $z = z_0$ and expand further. We can collect any terms that are not multiplied by $x$ into a constant $C$.
$$p(x, z_0) = K \exp\left( -\tfrac{1}{2} x^T \Lambda_{xx} x - x^T \Lambda_{xz} z_0 + x^T \Lambda_{xx} \mu_x + x^T \Lambda_{xz} \mu_z + C \right)$$
Both $e^C$ and $K$ drop out as constant scale factors.
$$p(x, z_0) \propto \exp\left( -\tfrac{1}{2} x^T \Lambda_{xx} x - x^T \Lambda_{xz} z_0 + x^T \Lambda_{xx} \mu_x + x^T \Lambda_{xz} \mu_z \right)$$
$$p(x, z_0) \propto \exp\left( -\tfrac{1}{2} x^T \Lambda_{xx} x + x^T \left( \Lambda_{xx} \mu_x - \Lambda_{xz}(z_0 - \mu_z) \right) \right)$$
Complete the square by first rewriting $\Lambda_{xx}\mu_x - \Lambda_{xz}(z_0 - \mu_z) \;\to\; \Lambda_{xx}\left(\mu_x - \Lambda_{xx}^{-1}\Lambda_{xz}(z_0 - \mu_z)\right)$.
$$p(x, z_0) \propto \exp\left( -\tfrac{1}{2} x^T \Lambda_{xx} x + x^T \Lambda_{xx}\left(\mu_x - \Lambda_{xx}^{-1}\Lambda_{xz}(z_0 - \mu_z)\right) \right)$$
$$p(x, z_0) \propto \exp\left( -\tfrac{1}{2} \left(x - \left(\mu_x - \Lambda_{xx}^{-1}\Lambda_{xz}(z_0 - \mu_z)\right)\right)^T \Lambda_{xx} \left(x - \left(\mu_x - \Lambda_{xx}^{-1}\Lambda_{xz}(z_0 - \mu_z)\right)\right) \right)$$
Note that this is the probability density of a Gaussian with mean $\mu_x - \Lambda_{xx}^{-1}\Lambda_{xz}(z_0 - \mu_z)$ and covariance $\Lambda_{xx}^{-1}$.
$$\mu_{x \mid z=z_0} = \mu_x - \Lambda_{xx}^{-1}\Lambda_{xz}(z_0 - \mu_z)$$
$$\Sigma_{x \mid z=z_0} = \Lambda_{xx}^{-1}$$
This formula is written in terms of the information matrix, but in many cases it is more convenient to write it in terms of the covariance matrix. To accomplish this, we can use the block-matrix inversion formula, where $\Sigma/\Sigma_{zz}$ is the Schur complement $\Sigma_{xx} - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx}$.
$$\begin{bmatrix} \Lambda_{xx} & \Lambda_{xz} \\ \Lambda_{zx} & \Lambda_{zz} \end{bmatrix} = \begin{bmatrix} \Sigma_{xx} & \Sigma_{xz} \\ \Sigma_{zx} & \Sigma_{zz} \end{bmatrix}^{-1} = \begin{bmatrix} (\Sigma/\Sigma_{zz})^{-1} & -(\Sigma/\Sigma_{zz})^{-1}\Sigma_{xz}\Sigma_{zz}^{-1} \\ -\Sigma_{zz}^{-1}\Sigma_{zx}(\Sigma/\Sigma_{zz})^{-1} & \Sigma_{zz}^{-1} + \Sigma_{zz}^{-1}\Sigma_{zx}(\Sigma/\Sigma_{zz})^{-1}\Sigma_{xz}\Sigma_{zz}^{-1} \end{bmatrix}$$
We see that $\Lambda_{xx}^{-1} = \Sigma/\Sigma_{zz}$ and $-\Lambda_{xx}^{-1}\Lambda_{xz} = \Sigma_{xz}\Sigma_{zz}^{-1}$. Therefore we can write the distribution of $x \mid z = z_0$ in terms of the covariance matrix.
$$\mu_{x \mid z=z_0} = \mu_x + \Sigma_{xz}\Sigma_{zz}^{-1}(z_0 - \mu_z)$$
$$\Sigma_{x \mid z=z_0} = \Sigma_{xx} - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx}$$
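As a quick sanity check, here is a small numpy sketch (using a randomly generated but valid joint covariance) confirming that the information-matrix form and the covariance-matrix form of the posterior agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random but valid joint covariance for a 2-D x and a 2-D z (4x4, symmetric positive definite).
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + 4 * np.eye(4)
Sigma_xx, Sigma_xz = Sigma[:2, :2], Sigma[:2, 2:]
Sigma_zx, Sigma_zz = Sigma[2:, :2], Sigma[2:, 2:]

mu_x, mu_z = rng.normal(size=2), rng.normal(size=2)
z0 = rng.normal(size=2)

# Information-matrix route: Lambda = Sigma^{-1}, then condition using Lambda_xx and Lambda_xz.
Lam = np.linalg.inv(Sigma)
Lam_xx, Lam_xz = Lam[:2, :2], Lam[:2, 2:]
mu_info = mu_x - np.linalg.solve(Lam_xx, Lam_xz @ (z0 - mu_z))
Sigma_info = np.linalg.inv(Lam_xx)

# Covariance-matrix route: the Schur-complement form.
gain = Sigma_xz @ np.linalg.inv(Sigma_zz)
mu_cov = mu_x + gain @ (z0 - mu_z)
Sigma_cov = Sigma_xx - gain @ Sigma_zx

print(np.allclose(mu_info, mu_cov))        # True
print(np.allclose(Sigma_info, Sigma_cov))  # True
```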
Deriving the Kalman Filter
In the first section, I mentioned that the Kalman Filter can be seen as a principled way to establish the priors $\mu_x, \mu_z, \Sigma_{xx}, \Sigma_{zz}, \Sigma_{xz}$.
Remember we wanted these priors so that, given an actual measurement $z_0$, we could apply the Bayes inference equations.
$$\mu_{x \mid z=z_0} = \mu_x + \Sigma_{xz}\Sigma_{zz}^{-1}(z_0 - \mu_z)$$
$$\Sigma_{x \mid z=z_0} = \Sigma_{xx} - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx}$$
The Kalman Filter sets up $\mu_x, \mu_z, \Sigma_{xx}, \Sigma_{zz}, \Sigma_{xz}$ by supposing that the state variable $x$ and the measurement variable $z$ are both caused by a single prior variable $x_0 \sim \mathcal{N}(\mu_{x_0}, \Sigma_{x_0})$, via a state-update matrix $F$ and a measurement matrix $H$.
With $w \sim \mathcal{N}(0, \Sigma_w)$ as independent process noise, we assume our state $x$ arises from $x_0$ as follows.
$$x = F x_0 + w$$
With $v \sim \mathcal{N}(0, \Sigma_v)$ as independent measurement error, we assume our measurement $z$ arises from $x$ (and ultimately from $x_0$) as follows.
$$z = H x + v$$
These two equations are enough to generate the list $\mu_x, \mu_z, \Sigma_{xx}, \Sigma_{zz}, \Sigma_{xz}$ via straightforward computations. See the next section for those derivations in detail.
We will end up with the following.
$$\mu_x = F\mu_{x_0}$$
$$\mu_z = HF\mu_{x_0}$$
$$\Sigma_{xx} = F\Sigma_{x_0}F^T + \Sigma_w$$
$$\Sigma_{xz} = \Sigma_{xx}H^T$$
$$\Sigma_{zz} = H\Sigma_{xx}H^T + \Sigma_v$$
That’s it! Now plug those values into the Bayes update rule and you have a Kalman Filter!
$$\mu_{x \mid z=z_0} = F\mu_{x_0} + \Sigma_{xz}\Sigma_{zz}^{-1}(z_0 - \mu_z)$$
$$\Sigma_{x \mid z=z_0} = \Sigma_{xx} - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx}$$
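Putting the pieces together, here is a minimal numpy sketch of one full cycle. The constant-velocity model below ($F$, $H$, and the noise covariances) is made up purely for illustration; the function itself is nothing but the five prior formulas above followed by the Bayes inference equations.

```python
import numpy as np

def kalman_step(mu_x0, Sigma_x0, F, Sigma_w, H, Sigma_v, z0):
    """One predict-and-update cycle, written exactly as in the text:
    build the joint prior from F and H, then apply the Bayes inference equations."""
    # Priors implied by x = F x0 + w and z = H x + v.
    mu_x = F @ mu_x0                          # predicted mean
    mu_z = H @ mu_x                           # predicted measurement
    Sigma_xx = F @ Sigma_x0 @ F.T + Sigma_w   # predicted covariance
    Sigma_xz = Sigma_xx @ H.T                 # state/measurement cross covariance
    Sigma_zz = H @ Sigma_xx @ H.T + Sigma_v   # innovation covariance

    # Bayes inference equations (the "update").
    K = Sigma_xz @ np.linalg.inv(Sigma_zz)    # Kalman gain
    mu_post = mu_x + K @ (z0 - mu_z)
    Sigma_post = Sigma_xx - K @ Sigma_xz.T
    return mu_post, Sigma_post

# Hypothetical constant-velocity model: state = [position, velocity], measure position only.
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Sigma_w = 0.01 * np.eye(2)
Sigma_v = np.array([[0.25]])

mu, Sigma = np.array([0.0, 1.0]), np.eye(2)
for z0 in [np.array([1.1]), np.array([1.9]), np.array([3.2])]:
    mu, Sigma = kalman_step(mu, Sigma, F, Sigma_w, H, Sigma_v, z0)
print(mu)  # estimated [position, velocity] after three measurements
```

Feeding each posterior back in as the next step's $x_0$ prior is what turns this single update into a filter that runs over a whole sequence of measurements.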
A note on terminology for comparison to the Wikipedia article on Kalman Filter:
- $\mu_x$ is called the predicted mean
- $\Sigma_{xx}$ is called the predicted covariance
- $\Sigma_{zz}$ is called the innovation (or pre-fit residual) covariance
- $\Sigma_{xz}\Sigma_{zz}^{-1} = \Sigma_{xx}H^T\Sigma_{zz}^{-1}$ is called the optimal Kalman gain
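For comparison with that notation: writing $K = \Sigma_{xx}H^T\Sigma_{zz}^{-1}$ for the Kalman gain and using $\Sigma_{zx} = H\Sigma_{xx}$, the Bayes update can be rewritten in the familiar gain form.

$$\mu_{x \mid z=z_0} = \mu_x + K(z_0 - \mu_z)$$
$$\Sigma_{x \mid z=z_0} = \Sigma_{xx} - KH\Sigma_{xx} = (I - KH)\Sigma_{xx}$$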
Deriving the Kalman Filter In Detail
In this section, I’ll show these equalities.
$$\mu_x = F\mu_{x_0}$$
$$\mu_z = HF\mu_{x_0}$$
$$\Sigma_{xx} = F\Sigma_{x_0}F^T + \Sigma_w$$
$$\Sigma_{xz} = \Sigma_{xx}H^T$$
$$\Sigma_{zz} = H\Sigma_{xx}H^T + \Sigma_v$$
Here are the means.
$$\mu_x = E[x] = E[Fx_0 + w] = FE[x_0] + E[w] = F\mu_{x_0} + 0 = F\mu_{x_0}$$
$$\mu_z = E[z] = E[Hx + v] = HE[x] + E[v] = H\mu_x + 0 = HF\mu_{x_0}$$
Here are the covariances and cross covariance. It will be convenient to define the delta operator $\Delta$, which means $\Delta y = y - E[y]$. Also, for zero-mean variables like $v$, we have $\Delta v = v$.
$$\begin{aligned}
\Sigma_{xx} &= E[\Delta x \, \Delta x^T] \\
&= E[(F\Delta x_0 + w)(F\Delta x_0 + w)^T] \\
&= E[F\Delta x_0 \Delta x_0^T F^T] + E[F\Delta x_0 w^T] + E[w \Delta x_0^T F^T] + E[w w^T]
\end{aligned}$$
Use independence to distribute expectation in the second and third terms.
$$\begin{aligned}
&= FE[\Delta x_0 \Delta x_0^T]F^T + FE[\Delta x_0]E[w^T] + E[w]E[\Delta x_0^T]F^T + E[ww^T] \\
&= F\Sigma_{x_0}F^T + 0 + 0 + \Sigma_w \\
&= F\Sigma_{x_0}F^T + \Sigma_w
\end{aligned}$$
$$\begin{aligned}
\Sigma_{xz} &= E[\Delta x \, \Delta z^T] \\
&= E[\Delta x \, \Delta(Hx + v)^T] \\
&= E[\Delta x \,(H\Delta x + v)^T] \\
&= E[\Delta x \, \Delta x^T]H^T + E[\Delta x]E[v^T] \\
&= \Sigma_{xx}H^T + 0 \\
&= \Sigma_{xx}H^T
\end{aligned}$$
$$\begin{aligned}
\Sigma_{zz} &= E[\Delta z \, \Delta z^T] \\
&= E[\Delta(Hx + v)\,\Delta(Hx + v)^T] \\
&= E[(H\Delta x + v)(H\Delta x + v)^T] \\
&= HE[\Delta x \, \Delta x^T]H^T + HE[\Delta x]E[v^T] + E[v]E[\Delta x^T]H^T + E[vv^T] \\
&= H\Sigma_{xx}H^T + 0 + 0 + \Sigma_v \\
&= H\Sigma_{xx}H^T + \Sigma_v
\end{aligned}$$
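If you want to double-check these moment formulas numerically, here is a quick Monte Carlo sketch. The model (the particular $F$, $H$, and covariances) is made up; the point is only that sampling $x_0$, $w$, and $v$ and forming $x$ and $z$ reproduces the closed-form moments.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000  # number of Monte Carlo samples

# Made-up model, just to sanity-check the moment formulas.
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
mu_x0 = np.array([0.0, 1.0])
Sigma_x0 = np.array([[1.0, 0.1], [0.1, 0.5]])
Sigma_w = 0.2 * np.eye(2)
Sigma_v = np.array([[0.3]])

# Sample x0, w, v independently, then form x = F x0 + w and z = H x + v.
x0 = rng.multivariate_normal(mu_x0, Sigma_x0, size=N)
w = rng.multivariate_normal(np.zeros(2), Sigma_w, size=N)
v = rng.multivariate_normal(np.zeros(1), Sigma_v, size=N)
x = x0 @ F.T + w
z = x @ H.T + v

# Empirical moments vs. the closed-form expressions derived above
# (loose tolerance to allow for sampling noise).
Sigma_xx = F @ Sigma_x0 @ F.T + Sigma_w
print(np.allclose(x.mean(axis=0), F @ mu_x0, atol=0.05))                   # mu_x
print(np.allclose(np.cov(x.T), Sigma_xx, atol=0.05))                       # Sigma_xx
print(np.allclose(np.cov(x.T, z.T)[:2, 2:], Sigma_xx @ H.T, atol=0.05))    # Sigma_xz
print(np.allclose(np.cov(z.T), H @ Sigma_xx @ H.T + Sigma_v, atol=0.05))   # Sigma_zz
```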