The Radon-Nikodym Theorem
May 27, 2019
Let μ and ν be two measures on the same measurable space (Ω,Σ). We say that μ is absolutely continuous with respect to ν (written μ≪ν) if μ(E)=0 whenever ν(E)=0 for E∈Σ. Given ν, there is a simple way to generate a new measure which is absolutely continuous with respect to ν: take any nonnegative measurable function g:Ω→[0,∞) and define
$$\mu_g(E) = \int_E g \, d\nu$$
It is a standard exercise in measure theory that μ_g is a measure and that μ_g(E)=0 whenever ν(E)=0.
The Radon-Nikodym theorem asserts that under mild hypotheses every measure μ which is absolutely continuous with respect to ν has this form for a measurable function g that is unique up to ν-null sets. This function is called the Radon-Nikodym derivative of μ with respect to ν, often written dμ/dν. The theorem is an important part of the structure theory of measures, and the Radon-Nikodym derivative comes up frequently in applications. Indeed, I was reminded of this theorem because of its role in another blog post that I was writing about KL-divergence.
The plan for this post is to give a proof of the Radon-Nikodym theorem and then discuss some applications and examples. The proof, which I learned in graduate school, uses Hilbert spaces to do most of the work; I think it is originally due to von Neumann.
Warm-up
The proof of the theorem in its full generality inevitably uses some reasonably serious measure theory and/or functional analysis. But I think it is instructive to first work it out in a simple special case: measures on finite sets.
So let Ω={x_1,…,x_n} be a finite set and let Σ be the power set of Ω. A measure μ on Ω is determined by its values on each one-point set {x_i}, so it is nothing more than a vector in ℝⁿ whose entries are all nonnegative. Let us use the notation μ_i for the components of μ, meaning μ_i=μ({x_i}).
Given two measures μ and ν, we have that μ is absolutely continuous with respect to ν if and only if μ_i=0 whenever ν_i=0. We would like to find a measurable function g:Ω→ℝ, which is also just a vector in ℝⁿ, with the property that:
$$\mu(E) = \int_E g \, d\nu$$
for every E∈Σ. Applying this to the one-point sets E_i={x_i}, we get:
$$\mu_i = \mu(E_i) = \int_{E_i} g \, d\nu = g_i \nu_i$$
This yields a simple formula for the Radon-Nikodym derivative:
$$\frac{d\mu}{d\nu}(x_i) = \begin{cases} \mu_i / \nu_i & \text{if } \nu_i \neq 0 \\ 0 & \text{otherwise} \end{cases}$$
In fact the same formula works for any σ-algebra on a finite set, not just the power set, once the one-point sets are replaced by the atoms of the σ-algebra. So the theorem on finite sets is quite simple and explicit. Moreover, if we look at this calculation in the right way it gives us a hint for how to handle the general case. The equation μ_i=g_iν_i implies that for any function f on Ω we have
$$\int_\Omega f \, d\mu = \int_\Omega f g \, d\nu = \langle f, g \rangle_\nu$$
where ⟨⋅,⋅⟩_ν is the L²-inner product determined by ν. So another way to construct g is to apply the Riesz representation theorem to the linear functional f↦∫_Ω f dμ on the finite-dimensional Hilbert space L²(Ω,ν). This idea doesn't quite work as stated in the general case because there is no guarantee that the linear functional is bounded, but it can be fixed by working instead over the Hilbert space L²(Ω,μ+ν). The details appear in the next section.
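
The finite case is small enough to check by machine. The following is a minimal sketch, not part of the original argument: the function names `radon_nikodym_finite` and `integrate` and the example vectors are made up for illustration, and exact rational arithmetic is used so that the equalities can be asserted without rounding error.

```python
from fractions import Fraction
from itertools import combinations

def radon_nikodym_finite(mu, nu):
    """Return g with mu_i = g_i * nu_i, assuming mu << nu on a finite set."""
    if any(n == 0 and m != 0 for m, n in zip(mu, nu)):
        raise ValueError("mu is not absolutely continuous with respect to nu")
    # Where nu_i = 0 the value of g is irrelevant; follow the formula and use 0.
    return [Fraction(m) / Fraction(n) if n != 0 else Fraction(0)
            for m, n in zip(mu, nu)]

def integrate(f, measure, subset=None):
    """Integral of f over `subset` (default: the whole space) against `measure`."""
    indices = range(len(measure)) if subset is None else subset
    return sum(f[i] * measure[i] for i in indices)

# Hypothetical example measures on a four-point set; nu vanishes at the last point.
mu = [Fraction(1, 2), Fraction(0), Fraction(3), Fraction(0)]
nu = [Fraction(1, 4), Fraction(2), Fraction(1), Fraction(0)]
g = radon_nikodym_finite(mu, nu)

# mu(E) equals the integral of g over E against nu, for every subset E.
ones = [Fraction(1)] * len(mu)
all_subsets = [E for r in range(len(mu) + 1)
               for E in combinations(range(len(mu)), r)]
assert all(integrate(ones, mu, E) == integrate(g, nu, E) for E in all_subsets)

# The integral of f against mu equals the nu-inner product of f and g.
f = [Fraction(-1), Fraction(5), Fraction(2, 3), Fraction(7)]
fg = [a * b for a, b in zip(f, g)]
assert integrate(f, mu) == integrate(fg, nu)
```

The check over all subsets is redundant once the one-point sets agree, but it mirrors the statement μ(E)=∫_E g dν literally.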
Proof of the Radon-Nikodym theorem
Without further ado, let us jump into the proof.
Theorem. Let μ and ν be finite measures on a measurable space (Ω,Σ). If μ is absolutely continuous with respect to ν then there is a g∈L¹(Ω,ν), unique up to ν-null sets, with the property that:
$$\mu(E) = \int_E g \, d\nu$$
for every E∈Σ.
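Proof. Consider the Hilbert space H=L²(μ+ν). Define a linear functional ϕ:H→ℝ by
$$\phi(f) = \int_\Omega f \, d\mu$$
Using the Schwarz inequality and the fact that μ≤μ+ν, we get:
$$|\phi(f)|^2 \leq \left( \int_\Omega 1 \cdot |f| \, d\mu \right)^2 \leq \mu(\Omega) \, \|f\|_{L^2(\mu)}^2 \leq \mu(\Omega) \, \|f\|_H^2$$
This shows that ϕ is a bounded linear functional, so by the Riesz representation theorem there is a function h∈H with the property that:
$$\int_\Omega f \, d\mu = \phi(f) = \langle f, h \rangle = \int_\Omega f h \, d(\mu + \nu) \tag{1}$$
It follows that:
$$\int_\Omega f \, d(\mu + \nu) = \int_\Omega f \, d\nu + \int_\Omega f h \, d(\mu + \nu)$$
Rearranging, we get: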
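$$\int_\Omega f \, d\nu = \int_\Omega f (1 - h) \, d(\mu + \nu) \tag{2}$$
Now, by (1) we have for any measurable set E:
$$\mu(E) = \int_\Omega 1_E \, d\mu = \int_\Omega 1_E h \, d(\mu + \nu) = \int_E h \, d(\mu + \nu)$$
Consider the case E=h⁻¹((−∞,0)). Then ∫_E h d(μ+ν)≤0 since h<0 on E, so μ(E)=0 and the integral vanishes. But the integral of a strictly negative function can only vanish over a null set, so (μ+ν)(E)=0. Similarly, taking f=1_E in (2) with E=h⁻¹([1,∞)) gives ν(E)=∫_E(1−h)d(μ+ν)≤0, so ν(E)=0, and hence μ(E)=0 by absolute continuity. Together these show that 0≤h<1 almost everywhere with respect to μ+ν, and in particular with respect to ν. Consequently the function g=h/(1−h) is well defined and nonnegative almost everywhere; I claim that g satisfies the conditions in the statement of the theorem.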
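Indeed, for any measurable set E we have:
$$\begin{aligned} \int_E g \, d\nu &= \int_\Omega g 1_E \, d\nu \\ &= \int_\Omega g 1_E (1 - h) \, d(\mu + \nu) && \text{by (2)} \\ &= \int_\Omega h 1_E \, d(\mu + \nu) \\ &= \int_\Omega 1_E \, d\mu && \text{by (1)} \\ &= \mu(E) \end{aligned}$$
(Strictly speaking, (1) and (2) hold for f∈H, and g1_E need not lie in H. The computation is valid as written with E replaced by E∩E_n, where E_n=h⁻¹([0,1−1/n)), since g is bounded on E_n. The general case then follows by letting n→∞ and applying the monotone convergence theorem, because g1_{E∩E_n} increases to g1_E almost everywhere. Taking E=Ω also proves that g really is ν-integrable: ∫_Ω g1_{E_n} dν=μ(E_n)→μ(Ω)<∞.)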
This completes the proof that the desired function g exists, so it remains only to prove that g is unique. Suppose g′ is another ν-integrable function with the property that μ(E)=∫_E g′ dν for every measurable set E. Let E denote the subset of Ω where g>g′; we have:
$$0 = \mu(E) - \mu(E) = \int_E (g - g') \, d\nu$$
This forces ν(E)=0, since g−g′ is strictly positive on E. Repeating this argument on the set where g′>g implies that g=g′ almost everywhere with respect to ν.
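
The steps of the proof can be traced by hand on a finite set, where the Riesz representative in L²(μ+ν) is explicit: h_i=μ_i/(μ_i+ν_i). The sketch below is only an illustration under stated assumptions: the example vectors and the helper `integral` are hypothetical, and every point is assumed to have positive (μ+ν)-mass so that h is determined everywhere.

```python
from fractions import Fraction

# Hypothetical finite measures on a three-point set with mu << nu.
# Every point has positive (mu + nu)-mass, so h is determined everywhere.
mu = [Fraction(1, 2), Fraction(0), Fraction(3)]
nu = [Fraction(1, 4), Fraction(2), Fraction(1)]
total = [m + n for m, n in zip(mu, nu)]

def integral(f, measure):
    """Integral of the function f (a list of values) against a finite measure."""
    return sum(a * w for a, w in zip(f, measure))

# In L^2(mu + nu) the functional f -> integral of f d(mu) is represented by
# h_i = mu_i / (mu_i + nu_i).
h = [m / t for m, t in zip(mu, total)]
assert all(0 <= x < 1 for x in h)

f = [Fraction(-1), Fraction(5), Fraction(2, 3)]  # arbitrary test function
fh = [a * x for a, x in zip(f, h)]
f_one_minus_h = [a * (1 - x) for a, x in zip(f, h)]
assert integral(f, mu) == integral(fh, total)             # representation by h
assert integral(f, nu) == integral(f_one_minus_h, total)  # rearranged identity

# Boundedness estimate: |phi(f)|^2 <= mu(Omega) * ||f||_H^2.
norm_sq = integral([a * a for a in f], total)
assert integral(f, mu) ** 2 <= sum(mu) * norm_sq

# g = h / (1 - h) recovers the pointwise ratio mu_i / nu_i.
g = [x / (1 - x) for x in h]
assert all(g_i * n == m for g_i, n, m in zip(g, nu, mu))
```

Note that h vanishes at the middle point, where μ has no mass but ν does; this is why the proof can only hope for 0≤h<1 rather than a strict lower bound.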
Concluding remarks
The Radon-Nikodym theorem admits some modest generalizations. It can be proved for σ-finite measures by applying the theorem to an increasing union of sets on which both μ and ν are finite. It also extends to signed measures by decomposing them into their positive and negative parts, and it extends to complex measures by considering their real and imaginary parts.
But for the moment I am mostly interested in applications to probability theory, for which the formulation of the theorem presented here is sufficient. In follow-up posts I plan to explore two such applications in particular: the conditional expectation of a random variable and the KL-divergence.