“Mathematics knows no races or geographical boundaries; for mathematics, the cultural world is one country.”
The Radon-Nikodym Theorem - May 27, 2019
The Radon-Nikodym theorem is a workhorse result in measure theory, with numerous applications to probability theory (such as the existence of conditional expectations and the definition of KL-divergence). I will give a simple proof using Hilbert spaces.
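For intuition only: on a finite set, the theorem is elementary. If Q gives zero mass only to points where P also gives zero mass (P is absolutely continuous with respect to Q), the derivative dP/dQ is just the ratio of point masses, and integrating it against Q recovers P. The sketch below checks that discrete case; it is not the Hilbert-space proof, and the helper names `radon_nikodym` and `integrate` are hypothetical.

```python
from fractions import Fraction

def radon_nikodym(p, q):
    """Density dP/dQ for two measures on a finite set, given as dicts of point masses.

    Requires P << Q: every point with Q-mass zero must also have P-mass zero.
    """
    for x in p:
        if p[x] != 0 and q.get(x, 0) == 0:
            raise ValueError(f"P is not absolutely continuous w.r.t. Q at {x!r}")
    # Where q(x) = 0 the density is arbitrary (a Q-null set), so use 0 there.
    return {x: (p.get(x, 0) / q[x] if q[x] != 0 else 0) for x in q}

def integrate(f, q, event):
    """Integral of f over the event with respect to Q."""
    return sum(f[x] * q[x] for x in event)

p = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
q = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
f = radon_nikodym(p, q)
print(integrate(f, q, {"a", "c"}))  # 1/2, which is P({a, c})
```

Exact `Fraction` arithmetic is used so the identity P(A) = ∫_A f dQ can be checked with equality rather than a tolerance.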
Gibbs' Inequality - May 27, 2019
It is a fairly standard fact that relative entropy (KL-divergence) is positive definite (nonnegative, and zero only when the two distributions coincide), but I was dissatisfied with the proofs of this fact that I saw when I glanced through the literature. In this post I will provide a complete proof which works on a general probability space.
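As a numerical sanity check of the statement (not of the proof, and restricted to a finite sample space rather than a general probability space), the sketch below computes D(P || Q) = Σ p(x) log(p(x)/q(x)) with the usual conventions for zero masses and confirms it is never negative on random distributions. The function names are hypothetical.

```python
import math
import random

def kl_divergence(p, q):
    """Relative entropy D(P || Q) in nats for distributions on a finite set.

    p and q are equal-length sequences of probabilities. Uses the conventions
    0 * log(0 / q) = 0 and p * log(p / 0) = +inf when p > 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # 0 * log(0 / q) contributes nothing
        if qi == 0:
            return math.inf   # P puts mass where Q does not
        total += pi * math.log(pi / qi)
    return total

def random_distribution(k, rng):
    """Random probability vector of length k (normalized uniform weights)."""
    weights = [rng.random() for _ in range(k)]
    s = sum(weights)
    return [w / s for w in weights]

rng = random.Random(0)
for _ in range(1000):
    p = random_distribution(5, rng)
    q = random_distribution(5, rng)
    # Gibbs' inequality, up to floating-point rounding.
    assert kl_divergence(p, q) >= -1e-12

print(kl_divergence([0.5, 0.25, 0.25], [1/3, 1/3, 1/3]))   # strictly positive
print(kl_divergence([0.5, 0.25, 0.25], [0.5, 0.25, 0.25]))  # 0.0
```

The small tolerance in the loop only absorbs floating-point rounding; mathematically the sum is exactly nonnegative.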
Working with badly nested data in Spark - May 11, 2019
Apache Spark is a distributed computing platform which can handle almost any kind of data you throw at it. But many of its optimizations require SQL-like tables with consistent schemas. This post includes some of what I have learned about taking advantage of these optimizations when the data has an inconsistent, deeply nested schema.
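The post itself works in Spark; the plain-Python sketch below is not Spark code and not necessarily the post's approach. It only illustrates the underlying problem: records whose nested fields vary are flattened into dotted column names and padded to the union of all columns, so every row shares one consistent schema. The helpers `flatten` and `to_consistent_rows` are hypothetical.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

def to_consistent_rows(records):
    """Flatten every record, then give all rows the union of columns, filling gaps with None."""
    flat_records = [flatten(r) for r in records]
    columns = sorted({col for r in flat_records for col in r})
    return columns, [[r.get(col) for col in columns] for r in flat_records]

records = [
    {"id": 1, "user": {"name": "ada", "address": {"city": "London"}}},
    {"id": 2, "user": {"name": "alan"}, "tags": ["x", "y"]},
]
columns, rows = to_consistent_rows(records)
print(columns)  # ['id', 'tags', 'user.address.city', 'user.name']
for row in rows:
    print(row)
```

This sketch does not resolve type conflicts: if a field is a dict in one record and a scalar in another, both the scalar column and the nested columns appear.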
Decimals are hard - April 14, 2018
Many people find fractions confusing and difficult, and there is a tendency to dismiss them in favor of decimals in daily life. But answering even basic questions about decimals requires confronting serious philosophical questions and some unsolved problems in mathematics.
Annotator Agreement in Machine Learning Experiments - February 12, 2018
It is standard practice in applied machine learning to use human-annotated data to build training and evaluation datasets. But these datasets can be compromised when the annotation task is vague or the annotators disagree on the labels to be applied to an observation. The standard way to assess the quality of the results of an annotation experiment is to measure the rate of agreement between annotators, but this calculation is not as straightforward as one might naively expect.
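One reason the calculation is subtle is that some agreement happens by chance. A common correction for two annotators is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected if each annotator labelled independently at their own label frequencies. The sketch below is one standard statistic, not necessarily the one the post recommends; the function name is hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability that two independent annotators with these
    # label frequencies would agree by accident.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if p_chance == 1:
        raise ValueError("kappa is undefined when both annotators use one identical label")
    return (p_observed - p_chance) / (1 - p_chance)

annotator_1 = ["spam", "spam", "ham", "ham", "spam", "ham"]
annotator_2 = ["spam", "ham", "ham", "ham", "spam", "spam"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.333
```

Here the annotators agree on 4 of 6 items (raw agreement about 0.667), but with balanced labels half of that agreement is expected by chance, so kappa drops to about 0.333.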