While writing an obituary for George Box, I stumbled on something I thought was ingenious: a method for generating independent pairs of numbers drawn from the normal distribution.
I’ll concede: that’s not necessarily something that makes the average reader-in-the-street stop in their tracks and say “Wow!” In honesty, it would probably make the average reader-in-the-street rapidly become a reader-on-the-other-side-of-the-street. However, I thought an article on it might provide some insight into two mathematical minds: that of George Box, one of the greatest statisticians of the 20th century, and that of me, possibly the greatest mathematical hack of the 21st.
How the Box-Muller transform works
If you want to apply the Box-Muller transform, you need two independent numbers drawn from a uniform distribution - so they’re equally likely to take on any value between 0 and 1. Let’s call these numbers $U$ and $V$. Box and Muller claim that if you work out
$$X = \sqrt{-2 \ln (U)} \cos (2\pi V)$$ and $$Y = \sqrt{-2 \ln (U)} \sin (2\pi V)$$
then $X$ and $Y$ are independent (information about one tells you nothing about the other) and normally distributed with a mean of 0 and a standard deviation of 1. I’m not going to prove that, because I don’t know how, but I can explain what’s happening.
There’s a hint in my choices of letter: you might recognise that you could simplify these down to $X = R \cos(\theta)$ and $Y = R\sin(\theta)$, which are just the horizontal and vertical sides of a right-angled triangle with hypotenuse $R$. The $R = \sqrt{-2\ln(U)}$ is the distance from $(0,0)$. Because $U$ is between 0 and 1, $\ln(U)$ is anywhere from $-\infty$ to 0. Multiplying by -2 turns it into a nice positive number (so you can take its square root), and the logarithm and square root between them make large distances from the origin rare. For normally-distributed variables, you want the points to clump up in the middle; the 2 sets the scale so that the standard deviation comes out as 1.
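This isn’t a proof of the full claim, but one step of it can be sketched. Since $U$ is uniform between 0 and 1, for any $r \ge 0$:

$$P(R \le r) = P\left(-2\ln(U) \le r^2\right) = P\left(U \ge e^{-r^2/2}\right) = 1 - e^{-r^2/2}$$

That is the distribution of the distance from the origin for a pair of independent standard normal variables (the Rayleigh distribution), a fact I’m quoting here rather than deriving. Without the 2, the exponent would be $-r^2$ and the spread would come out too small.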
The $\theta = 2\pi V$ is much simpler: it just says ‘move in a random direction’.
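Putting the distance and the direction together, the recipe can be sketched in a few lines of Python. This is only an illustration: the function names, the seed and the sample size are my own choices, not anything from Box and Muller, and it assumes Python's built-in `random` module is a good enough source of uniform numbers.

```python
import math
import random
import statistics


def box_muller(u, v):
    """Map two uniform values (u in (0, 1], v in [0, 1)) to a pair (x, y)."""
    r = math.sqrt(-2.0 * math.log(u))  # distance from the origin
    theta = 2.0 * math.pi * v          # random direction
    return r * math.cos(theta), r * math.sin(theta)


def sample_pairs(n, seed=0):
    """Draw n (x, y) pairs; the seed is only there to make runs repeatable."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # random() lies in [0, 1); flipping it gives (0, 1], so log(u) is defined.
        u = 1.0 - rng.random()
        v = rng.random()
        pairs.append(box_muller(u, v))
    return pairs


if __name__ == "__main__":
    xs, ys = zip(*sample_pairs(100_000))
    print("mean of X:", statistics.fmean(xs))
    print("std of X: ", statistics.pstdev(xs))
    print("mean of Y:", statistics.fmean(ys))
    print("std of Y: ", statistics.pstdev(ys))
```

If the claim holds, the printed means should land near 0 and the standard deviations near 1, give or take sampling noise.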
What Colin did next
My immediate thought was, ‘I wonder if I can use that to work out the probability tables for $z$-scores you get in formula books!’ What do you mean, that wasn’t your immediate thought? Long story short: the answer is no; I just wanted to show you my thought process and that not everything in maths works out as neatly as you’d like.
My insight was that the probability of generating an $X$ value smaller than some constant $k$ would be the same as the probability of generating $U$ and $V$ values that gave smaller $X$s, which is the area of the matching region of the unit square. So far so obvious! In that case, it’s just a matter of rearranging the formulas to get the boundary of that region, an expression for (say) $V$ in terms of $U$, and integrating to find the appropriate area.
So I tried that:
$$\sqrt{-2 \ln (U)} \cos(2\pi V) = k \\ \cos(2\pi V) = \frac{k}{\sqrt{-2\ln(U)}} \\ V = \frac{1}{2\pi}\cos^{-1}\left( \frac{k}{\sqrt{-2\ln(U)}} \right)$$
Yikes. I don’t fancy trying to integrate that - the arccos is bad enough, but the $\ln(U)$ on the bottom? Forget about it.
Let’s try the other way:
$$\sqrt{-2 \ln (U)} \cos(2\pi V) = k \\ -2\ln(U) = k^2 \sec^2(2\pi V) \\ U = e^{-\frac{k^2}{2}\sec^2(2\pi V)}$$
Curses! I don’t think that’s going to work, either. $e^{\sec^2 x}$ isn’t an integral I know how to do - so I’m stymied.
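The integral may not have a form I can write down, but a computer can at least estimate the area, which is a handy check that the rearrangement is right. The sketch below assumes $k \ge 0$. Then $X \ge k$ needs $\cos(2\pi V) > 0$ and $U \le e^{-\frac{k^2}{2}\sec^2(2\pi V)}$, so the area of the region where $X \ge k$ is the integral of that expression over the $V$ values with positive cosine. The function names and the step count are my own inventions, the integration is a plain midpoint rule, and the comparison value comes from `math.erf`.

```python
import math


def upper_boundary(k, v):
    """Largest u giving sqrt(-2 ln u) * cos(2 pi v) >= k, assuming cos(2 pi v) > 0."""
    c = math.cos(2.0 * math.pi * v)
    return math.exp(-0.5 * k * k / (c * c))


def prob_x_below(k, steps=20_000):
    """Estimate P(X < k) for k >= 0 as 1 minus the area of the region where X >= k."""
    if k < 0:
        raise ValueError("this sketch only handles k >= 0")
    # cos(2 pi v) > 0 for v in [0, 1/4) and (3/4, 1]. The two pieces are mirror
    # images, so integrate over [0, 1/4] with the midpoint rule and double it.
    h = 0.25 / steps
    area = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        area += upper_boundary(k, v) * h
    return 1.0 - 2.0 * area


def phi(k):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + math.erf(k / math.sqrt(2.0)))


if __name__ == "__main__":
    for k in (0.0, 0.5, 1.0, 1.96, 3.0):
        print(k, prob_x_below(k), phi(k))
```

The two columns should agree closely, which suggests the region is right even though the pen-and-paper integral stays out of reach.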
Back to the drawing board, I’m afraid. This time, I didn’t get the cookie of a new maths discovery. The difference between a poor mathematician and a decent mathematician is that the poor mathematician says “I got it wrong, I’m rubbish”; the decent mathematician says either “Ah well. Next puzzle!” or “Ah well! Try again.”
The great mathematicians, of course, see right to the end of the puzzle before they start.