Probability and the Pythagorean Theorem

In this post, I want to talk about the equation for the variance of a random variable and use it to get some geometric intuition for properties of random variables. First, let’s define what a random variable is.

A random variable is a function that assigns a number to each possible outcome of a random process; each value it can take then occurs with some probability. For example, we could define a random variable $X$ that takes the value $0$ with probability $\frac{3}{10}$ and the value $1$ with probability $\frac{7}{10}$.

Once you have a random variable, there are some things you might want to know about it. For example, you might be curious what its expected value is. This is like an average, but it extends to random variables that take on a continuous set of values as well. To find the expected value of the $X$ defined above, we calculate $0\cdot(\frac{3}{10})+1\cdot(\frac{7}{10})=\frac{7}{10}$. For a discrete random variable, the expected value is really a weighted average of the possible values of the random variable, with each value weighted by its probability.

The expected value of a random variable $X$ is often written as $\mathbb{E}(X)$. Once you know how to calculate this, you might ask what happens if you transform the values that $X$ can take and then recalculate expected values. In particular, you might ask what happens if you calculate $\mathbb{E}(X^2)$, or in general, $\mathbb{E}(X^n)$ for some $n\in\mathbb{N}$. For any value of $n$, these are called “moments of a distribution,” and they tell you about properties of the distribution.

In our example above, to calculate $\mathbb{E}(X^2)$, we just calculate $0^2\cdot(\frac{3}{10})+1^2\cdot(\frac{7}{10})=\frac{7}{10}$. In this case, since the possible values of the random variable were $0$ and $1$, we get the same answer if we square them and recompute the expression. So, $\mathbb{E}(X)=\mathbb{E}(X^2)$ in this case, but that won’t always be true.
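To make the "won’t always be true" part concrete, here is a small sketch that computes moments of a discrete random variable as weighted sums. The `moment` helper and the second variable `y` are made up for illustration and are not part of the original example.

```python
from fractions import Fraction

def moment(pmf, n):
    """Return E(X^n) for a discrete random variable given as {value: probability}."""
    return sum(Fraction(value) ** n * prob for value, prob in pmf.items())

# The example from the post: X is 0 with probability 3/10 and 1 with probability 7/10.
x = {0: Fraction(3, 10), 1: Fraction(7, 10)}
print(moment(x, 1), moment(x, 2))  # 7/10 7/10

# A hypothetical second variable taking values 0 and 2 with the same probabilities.
y = {0: Fraction(3, 10), 2: Fraction(7, 10)}
print(moment(y, 1), moment(y, 2))  # 7/5 14/5
```

Squaring leaves $0$ and $1$ unchanged, which is why the two moments agree for $X$; as soon as a value like $2$ appears, they differ.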

The variance of a random variable, which tells you how spread out its values are, is defined as the expected squared distance from the mean, $\mathbb{E}\big((X-\mathbb{E}(X))^2\big)$. This simplifies to the more convenient formula $\operatorname{Var}(X) =\sigma_X^2 =\mathbb{E}(X^2) - \mathbb{E}(X)^2$, where $\sigma_X$ is the standard deviation of $X$.
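For anyone who wants to see where the formula $\mathbb{E}(X^2) - \mathbb{E}(X)^2$ comes from, here is a short derivation from the expected squared distance to the mean. It only uses linearity of expectation, and $\mu$ is just shorthand for $\mathbb{E}(X)$ introduced for this derivation:

$$
\begin{aligned}
\mathbb{E}\big((X-\mu)^2\big) &= \mathbb{E}\big(X^2 - 2\mu X + \mu^2\big)\\
&= \mathbb{E}(X^2) - 2\mu\,\mathbb{E}(X) + \mu^2\\
&= \mathbb{E}(X^2) - 2\mu^2 + \mu^2\\
&= \mathbb{E}(X^2) - \mathbb{E}(X)^2.
\end{aligned}
$$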

Now let’s do something kind of fun and rearrange that equation a little bit. If we add $\mathbb{E}(X)^2$ to both sides, we get that $\sigma_X^2+\mathbb{E}(X)^2 = \mathbb{E}(X^2)$.

Looking at it like this, we can imagine a right triangle with legs of length $\sigma_X$ and $|\mathbb{E}(X)|$ (the absolute value only matters when the expected value is negative) and hypotenuse of length $\sqrt{\mathbb{E}(X^2)}$.
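Here is a quick numerical sanity check of that triangle, using the $0$/$1$ example from earlier. It is only a sketch with floating-point numbers, and the variable names are my own.

```python
import math

# Values and probabilities from the post's example.
values = [0, 1]
probs = [0.3, 0.7]

mean = sum(v * p for v, p in zip(values, probs))               # E(X)
second_moment = sum(v**2 * p for v, p in zip(values, probs))   # E(X^2)
sigma = math.sqrt(second_moment - mean**2)                     # standard deviation

hypotenuse = math.sqrt(second_moment)
# math.hypot returns sqrt(a^2 + b^2), the hypotenuse for legs a and b.
print(math.isclose(math.hypot(sigma, mean), hypotenuse))  # True
```

For this example the legs are $\sqrt{21}/10 \approx 0.458$ and $7/10$, and the hypotenuse is $\sqrt{7/10} \approx 0.837$.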

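If we let $f$ be the moment generating function of $X$, meaning $f(t)=\mathbb{E}(e^{tX})$, then $\mathbb{E}(X)=f'(0)$ and $\mathbb{E}(X^2)=f''(0)$. So we can replace $\mathbb{E}(X)$ with $f'(0)$ and $\sqrt{\mathbb{E}(X^2)}$ with $\sqrt{f''(0)}$ in the triangle. That’s kind of fun!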
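As a quick check on the $0$/$1$ example, here is the moment generating function worked out by hand, assuming the standard definition $f(t)=\mathbb{E}(e^{tX})$ with $t$ as the argument:

$$
\begin{aligned}
f(t) &= \tfrac{3}{10}e^{0\cdot t} + \tfrac{7}{10}e^{1\cdot t} = \tfrac{3}{10} + \tfrac{7}{10}e^{t},\\
f'(0) &= \tfrac{7}{10} = \mathbb{E}(X),\\
f''(0) &= \tfrac{7}{10} = \mathbb{E}(X^2),\\
\sigma_X^2 &= f''(0) - f'(0)^2 = \tfrac{7}{10} - \tfrac{49}{100} = \tfrac{21}{100}.
\end{aligned}
$$

Both derivatives come out the same here because $f'(t)=f''(t)=\tfrac{7}{10}e^{t}$, which matches the earlier observation that the first and second moments of this particular $X$ are equal.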