# Linear Regression and Conditional Probability

The goal of linear regression is to find the equation of the line that most closely approximates a set of data points. It turns out that you don’t actually have to restrict yourself to equations where the highest power of $x$ is 1, but we can talk about that some other time. Linear regression is really useful when you’re trying to figure out how one or more variables affect a single output variable.

To perform linear regression, though, we need to make some pretty strong assumptions about our data. One of these assumptions can be stated as follows: $\mathbb{P}(Y\vert X)=\mathbb{P}(Y\text{ and }X).$

In case you haven’t seen the notation before, $\mathbb{P}(Y\vert X)$ means “the probability of $Y$ given $X$.” This comes up a lot when you want to know how the probability of an event changes once you know whether something else has already happened. An example of this would be “the probability that I get an A in the class given that I got an 87 on the midterm.”

At first, it seems like this would contradict the definition of conditional probability, which says that $\mathbb{P}(Y\vert X)=\frac{\mathbb{P}(Y\text{ and }X)}{\mathbb{P}(X)}$,

but let’s talk about why it actually makes sense in this context.
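Before moving on, it may help to see the standard definition of conditional probability worked out with numbers. The sketch below uses an invented class of 20 students, and it swaps “got an 87” for “scored 87 or higher” so that the event $X$ has more than a handful of students in it. The counts and variable names are assumptions made purely for illustration.

```python
from fractions import Fraction

# Hypothetical class of 20 students, stored as (X, Y) pairs where
#   X = scored 87 or higher on the midterm
#   Y = got an A in the class
# The counts are invented for illustration only.
students = (
    [(True, True)] * 6      # high midterm, got an A
    + [(True, False)] * 2   # high midterm, no A
    + [(False, True)] * 3   # lower midterm, got an A
    + [(False, False)] * 9  # lower midterm, no A
)

total = len(students)

# P(X): fraction of students with a high midterm score.
p_x = Fraction(sum(1 for x, _ in students if x), total)

# P(Y and X): fraction of students with a high midterm score AND an A.
p_y_and_x = Fraction(sum(1 for x, y in students if x and y), total)

# Definition of conditional probability: P(Y | X) = P(Y and X) / P(X).
p_y_given_x = p_y_and_x / p_x

print(p_x, p_y_and_x, p_y_given_x)  # prints: 2/5 3/10 3/4
```

In this made-up class, $\mathbb{P}(Y\text{ and }X)=3/10$ while $\mathbb{P}(Y\vert X)=3/4$, so under the general definition the two quantities are different numbers.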

When you perform a linear regression, there are two ways to think about what’s going on: prediction and simultaneous measurements. Let’s think about a specific example to make this easier to talk about.

In this example, let’s collect people’s heights and weights and plot them on the $x$ and $y$ axes respectively. One way to think about this is that you are trying to find an equation that uses heights to predict weights. Said a different way, you are trying to maximize $\mathbb{P}(Y\vert X)$. We’ll come back to this idea in the next post.
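Here is a minimal sketch of the prediction view, assuming NumPy is available. The heights and weights are made-up numbers rather than real measurements, and `predict_weight` is a hypothetical helper name. `np.polyfit` with `deg=1` does an ordinary least-squares fit of a straight line, which is one common way to carry out a linear regression.

```python
import numpy as np

# Made-up heights (cm) and weights (kg), for illustration only.
heights = np.array([150.0, 160.0, 165.0, 170.0, 175.0, 180.0, 190.0])
weights = np.array([52.0, 58.0, 63.0, 66.0, 72.0, 77.0, 85.0])

# Least-squares fit of the line: weight = slope * height + intercept.
# polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(heights, weights, deg=1)


def predict_weight(height_cm):
    """Predict weight (kg) from height (cm) using the fitted line."""
    return slope * height_cm + intercept


print(f"weight = {slope:.3f} * height + {intercept:.3f}")
print(f"predicted weight at 172 cm: {predict_weight(172.0):.1f} kg")
```

Note the asymmetry: height goes in and a predicted weight comes out. Swapping the roles of the two arrays would generally give a different line.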

Another way to think about this is that every time you observe a person, you collect two pieces of information about them: their height AND their weight. Here, INSTEAD of thinking about using height to PREDICT weight, we’re just simultaneously measuring two attributes of a person.
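One way to sketch the simultaneous-measurement view is to treat each person as a single (height, weight) pair and summarize how the two values vary together. The data below is invented, and using the covariance matrix and correlation coefficient is an assumption about what “understanding how the two are related” might look like in code, not the only option.

```python
import numpy as np

# Made-up heights (cm) and weights (kg), for illustration only.
heights = np.array([150.0, 160.0, 165.0, 170.0, 175.0, 180.0, 190.0])
weights = np.array([52.0, 58.0, 63.0, 66.0, 72.0, 77.0, 85.0])

# One row per person: each observation is a (height, weight) pair.
pairs = np.column_stack([heights, weights])

# Joint summaries of the pairs. Neither variable is treated as the
# "input" or the "output" here.
mean_vector = pairs.mean(axis=0)           # [mean height, mean weight]
cov_matrix = np.cov(pairs, rowvar=False)   # 2x2 sample covariance matrix
corr = np.corrcoef(heights, weights)[0, 1] # correlation coefficient

print("mean (height, weight):", mean_vector)
print("covariance matrix:\n", cov_matrix)
print(f"correlation: {corr:.3f}")
```

Unlike a fitted line, these summaries are symmetric: the correlation between height and weight is the same as the correlation between weight and height.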

So the same data supports two readings: we can use people’s heights to predict their weights, or we can treat height and weight as two simultaneous measurements and ask how those two values are related.

In the next post, we’ll say a little more about the assumptions that linear regression makes and why they make sense! Stay tuned!