Quadratic Optimization and Matrices

In this post, I want to talk about a way to connect critical points of two-variable polynomials of degree 2 to the null space of a matrix. A critical point of a function is a place where all partial derivatives of that function are equal to zero. The polynomials we'll consider are of the form f(x,y)=ax^2+bxy+cy^2+d, where a,b,c,d \in \mathbb{R} (a general degree-2 polynomial can also have linear terms, but we'll stick to this form here). As a reminder, the null space of a matrix A is the set \{x \vert Ax = \vec{0}\}.

Let’s look at a few examples.

Example 1: Let’s find the critical points of the polynomial f(x,y)=3x^2+2xy+4y^2. To do this, we will find all partial derivatives of f(x,y). First, we see that \frac{\partial f}{\partial x}=6x+2y and \frac{\partial f}{\partial y}=2x+8y. So, to find the critical points of f(x,y), we need to solve 6x+2y = 0 and 2x+8y = 0.

This is a system of two equations with two unknowns! Let’s rewrite it using matrices. \begin{pmatrix}6&2\\2&8\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}0\\0\end{pmatrix}.

This matrix is invertible, since \det\begin{pmatrix}6&2\\2&8\end{pmatrix}=6\cdot 8-2\cdot 2=44\neq 0, so the Invertible Matrix Theorem tells us that the only vector that solves this equation is \begin{pmatrix}0\\0\end{pmatrix}.

This means that the only critical point of f(x,y)=3x^2+2xy+4y^2 occurs at the point (0,0).
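(If you want to double-check a computation like this on a computer, here's a tiny sketch using sympy. It's just my own verification of the example above, not part of the argument.)

```python
import sympy as sp

x, y = sp.symbols("x y")
f = 3*x**2 + 2*x*y + 4*y**2

# Partial derivatives of f
fx = sp.diff(f, x)   # 6*x + 2*y
fy = sp.diff(f, y)   # 2*x + 8*y

# Solve the system fx = 0, fy = 0
critical_points = sp.solve([fx, fy], [x, y])
print(critical_points)   # {x: 0, y: 0} -- the only critical point is the origin
```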

Now let’s look at another example with a different polynomial.

Example 2: Let’s find the critical points of the polynomial f(x,y)=4x^2+8xy+4y^2. The partial derivatives of f(x,y) are \frac{\partial f}{\partial x}=8x+8y and \frac{\partial f}{\partial y}=8x+8y. This means we want to solve 8x+8y=0 and 8x+8y=0.

That’s just the same equation twice! So, the corresponding matrix equation to examine is \begin{pmatrix}8&8\\8&8\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}0\\0\end{pmatrix}.

This matrix is not invertible since \det\begin{pmatrix}8&8\\8&8\end{pmatrix}=8\cdot 8-8\cdot 8=0, which, by the Invertible Matrix Theorem, means that this matrix equation has infinitely many solutions.

That also tells us that the corresponding polynomial has infinitely many critical points! If you graph this polynomial, it makes a lot of sense: f(x,y)=4x^2+8xy+4y^2=4(x+y)^2, so there's a flat line of points along y=-x, all of which make f(x,y) attain the same value (zero).

What's cool about this is that there is a pretty simple way of finding critical points of a two-variable quadratic function using linear algebra. And matrix equations like the ones we made will always have either no solution, exactly one solution, or infinitely many solutions (that's just a fact about systems of linear equations). Since we're only ever setting the right-hand side of the matrix equation equal to a vector of zeros, we will never be in the "no solution" case (i.e. \begin{pmatrix}0\\0\end{pmatrix} will always be a valid solution).

This means that polynomials of this form will always either have a single critical point OR infinitely many critical points. (If you had a polynomial with 2 critical points, for example, you would know that it is not a quadratic polynomial.) This is fun to think about because critical points are either local minima, maxima, or saddle points. By just finding the matrix associated with a polynomial like this, you can understand the shape of that polynomial without having to try and visualize it!
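Here's a little sketch of that idea in Python (assuming numpy; the function name is just something I made up). It builds the matrix of the system 2ax+by=0, bx+2cy=0 and checks whether its determinant is zero:

```python
import numpy as np

def classify_critical_points(a, b, c):
    """For f(x, y) = a*x^2 + b*x*y + c*y^2 + d, the critical points solve
    [[2a, b], [b, 2c]] @ [x, y] = [0, 0]."""
    M = np.array([[2*a, b], [b, 2*c]], dtype=float)
    if abs(np.linalg.det(M)) > 1e-12:
        return "one critical point (the origin)"
    return "infinitely many critical points"

print(classify_critical_points(3, 2, 4))  # Example 1: one critical point (the origin)
print(classify_critical_points(4, 8, 4))  # Example 2: infinitely many critical points
```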


Probability and the Pythagorean Theorem

In this post, I want to talk about the equation for the variance of a random variable and use it to get a geometric intuition behind some properties of random variables. First, let's define what a random variable is.

A random variable is a function that assigns a numerical value to each possible outcome of a random process, and each of those values occurs with some probability. For example, we could define a random variable X that takes the value 0 with probability \frac{3}{10} and takes the value 1 with probability \frac{7}{10}.

Once you have a random variable, there are some things you might want to know about it. For example, you might be curious what its expected value is. This is like an average, but it extends to random variables that take on a continuous set of values as well. To find the expected value in this case, we calculate 0\cdot(\frac{3}{10})+1\cdot(\frac{7}{10})=\frac{7}{10}. For a discrete random variable, the expected value is really a weighted average of the possible values of the random variable (weighted by the probability of occurring).

The expected value of random variable X is often written as \mathbb{E}(X). Once you know how to calculate this, you might ask what happens if you transform the values that X can take and then recalculate expected values. In particular, you might ask what happens if you calculate \mathbb{E}(X^2), or in general, \mathbb{E}(X^n) for some n\in\mathbb{N}. For any value of n, these are called “moments of a distribution,” and they tell you about properties of the distribution.

In our example above, to calculate \mathbb{E}(X^2), we just calculate 0^2\cdot(\frac{3}{10})+1^2\cdot(\frac{7}{10})=\frac{7}{10}. In this case, since the possible values of the random variable were 0 and 1, we get the same answer if we square them and recompute the expression. So, \mathbb{E}(X)=\mathbb{E}(X^2) in this case, but that won't always be true.

The variance of a random variable, which tells you about how much the values of the random variable are spread out, is defined as Var(X) =\sigma_X^2 =\mathbb{E}(X^2) - \mathbb{E}(X)^2, where \sigma_X is the standard deviation of X.

Now let’s do something kind of fun and rearrange that equation a little bit. If we add \mathbb{E}(X)^2 to both sides, we get that \sigma_X^2+\mathbb{E}(X)^2 = \mathbb{E}(X^2).

Looking at it like this, we can imagine a right triangle with legs of length \sigma_X and \mathbb{E}(X) and hypotenuse of length \sqrt{\mathbb{E}(X^2)}.

If we let M(t) be the moment generating function of X, then we can replace \mathbb{E}(X) with M'(0) and \sqrt{\mathbb{E}(X^2)} with \sqrt{M''(0)}. That's kind of fun!
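If you like checking this kind of thing numerically, here's a little sketch I put together (assuming sympy; the variable X is the one from the example above). It verifies the right-triangle identity and the claim about the moment generating function M(t):

```python
import sympy as sp

t = sp.symbols("t")

# X takes the value 0 with probability 3/10 and 1 with probability 7/10
values = [0, 1]
probs = [sp.Rational(3, 10), sp.Rational(7, 10)]

EX  = sum(p*v for v, p in zip(values, probs))      # E(X)   = 7/10
EX2 = sum(p*v**2 for v, p in zip(values, probs))   # E(X^2) = 7/10
var = EX2 - EX**2                                  # Var(X) = 21/100

# Pythagorean check: sigma^2 + E(X)^2 should equal E(X^2)
print(sp.simplify(var + EX**2 - EX2))   # 0

# Moment generating function M(t) = E(e^{tX}); M'(0) = E(X), M''(0) = E(X^2)
M = sum(p*sp.exp(t*v) for v, p in zip(values, probs))
print(sp.diff(M, t).subs(t, 0))         # 7/10
print(sp.diff(M, t, 2).subs(t, 0))      # 7/10
```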

I hope this helped you care more about triangles and probability!

Orthogonal Complements and Rank-Nullity

In the last post, we talked about what the orthogonal complement of a set is and we showed why Row(A)\perp Nul(A). This was kind of cool because, for me, it's hard to have a geometric understanding of the row space of a matrix based on how matrix multiplication is usually defined (Col(A^T) is kind of hard to picture), but it's not that hard for me to have a geometric understanding of the null space of a matrix. The fact that they are orthogonal means that if you can understand what it means to be in the null space, then to be in the row space just means that you are orthogonal to everything in the null space.

Now, let’s talk about how to connect the fact that the Row(A)\perp Nul(A) to the Rank-Nullity Theorem. As a reminder, the Rank-Nullity Theorem says that for an m\times n matrix A, dim(Col(A))+dim(Nul(A))=n.

The first fact that we need to talk about before starting to understand why this is true is that for an m\times n matrix A, dim(Row(A))=dim(Col(A)). This should seem really surprising at first. The row space of A lives in \mathbb{R}^n and the column space of A lives in \mathbb{R}^m! So, the dimensions of the row and column spaces are the same, even though they don't even live in the same ambient space! That's pretty crazy, if you ask me.

Now, let’s look at an example to see why this makes sense. Consider the matrix A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{pmatrix}. This represents a transformation from \mathbb{R}^3 to \mathbb{R}^2, and every output is of the form (a_1+2a_2+3a_3)\begin{pmatrix}1\\2\end{pmatrix}, where a_i represents any real number. In this case, the output lives in \mathbb{R}^2, but the output itself is only 1-dimensional (since there’s only one vector that spans the output).

Now, let’s consider the matrix A^T=\begin{pmatrix}1&2\\2&4\\3&6\end{pmatrix}. This represents a transformation from \mathbb{R}^2 to \mathbb{R}^3, and every output is of the form (b_1+2b_2)\begin{pmatrix}1\\2\\3\end{pmatrix}, where b_i represents any real number. In this case, the output lives in \mathbb{R}^3, but the output itself is only 1-dimensional (since there’s only one vector that spans the output).

So, even though the transformations given by A and A^T don’t map into the same dimension, the same number of vectors can still be used to represent their outputs! That’s so cool!

Now, let's connect this to the Rank-Nullity Theorem. The Rank-Nullity Theorem gives a connection between the dimension of the column space of a matrix and the dimension of the null space of a matrix. We just did an example that showed why the dimension of the column space of a matrix is the same as the dimension of the row space of a matrix. So, if A is an m\times n matrix, let's rewrite the Rank-Nullity Theorem as dim(Row(A)) + dim(Nul(A)) = n. This has a nice geometric interpretation: any deficiency in the size of the row space is made up for by the size of the null space. It's amazing how nicely they work together.
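Here's a quick numerical check of both facts on the example matrix from above (this is just my own sketch, assuming numpy):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6]])

rank_A  = np.linalg.matrix_rank(A)     # dim(Col(A)) = dim(Row(A))
rank_AT = np.linalg.matrix_rank(A.T)   # same number, even though A.T maps R^2 -> R^3

n = A.shape[1]                         # number of columns of A
nullity = n - rank_A                   # Rank-Nullity: dim(Nul(A)) = n - rank

print(rank_A, rank_AT)   # 1 1
print(nullity)           # 2
```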

I hope this made you appreciate what it really means to be a matrix just a little bit more!

Orthogonal Complements

In this post, I want to talk about something called an orthogonal complement. The set B is the orthogonal complement of the set A if B consists of every vector that is orthogonal to everything in A; that is, b\in B exactly when a\cdot b = 0 for all a\in A. We write this as A^{\perp} = B since everything in A is orthogonal to everything in B.

Note that this relationship is symmetric (at least for the subspaces we'll care about here): if A^{\perp} = B, then B^{\perp} = A. This is because the dot product is commutative (a\cdot b=b\cdot a), so "everything in B is orthogonal to everything in A" is the same statement as "everything in A is orthogonal to everything in B."

Let's look at an example of a set and its orthogonal complement. Consider the xy-plane through the origin in \mathbb{R}^3. This can be written as span\left\{\begin{pmatrix}1\\0\\0\end{pmatrix},\begin{pmatrix}0\\1\\0\end{pmatrix}\right\}, or the set of all vectors that have 0 as their z component. So, the orthogonal complement of the xy-plane is the set of all vectors of the form \begin{pmatrix}0\\0\\k\end{pmatrix}, where k\in\mathbb{R} (the z-axis).

Now let’s look at an example of orthogonal complements in the context of matrix multiplication. To do this, we’ll relate two features of a matrix, the null space and the row space.

Consider an m\times n matrix A, which represents a transformation from \mathbb{R}^n to \mathbb{R}^m. The null space of A, written Nul(A), is the set of all x such that Ax = \vec{0}. The row space of A, written Row(A), is the set of all linear combinations of the rows of A.

To understand why these two sets are orthogonal to each other, let's write out matrix multiplication in a way that makes it easier to see. We'll do this with an example of a 3\times 3 matrix, but the idea extends to a matrix of any dimensions.

\begin{pmatrix} 1&2&2\\3&-1&-2\\5&-4&-3\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}.

We would normally understand this matrix vector multiplication as a weighted sum of the columns, or x_1\begin{pmatrix}1\\3\\5\end{pmatrix} + x_2\begin{pmatrix}2\\-1\\-4\end{pmatrix} +x_3\begin{pmatrix}2\\-2\\-3\end{pmatrix}, but we are going to look at it a different way.

Then, we can rewrite \begin{pmatrix} 1&2&2\\3&-1&-2\\5&-4&-3\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} as \begin{pmatrix}(1,2,2)\cdot (x_1,x_2,x_3)\\(3,-1,-2)\cdot (x_1,x_2,x_3)\\(5,-4,-3)\cdot (x_1,x_2,x_3)\end{pmatrix}.

We can see that these two ways of multiplying are the same because, for example, the first component of the final vector we just output is x_1+2x_2+2x_3. And if you multiply the x's through in the first way of writing the matrix multiplication above, you'll get the same thing for the first component of the vector.

Now, let's consider the vector \begin{pmatrix}1\\-2\\1\end{pmatrix}\in Nul\begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}. Then, \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}\begin{pmatrix}1\\-2\\1\end{pmatrix}=\begin{pmatrix}0\\0\\0\end{pmatrix}. Since \begin{pmatrix}1\\-2\\1\end{pmatrix} is in the null space of \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}, it must be orthogonal to each row of that matrix.

Now, all we need to show is that if a vector is orthogonal to each row, then it is orthogonal to any linear combination of those rows.

Suppose \vec{x} is orthogonal to every vector in the set \{\vec{a_1}, \cdots,\vec{a_m}\}. Then, let's compute \vec{x}\cdot (c_1\vec{a_1}+\ldots + c_m\vec{a_m}) (the right side of the dot is an arbitrary vector in the span of the \vec{a_i}'s). We can rewrite this as c_1(\vec{x}\cdot\vec{a_1})+\ldots + c_m(\vec{x}\cdot\vec{a_m}). We know that \vec{x}\cdot \vec{a_i}=0 for 1\leq i\leq m, so this dot product is zero.

Thus, \vec{x} is orthogonal to any vector in the row space, and not just each vector that spans the row space.

Overall, we showed that if \vec{x}\in Nul(A), then \vec{x}\perp Row(A).
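As a quick sanity check, here's a little numpy sketch (my own, using the 3\times 3 example from above) that dots a null-space vector against a few random linear combinations of the rows:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
x = np.array([1, -2, 1])        # a vector in Nul(A)

print(A @ x)                    # [0 0 0], so x really is in the null space

# Dot x against random linear combinations of the rows of A
rng = np.random.default_rng(0)
for _ in range(3):
    c = rng.standard_normal(3)      # random weights c_1, c_2, c_3
    row_combo = c @ A               # c_1*row_1 + c_2*row_2 + c_3*row_3
    print(np.dot(x, row_combo))     # 0 (up to floating point error)
```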

In the next post, we’ll connect this back to the Rank-Nullity theorem by talking about how the row space and column space of a matrix relate to each other.

Linear Regression and Conditional Probability

The goal of linear regression is to find the equation of the line that most closely approximates a set of data points. (It turns out that you don't actually have to restrict yourself to equations where the highest power of x is 1, but we can talk about that some other time.) Linear regression is really useful in settings where you're trying to figure out how one or more input variables affect a single output variable.

To perform linear regression, though, we need to make some pretty strong assumptions about our data. One of these assumptions can be stated as follows:

\mathbb{P}(Y\vert X)=\mathbb{P}(Y\text{ and }X).

In case you haven't seen the notation before, \mathbb{P}(Y\vert X) means "the probability of Y given X." This comes up a lot when you want to know how the probability of an event changes given that you know whether something else already happened. An example of this would be "the probability that I get an A in the class given that I got an 87 on the midterm."

At first, it seems like this would contradict the definition of conditional probability, which says that

\mathbb{P}(Y\vert X)=\frac{\mathbb{P}(Y\text{ and }X)}{\mathbb{P}(X)},

but let’s talk about why it actually makes sense in this context.

When you perform a linear regression, there are kind of two ways to think about what’s going on: prediction and simultaneous measurements. Let’s think about a specific example to make this easier to talk about.

In this example, let’s collect people’s heights and weights and plot them on the x and y axes respectively. One way to think about this is that you are trying to find an equation that uses heights to predict weights. Said a different way, you are trying to maximize \mathbb{P}(Y\vert X). We’ll come back to this idea in the next post.
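To make the "prediction" view concrete, here's a minimal least-squares sketch (assuming numpy; the height/weight numbers are made up purely for illustration):

```python
import numpy as np

# Made-up (height_cm, weight_kg) pairs, just for illustration
heights = np.array([150, 160, 165, 170, 180, 190], dtype=float)
weights = np.array([52,  60,  63,  68,  77,  85], dtype=float)

# Fit weight ~ slope*height + intercept by least squares
X = np.column_stack([heights, np.ones_like(heights)])
(slope, intercept), *_ = np.linalg.lstsq(X, weights, rcond=None)

print(slope, intercept)            # the fitted line's coefficients
print(slope * 175 + intercept)     # predicted weight for a 175 cm person (made-up data!)
```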

Another way to think about this is that every time you observe a person, you collect two pieces of information about them: their height AND their weight. Here, INSTEAD of thinking about using height to PREDICT weight, we’re just simultaneously measuring two attributes of a person.

In this example, we see how we can think about either using people’s height to predict their weight, or just think about measuring people’s heights and weights simultaneously and understanding how those two values are related.

In the next post, we’ll say a little more about the assumptions that linear regression makes and why they make sense! Stay tuned!

Rank-Nullity Theorem and Redundancy in a Matrix

I want to talk about some intuition behind the Rank-Nullity Theorem. In particular, let’s talk about why, even though the null space and column space of an m\times n matrix live in different ambient dimensions (\mathbb{R}^n and \mathbb{R}^m respectively) it still makes sense why there is a connection between them. As a reminder, here’s the Rank-Nullity Theorem.

Theorem: If an m\times n matrix has a column space of dimension k, then it has a null space of dimension n-k.

The rank of a matrix tells you how many columns in the matrix are actually contributing to the dimension of the column space. For example, the matrix A=\begin{pmatrix}1&2&3\\2&4&6\end{pmatrix} has a 1-dimensional column space. Since A has two rows, the dimension of the column space could be at most 2. Since the second and third columns are just multiples of the first, though, neither of them provides any new information. In other words, those columns are redundant.

Now let’s examine the redundancy of the columns of A by looking at its null space. First, let’s note that we can write \begin{pmatrix}1&2&3\\2&4&6\end{pmatrix} \begin{pmatrix}1\\1\\-1\end{pmatrix}=\begin{pmatrix}0\\0\end{pmatrix}. This also means that we can write \begin{pmatrix}3\\6\end{pmatrix}=\begin{pmatrix}1\\2\end{pmatrix}+\begin{pmatrix}2\\4\end{pmatrix} since multiplying a matrix by a vector is just taking a weighted sum of the columns of the matrix. Then you can just move one vector to the other side of the equals sign.

This means that finding a nonzero vector in the null space of a matrix is equivalent to writing some of the columns of the matrix as a linear combination of the other columns. In other words, not all of these columns are needed to tell you everything you wanted to know about the matrix.
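Here's a tiny numpy check of that redundancy story (my own sketch, not part of the argument above):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6]])

# A nonzero null-space vector...
v = np.array([1, 1, -1])
print(A @ v)                          # [0 0]

# ...is the same information as a redundant column:
print(A[:, 2], A[:, 0] + A[:, 1])     # [3 6] and [3 6]: column 3 = column 1 + column 2
```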

I hope this helped you see how redundancy in a matrix can be seen through a not fully spanning column space AND a non-trivial null space!

The Fundamental Theorem of Calculus

In this post, I want to talk about a way to think about the Fundamental Theorem of Calculus that will hopefully give more intuition for it. As a reminder, the Fundamental Theorem of Calculus describes the connection between differentiation and integration, which are the two main operations in calculus. This theorem comes in two parts: one that says what happens when you differentiate an integral, and one that says what happens when you integrate a derivative. Let’s state the two parts now.

First Fundamental Theorem of Calculus: \frac{d}{dx}\int_{a}^{x} f(t) dt = f(x)

Second Fundamental Theorem of Calculus: \int_{a}^{b} f'(t) dt = f(b)-f(a).

Let’s start by talking about the First Fundamental Theorem of Calculus. In this, we are differentiating an integral. To understand what this is saying, let’s break it down into parts. The integral is calculating how much of some quantity has accumulated by a certain point in time (this could be any quantity, like distance traveled, water in a container,…). Then, the derivative is asking about how much the total quantity is changing at a particular moment. For example, how much is the amount of water in the container increasing right this second?

The First Fundamental Theorem of Calculus says that the answer to that question is just equal to the function f evaluated at the upper bound of the integral, or f(x).

Let's think about why this makes sense. The integral is keeping track of how much total has accumulated up until a certain point. Then, the derivative comes along and says "how much are you changing right this second?" The answer to that question should just be the current value of the function f at the time t=x, or f(x). You would at least expect the rate at which the area under a function increases to be proportional to the height of the function, and the First Fundamental Theorem of Calculus says that the proportionality constant is equal to 1.

The Second Fundamental Theorem of Calculus says that if you want to find the total effect that a function’s derivative has over a region [a,b], you just have to compute the difference in the function’s values at those two points. This makes sense because if someone asks what the total effect of the slope of a mountain was in a region, you would just tell them how high you climbed in that region, or the difference between the height of where you ended and where you started.
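If you like numerical sanity checks, here's a rough sketch of both parts (assuming numpy, with f(t)=t^2 on the interval [0,2]):

```python
import numpy as np

f  = lambda t: t**2     # the integrand in Part 1
df = lambda t: 2*t      # its derivative, the integrand in Part 2
a  = 0.0

def integral(g, lower, upper, n=10_001):
    """Approximate the integral of g from lower to upper with the trapezoid rule."""
    t = np.linspace(lower, upper, n)
    y = g(t)
    dt = t[1] - t[0]
    return dt * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)

# Part 1: the derivative (at x) of the accumulated area from a to x should be f(x)
x, h = 1.5, 1e-4
numerical_derivative = (integral(f, a, x + h) - integral(f, a, x - h)) / (2 * h)
print(numerical_derivative, f(x))        # both ~ 2.25

# Part 2: integrating f' from a to b should give f(b) - f(a)
b = 2.0
print(integral(df, a, b), f(b) - f(a))   # both ~ 4.0
```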

Alright, now let’s talk about the fun stuff. Both parts of the Fundamental Theorem of Calculus give connections between differentiation and integration. Even though it’s not too hard to understand some intuition behind why each part of the theorem is true, there’s actually something pretty cool going on here! Differentiation is really asking a local question about a function. It’s asking, “How steep is the function at this particular point?” Whereas integration is really asking a global question about a function. It’s asking “In this region, how much have we accumulated by a certain point?”

It’s interesting that these two topics are related because it means that the Fundamental Theorem of Calculus acts as a way to connect the local behavior of a function to the global behavior of that function!

The concept of local/global statements about an object comes up a lot in Differential Geometry, which studies curves and surfaces. A lot of theorems in Differential Geometry talk about which properties of a surface you can measure from the curve/surface (local) (these are called intrinsic properties) and from observing the curve/surface from a distance (global) (these are called extrinsic properties).

So, the Fundamental Theorem of Calculus is kind of a little peek into how local and global properties of functions are related to each other!

Rubik’s Cube and Graphs

Let's talk about a way to visualize the orientations of the Rubik's cube as a graph (the kind with vertices and edges, like the ones shown below). Then, we'll use this representation to talk about what happens when you repeat a fixed set of moves over and over again on the Rubik's cube.

The dots labeled with letters are called nodes and the lines connecting the nodes are called edges.

In the case of the Rubik's cube, each node of the graph represents an orientation of the cube, and an edge connecting two nodes represents the move that takes one orientation to the other. It might be easier to imagine two directed edges (edges that specify at which node they start and end) between each pair of connected nodes, where the move on one of the edges undoes what the move on the edge in the opposite direction does.

Now, let’s just start picturing what this massive graph looks like. First of all, the graph has over 43 quintillion nodes, since there are roughly that many orientations of the Rubik’s cube. To simplify our graph a little, let’s not allow turns that directly affect any of the center pieces. Let’s also only consider edges that correspond to quarter turns.

At each orientation of the cube, you have 12 different options for how to move to another orientation (since there are six faces you can rotate and each of them can be rotated in two different directions).

So, imagine building this graph by starting at some node and following edges. Out of each node, there are 12 edges. HANG ON… If every move always led to a brand-new orientation, this graph would have infinitely many nodes! And that would correspond to an infinite number of Rubik's cube orientations. There ARE a lot of Rubik's cube orientations, but there are definitely NOT infinitely many Rubik's cube orientations. So, what does this tell us? It tells us that there must be closed loops within our graph! So, even if you think you're getting farther and farther away from the correct solution, if you follow some path along the graph, you can only ever get so far away from the node corresponding to a solved cube. Another way of saying that is that given some starting orientation and some ending orientation (our target), there must be more than one way of getting from the start to the end!

What else does the loopiness of this graph tell us? Let's consider a finite set of moves that we repeat over and over again. For example, in Rubik's cube notation, let's consider the combination of moves RU' (each letter corresponds to a different twist of the cube, and the prime symbol means turn that face in the opposite direction). Whereas before, each edge of the graph corresponded to a single twist of the cube, now each edge represents multiple twists. But, here's the crazy thing… this graph will still have loops in it no matter what finite set of moves you repeat! It has to, since the graph has finitely many nodes!

This is true because repeating the fixed set of moves keeps producing orientations, and there are only finitely many of them, so eventually you must land on an orientation you have already visited. And since every move can be undone, the first orientation you revisit has to be the one you started from. Since the set of moves you're repeating is fixed, you'll never jump around the graph without eventually getting back to where you started.
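If you like, you can see this "repeat and eventually return" fact in code, because a fixed move sequence is just a permutation of a finite set of states. Here's a tiny sketch with a made-up permutation standing in for a move sequence (the real cube permutation is way bigger, but the idea is identical):

```python
# A made-up permutation standing in for "apply the fixed move sequence once":
# position i receives the piece that was at position perm[i].
perm = (2, 0, 1, 4, 3, 6, 7, 5)

def apply(p, state):
    return tuple(state[i] for i in p)

state = start = tuple(range(len(perm)))
count = 0
while True:
    state = apply(perm, state)
    count += 1
    if state == start:
        break

print(count)   # 6: back to the start after 6 repetitions (lcm of the cycle lengths 3, 2, 3)
```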

This is so cool! Given any fixed set of twists of the cube, you know that you can repeat them and eventually get back to where you started!

Polynomials and Synthetic Division

In this post, I want to talk about synthetic polynomial division and why the final value you get is BOTH the y value of the function at a particular point AND the remainder when you perform the division. We’ll use one running example with coefficients a_i, and even though our example will be with a cubic polynomial, the same idea generalizes to polynomials of any degree.

Let's let f(x) = a_3x^3+a_2x^2+a_1x+a_0, and let's let g(x)=x-c. We want to compute \frac{f(x)}{g(x)}. To do this, we'll set up synthetic division as follows.

Inside the box, we put the value that makes the denominator zero. One of the assumptions of synthetic division is that the denominator is linear with leading coefficient 1. So, if c is inside the box, we know that we are dividing by x-c.

To the right of the box, we put the coefficients of the polynomial in question (in decreasing order of exponent). If there were a term missing as the powers decreased (like if the polynomial were x^2+1 (no x term)), we would put a zero in that position.

Now, let’s do some synthetic division. This is a combination of two steps: add and multiply. In the first step, we’ll just be bringing down the first constant, but that’s the same as adding zero to the first constant. Here’s the result if you carry out the synthetic division in this example.

The last term in the last column above the horizontal line (which is a bit hard to read in the image) is c^3a_3.

The expression in the last column of the synthetic division below the blue line always gives the remainder of the polynomial division. However, we can ALSO see that this expression (a_0+ca_1+c^2a_2+c^3a_3) is the same as f(c), or the function f evaluated at x=c.

Hang on, that's pretty cool. The remainder of the polynomial division by (x-c) is the same as the y-value of the function at x=c.

Now let’s talk about why the final term being zero when plugging in c corresponds to (x-c) being a factor of f(x). If f(c)=0, then we can write f(x) as a product of two polynomials, f(x)=(x-c)\cdot h(x), where h(x) is some other polynomial (Note that if you evaluate f(c), you’ll get (c-c)\cdot h(c)=0\cdot h(c)=0).

Even if the term you're dividing by is not a factor of f(x), we see that synthetic division still gives you the y-value at that point. Thinking in terms of remainders, this also makes a lot of sense. The remainder after division can be thought of as a measure of how far the denominator was from being a factor of the numerator. If, at x=c, the y-value of the function was 4, then the numerator was 4 more than a multiple of the denominator. In other words, if you subtract 4 from the numerator, it will be divisible by the denominator with no remainder. And this will have created a polynomial which IS a multiple of the denominator.

Let’s look at an example to make this a little more concrete. Let’s let f(x)=x^3-1, and let’s let g(x)=x-2. Then, the synthetic division will look like this.

The final value in the bottom right is 7, so this should be the y-value of the function at x=2. We can check that f(2)=2^3-1=7. Also, if the remainder is 7, then we should be able to subtract 7 and obtain a multiple of (x-2). Let's see if that works.

x^3-1-7=x^3-8=(x-2)(x^2+2x+4). Indeed, (x-2) is a factor of that new polynomial!
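Synthetic division is really the same bookkeeping as Horner's method, so it's easy to check this example in code. Here's a little sketch of my own (the function name is made up for illustration):

```python
def synthetic_division(coeffs, c):
    """Divide a polynomial (coefficients in decreasing order of degree) by (x - c).
    Returns (quotient_coefficients, remainder)."""
    quotient = [coeffs[0]]
    for a in coeffs[1:]:
        quotient.append(a + c * quotient[-1])   # multiply down, then add across
    return quotient[:-1], quotient[-1]

# f(x) = x^3 - 1 divided by (x - 2)
q, r = synthetic_division([1, 0, 0, -1], 2)
print(q, r)   # [1, 2, 4] and 7: quotient x^2 + 2x + 4, remainder 7 = f(2)
```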

I hope this helped you see a connection between zeros and factors of polynomials! And how the output of synthetic division relates to the numerator function’s value at a particular point!


Rational Root Theorem

This post is a follow-up to the last post about the Fundamental Theorem of Algebra. As a recap, the Fundamental Theorem of Algebra says that if you have a polynomial of degree n with coefficients in \mathbb{C}, then the polynomial will have n roots in \mathbb{C} (counted with multiplicity).

As we talked about last time, the Fundamental Theorem of Algebra is an existence proof because for a polynomial of degree n whose coefficients are in \mathbb{C}, it says that you’ll be able to find n values which, when plugged into the polynomial, make the polynomial equal zero. In other words, you’re guaranteeing the existence of the values that will make the polynomial equal zero.

Existence proofs are cool and all, but the Fundamental Theorem of Algebra doesn’t tell us anything about how to find the x-values that make the polynomial equal zero. Wouldn’t it be nice if there were a way to know where these zeros were?

It turns out that if we restrict ourselves to looking for rational zeros of a polynomial, then we can say exactly what those kinds of zeros would have to look like. Now, let’s state the Rational Root Theorem.

Rational Root Theorem: If a polynomial f(x) = d_0x^n+d_1x^{n-1}+\ldots+d_{n-1}x+d_n with integer coefficients has any rational zeros, then they will have to occur at x=\frac{p}{q}, where p is some (possibly negative) factor of the constant term d_n and q is some factor of the leading coefficient d_0.

To prove the Rational Root Theorem, we’re going to write a polynomial as a product of terms, which the Fundamental Theorem of Algebra guarantees we can do.

Proof: Let f(x) = (a_1+b_1x)(a_2+b_2x)\ldots(a_n+b_nx), where the "\ldots" includes the product of the other linear terms. First, let's find the zeros of this function. To do that, we want to solve (a_1+b_1x)(a_2+b_2x)\ldots(a_n+b_nx) = 0. Note that the a_i and b_i won't all be nice integers if some of the factors don't yield rational roots. But even that gives you information, too! For example, there might be a polynomial with five roots, only three of which are rational.

Before we solve for the values of x that make this function equal zero, let’s just think about what the highest degree term and lowest degree term will look like. We don’t need to think about the terms that have degree 0<k<n. The highest degree term comes from multiplying all of the terms that have an x in them. And the lowest degree term comes from multiplying all of the terms that are just constants. This means that the highest degree term will be (b_1)(b_2)\ldots (b_n)x^n. And the lowest degree term will be (a_1)(a_2)\ldots (a_n).

Let’s remember that we’re trying to solve (a_1+b_1x)(a_2+b_2x)\ldots(a_n+b_nx) = 0. Since the left-hand side is a product of n terms that equals zero, the whole left-hand side will equal zero when any of these terms individually equals zero. So, this function has zeros when x\in\{-\frac{a_1}{b_1},-\frac{a_2}{b_2},\ldots,-\frac{a_n}{b_n}\}

Okay, we're so close! We know that if our function has a rational zero, it will definitely be in the set A=\{-\frac{a_1}{b_1},-\frac{a_2}{b_2},\ldots,-\frac{a_n}{b_n}\}, even though not all of these quotients have to be rational. Each a_i is a factor of the constant term a_1a_2\ldots a_n, and each b_i is a factor of the leading coefficient b_1b_2\ldots b_n. So, if we search the set of quotients \{\frac{p_i}{q_i}\}, where p_i is a factor of the constant term and q_i is a factor of the leading coefficient, we will have checked every value in A, and therefore every possible rational zero.

This theorem says that IF a function has rational zeros, then here’s how to find them. But there’s no guarantee that a function will have rational zeros to begin with.
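Here's a small sketch (my own, just for illustration) of how you'd use the theorem in practice: list every candidate \pm\frac{p}{q} and test each one. It assumes integer coefficients with a nonzero constant term.

```python
from fractions import Fraction

def divisors(n):
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """coeffs in decreasing order of degree; integer coefficients, nonzero constant term."""
    leading, constant = coeffs[0], coeffs[-1]
    candidates = {Fraction(sign * p, q)
                  for p in divisors(constant)
                  for q in divisors(leading)
                  for sign in (1, -1)}
    def f(x):
        total = Fraction(0)
        for a in coeffs:
            total = total * x + a   # Horner evaluation
        return total
    return sorted(c for c in candidates if f(c) == 0)

# 2x^3 - 3x^2 - 11x + 6 has rational roots -2, 1/2, and 3
print(rational_roots([2, -3, -11, 6]))   # [Fraction(-2, 1), Fraction(1, 2), Fraction(3, 1)]
```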

So, to recap: The Fundamental Theorem of Algebra states the existence of n complex-valued zeros for a polynomial of degree n, but is kind of like "good luck!" when it comes to actually finding these zeros. Then the Rational Root Theorem comes along and says, "hey, if you don't mind only looking for rational zeros of this function, I can help you out!" It says that any rational root of your polynomial can be written as the quotient of two integers, where the numerator is a factor of the constant term and the denominator is a factor of the leading coefficient.

I hope this helped you see a connection between the Fundamental Theorem of Algebra and the Rational Root Theorem!
