Quadratic Optimization and Matrices

In this post, I want to talk about a way to connect critical points of two-variable polynomials of degree 2 to the null space of a matrix. A critical point of a function is a place where all partial derivatives of that function are equal to zero. The polynomials we'll consider are of the form f(x,y)=ax^2+bxy+cy^2+d, where a,b,c,d \in \mathbb{R} (a general degree-2 polynomial can also have linear terms, but we'll stick to this form here). As a reminder, the null space of a matrix A is the set \{x \vert Ax = \vec{0}\}.

Let’s look at a few examples.

Example 1: Let’s find the critical points of the polynomial f(x,y)=3x^2+2xy+4y^2. To do this, we will find all partial derivatives of f(x,y). First, we see that \frac{\partial f}{\partial x}=6x+2y and \frac{\partial f}{\partial y}=2x+8y. So, to find the critical points of f(x,y), we need to solve 6x+2y = 0 and 2x+8y = 0.

This is a system of two equations with two unknowns! Let’s rewrite it using matrices. \begin{pmatrix}6&2\\2&8\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}0\\0\end{pmatrix}.

This matrix is invertible, since \det\begin{pmatrix}6&2\\2&8\end{pmatrix}=6\cdot 8-2\cdot 2=44\neq 0, so the Invertible Matrix Theorem tells us that the only vector that solves this equation is \begin{pmatrix}0\\0\end{pmatrix}.

This means that the only critical point of f(x,y)=3x^2+2xy+4y^2 occurs at the point (0,0).
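(If you want to double-check a computation like this on a computer, here's a tiny sketch using sympy. It's just my own verification of the example above, not part of the argument.)

```python
import sympy as sp

x, y = sp.symbols("x y")
f = 3*x**2 + 2*x*y + 4*y**2

# Partial derivatives of f
fx = sp.diff(f, x)   # 6*x + 2*y
fy = sp.diff(f, y)   # 2*x + 8*y

# Solve the system fx = 0, fy = 0
critical_points = sp.solve([fx, fy], [x, y])
print(critical_points)   # {x: 0, y: 0} -- the only critical point is the origin
```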

Now let’s look at another example with a different polynomial.

Example 2: Let’s find the critical points of the polynomial f(x,y)=4x^2+8xy+4y^2. The partial derivatives of f(x,y) are \frac{\partial f}{\partial x}=8x+8y and \frac{\partial f}{\partial y}=8x+8y. This means we want to solve 8x+8y=0 and 8x+8y=0.

That’s just the same equation twice! So, the corresponding matrix equation to examine is \begin{pmatrix}8&8\\8&8\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}0\\0\end{pmatrix}.

This matrix is not invertible since \det\begin{pmatrix}8&8\\8&8\end{pmatrix}=8\cdot 8-8\cdot 8=0, which, by the Invertible Matrix Theorem, means that this matrix equation has infinitely many solutions.

That also tells us that the corresponding polynomial has infinitely many critical points! If you graph this polynomial, it makes a lot of sense: f(x,y)=4x^2+8xy+4y^2=4(x+y)^2, so there's a flat line of points along y=-x, all of which make f(x,y) attain the same value (zero).

What's cool about this is that there is a pretty simple way of finding critical points of a two-variable quadratic function using linear algebra. And matrix equations like the ones we made will always have either no solution, exactly one solution, or infinitely many solutions (that's just a fact about systems of linear equations). Since we're only ever setting the right-hand side of the matrix equation equal to a vector of zeros, we will never be in the "no solution" case (i.e. \begin{pmatrix}0\\0\end{pmatrix} will always be a valid solution).

This means that polynomials of this form will always either have a single critical point OR infinitely many critical points. (If you had a polynomial with 2 critical points, for example, you would know that it is not a quadratic polynomial.) This is fun to think about because critical points are either local minima, maxima, or saddle points. By just finding the matrix associated with a polynomial like this, you can understand the shape of that polynomial without having to try and visualize it!
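Here's a little sketch of that idea in Python (assuming numpy; the function name is just something I made up). It builds the matrix of the system 2ax+by=0, bx+2cy=0 and checks whether its determinant is zero:

```python
import numpy as np

def classify_critical_points(a, b, c):
    """For f(x, y) = a*x^2 + b*x*y + c*y^2 + d, the critical points solve
    [[2a, b], [b, 2c]] @ [x, y] = [0, 0]."""
    M = np.array([[2*a, b], [b, 2*c]], dtype=float)
    if abs(np.linalg.det(M)) > 1e-12:
        return "one critical point (the origin)"
    return "infinitely many critical points"

print(classify_critical_points(3, 2, 4))  # Example 1: one critical point (the origin)
print(classify_critical_points(4, 8, 4))  # Example 2: infinitely many critical points
```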


Probability and the Pythagorean Theorem

In this post, I want to talk about the equation for the variance of a random variable and use it to get a geometric intuition behind some properties of random variables. First, let's define what a random variable is.

A random variable is a function that assigns a numerical value to each possible outcome of a random process, and each of those values occurs with some probability. For example, we could define a random variable X that takes the value 0 with probability \frac{3}{10} and takes the value 1 with probability \frac{7}{10}.

Once you have a random variable, there are some things you might want to know about it. For example, you might be curious what its expected value is. This is like an average, but it extends to random variables that take on a continuous set of values as well. To find the expected value in this case, we calculate 0\cdot(\frac{3}{10})+1\cdot(\frac{7}{10})=\frac{7}{10}. For a discrete random variable, the expected value is really a weighted average of the possible values of the random variable (weighted by the probability of occurring).

The expected value of random variable X is often written as \mathbb{E}(X). Once you know how to calculate this, you might ask what happens if you transform the values that X can take and then recalculate expected values. In particular, you might ask what happens if you calculate \mathbb{E}(X^2), or in general, \mathbb{E}(X^n) for some n\in\mathbb{N}. For any value of n, these are called “moments of a distribution,” and they tell you about properties of the distribution.

In our example above, to calculate \mathbb{E}(X^2), we just calculate 0^2\cdot(\frac{3}{10})+1^2\cdot(\frac{7}{10})=\frac{7}{10}. In this case, since the possible values of the random variable were 0 and 1, we get the same answer if we square them and recompute the expression. So, \mathbb{E}(X)=\mathbb{E}(X^2) in this case, but that won't always be true.

The variance of a random variable, which tells you about how much the values of the random variable are spread out, is defined as Var(X) =\sigma_X^2 =\mathbb{E}(X^2) - \mathbb{E}(X)^2, where \sigma_X is the standard deviation of X.

Now let’s do something kind of fun and rearrange that equation a little bit. If we add \mathbb{E}(X)^2 to both sides, we get that \sigma_X^2+\mathbb{E}(X)^2 = \mathbb{E}(X^2).

Looking at it like this, we can imagine a right triangle with legs of length \sigma_X and \mathbb{E}(X) and hypotenuse of length \sqrt{\mathbb{E}(X^2)}.

If we let M(t) be the moment generating function of X, then we can replace \mathbb{E}(X) with M'(0) and \sqrt{\mathbb{E}(X^2)} with \sqrt{M''(0)}. That's kind of fun!
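If you like checking this kind of thing numerically, here's a little sketch I put together (assuming sympy; the variable X is the one from the example above). It verifies the right-triangle identity and the claim about the moment generating function M(t):

```python
import sympy as sp

t = sp.symbols("t")

# X takes the value 0 with probability 3/10 and 1 with probability 7/10
values = [0, 1]
probs = [sp.Rational(3, 10), sp.Rational(7, 10)]

EX  = sum(p*v for v, p in zip(values, probs))      # E(X)   = 7/10
EX2 = sum(p*v**2 for v, p in zip(values, probs))   # E(X^2) = 7/10
var = EX2 - EX**2                                  # Var(X) = 21/100

# Pythagorean check: sigma^2 + E(X)^2 should equal E(X^2)
print(sp.simplify(var + EX**2 - EX2))   # 0

# Moment generating function M(t) = E(e^{tX}); M'(0) = E(X), M''(0) = E(X^2)
M = sum(p*sp.exp(t*v) for v, p in zip(values, probs))
print(sp.diff(M, t).subs(t, 0))         # 7/10
print(sp.diff(M, t, 2).subs(t, 0))      # 7/10
```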

I hope this helped you care more about triangles and probability!

Orthogonal Complements and Rank-Nullity

In the last post, we talked about what the orthogonal complement of a set is and we showed why Row(A)\perp Nul(A). This was kind of cool because, for me, it's hard to have a geometric understanding of the row space of a matrix based on how matrix multiplication is usually defined (Col(A^T) is kind of hard to picture), but it's not that hard for me to have a geometric understanding of the null space of a matrix. The fact that they are orthogonal means that if you can understand what it means to be in the null space, then to be in the row space just means that you are orthogonal to everything in the null space.

Now, let’s talk about how to connect the fact that the Row(A)\perp Nul(A) to the Rank-Nullity Theorem. As a reminder, the Rank-Nullity Theorem says that for an m\times n matrix A, dim(Col(A))+dim(Nul(A))=n.

The first fact that we need to talk about before starting to understand why this is true is that for an m\times n matrix A, dim(Row(A))=dim(Col(A)). This should seem really surprising at first. The row space of A lives in \mathbb{R}^n and the column space of A lives in \mathbb{R}^m! So, the dimensions of the row and column spaces are the same, even though they don't even live in the same ambient space! That's pretty crazy, if you ask me.

Now, let’s look at an example to see why this makes sense. Consider the matrix A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{pmatrix}. This represents a transformation from \mathbb{R}^3 to \mathbb{R}^2, and every output is of the form (a_1+2a_2+3a_3)\begin{pmatrix}1\\2\end{pmatrix}, where a_i represents any real number. In this case, the output lives in \mathbb{R}^2, but the output itself is only 1-dimensional (since there’s only one vector that spans the output).

Now, let’s consider the matrix A^T=\begin{pmatrix}1&2\\2&4\\3&6\end{pmatrix}. This represents a transformation from \mathbb{R}^2 to \mathbb{R}^3, and every output is of the form (b_1+2b_2)\begin{pmatrix}1\\2\\3\end{pmatrix}, where b_i represents any real number. In this case, the output lives in \mathbb{R}^3, but the output itself is only 1-dimensional (since there’s only one vector that spans the output).

So, even though the transformations given by A and A^T don’t map into the same dimension, the same number of vectors can still be used to represent their outputs! That’s so cool!

Now, let's connect this to the Rank-Nullity Theorem. The Rank-Nullity Theorem gives a connection between the dimension of the column space of a matrix and the dimension of the null space of a matrix. We just did an example that showed why the dimension of the column space of a matrix is the same as the dimension of the row space of a matrix. So, if A is an m\times n matrix, let's rewrite the Rank-Nullity Theorem as dim(Row(A)) + dim(Nul(A)) = n. This has a nice geometric interpretation: any deficiency in the size of the row space is made up for by the size of the null space. It's amazing how nicely they work together.
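Here's a quick numerical check of both facts on the example matrix from above (this is just my own sketch, assuming numpy):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6]])

rank_A  = np.linalg.matrix_rank(A)     # dim(Col(A)) = dim(Row(A))
rank_AT = np.linalg.matrix_rank(A.T)   # same number, even though A.T maps R^2 -> R^3

n = A.shape[1]                         # number of columns of A
nullity = n - rank_A                   # Rank-Nullity: dim(Nul(A)) = n - rank

print(rank_A, rank_AT)   # 1 1
print(nullity)           # 2
```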

I hope this made you appreciate what it really means to be a matrix just a little bit more!

Orthogonal Complements

In this post, I want to talk about something called an orthogonal complement. The set B is the orthogonal complement of the set A if B consists of every vector that is orthogonal to everything in A; that is, b\in B exactly when a\cdot b = 0 for all a\in A. We write this as A^{\perp} = B since everything in A is orthogonal to everything in B.

Note that this relationship is symmetric (at least for the subspaces we'll care about here): if A^{\perp} = B, then B^{\perp} = A. This is because the dot product is commutative (a\cdot b=b\cdot a), so "everything in B is orthogonal to everything in A" is the same statement as "everything in A is orthogonal to everything in B."

Let's look at an example of a set and its orthogonal complement. Consider the xy-plane through the origin in \mathbb{R}^3. This can be written as span\left\{\begin{pmatrix}1\\0\\0\end{pmatrix},\begin{pmatrix}0\\1\\0\end{pmatrix}\right\}, or the set of all vectors that have 0 as their z component. So, the orthogonal complement of the xy-plane is the set of all vectors of the form \begin{pmatrix}0\\0\\k\end{pmatrix}, where k\in\mathbb{R} (the z-axis).

Now let’s look at an example of orthogonal complements in the context of matrix multiplication. To do this, we’ll relate two features of a matrix, the null space and the row space.

Consider an m\times n matrix A, which represents a transformation from \mathbb{R}^n to \mathbb{R}^m. The null space of A, written Nul(A), is the set of all x such that Ax = \vec{0}. The row space of A, written Row(A), is the set of all linear combinations of the rows of A.

To understand why these two sets are orthogonal to each other, let's write out matrix multiplication in a way that makes it easier to see. We'll do this with an example of a 3\times 3 matrix, but the idea extends to a matrix of any dimensions.

\begin{pmatrix} 1&2&2\\3&-1&-2\\5&-4&-3\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}.

We would normally understand this matrix vector multiplication as a weighted sum of the columns, or x_1\begin{pmatrix}1\\3\\5\end{pmatrix} + x_2\begin{pmatrix}2\\-1\\-4\end{pmatrix} +x_3\begin{pmatrix}2\\-2\\-3\end{pmatrix}, but we are going to look at it a different way.

Then, we can rewrite \begin{pmatrix} 1&2&2\\3&-1&-2\\5&-4&-3\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} as \begin{pmatrix}(1,2,2)\cdot (x_1,x_2,x_3)\\(3,-1,-2)\cdot (x_1,x_2,x_3)\\(5,-4,-3)\cdot (x_1,x_2,x_3)\end{pmatrix}.

We can see that these two ways of multiplying are the same because, for example, the first component of the final vector we just output is x_1+2x_2+2x_3. And if you multiply the x's through in the first way of writing the matrix multiplication above, you'll get the same thing for the first component of the vector.

Now, let's consider the vector \begin{pmatrix}1\\-2\\1\end{pmatrix}\in Nul\begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}. Then, \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}\begin{pmatrix}1\\-2\\1\end{pmatrix}=\begin{pmatrix}0\\0\\0\end{pmatrix}. Since \begin{pmatrix}1\\-2\\1\end{pmatrix} is in the null space of \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}, it must be orthogonal to each row of that matrix.

Now, all we need to show is that if a vector is orthogonal to each row, then it is orthogonal to any linear combination of those rows.

Suppose \vec{x} is orthogonal to every vector in the set \{\vec{a_1}, \cdots,\vec{a_m}\}. Then, let's compute \vec{x}\cdot (c_1\vec{a_1}+\ldots + c_m\vec{a_m}) (the right side of the dot is an arbitrary vector in the span of the \vec{a_i}'s). We can rewrite this as c_1(\vec{x}\cdot\vec{a_1})+\ldots + c_m(\vec{x}\cdot\vec{a_m}). We know that \vec{x}\cdot \vec{a_i}=0 for 1\leq i\leq m, so this dot product is zero.

Thus, \vec{x} is orthogonal to any vector in the row space, and not just each vector that spans the row space.

Overall, we showed that if \vec{x}\in Nul(A), then \vec{x}\perp Row(A).
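As a quick sanity check, here's a little numpy sketch (my own, using the 3\times 3 example from above) that dots a null-space vector against a few random linear combinations of the rows:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
x = np.array([1, -2, 1])        # a vector in Nul(A)

print(A @ x)                    # [0 0 0], so x really is in the null space

# Dot x against random linear combinations of the rows of A
rng = np.random.default_rng(0)
for _ in range(3):
    c = rng.standard_normal(3)      # random weights c_1, c_2, c_3
    row_combo = c @ A               # c_1*row_1 + c_2*row_2 + c_3*row_3
    print(np.dot(x, row_combo))     # 0 (up to floating point error)
```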

In the next post, we’ll connect this back to the Rank-Nullity theorem by talking about how the row space and column space of a matrix relate to each other.

Linear Regression and Conditional Probability

The goal of linear regression is to find the equation of the line that most closely approximates a set of data points. (It turns out that you don't actually have to restrict yourself to equations where the highest power of x is 1, but we can talk about that some other time.) Linear regression is really useful in settings where you're trying to figure out how one or more input variables affect a single output variable.

To perform linear regression, though, we need to make some pretty strong assumptions about our data. One of these assumptions can be stated as follows:

\mathbb{P}(Y\vert X)=\mathbb{P}(Y\text{ and }X).

In case you haven't seen the notation before, \mathbb{P}(Y\vert X) means "the probability of Y given X." This comes up a lot when you want to know how the probability of an event changes given that you know whether something else already happened. An example of this would be "the probability that I get an A in the class given that I got an 87 on the midterm."

At first, it seems like this would contradict the definition of conditional probability, which says that

\mathbb{P}(Y\vert X)=\frac{\mathbb{P}(Y\text{ and }X)}{\mathbb{P}(X)},

but let’s talk about why it actually makes sense in this context.

When you perform a linear regression, there are kind of two ways to think about what’s going on: prediction and simultaneous measurements. Let’s think about a specific example to make this easier to talk about.

In this example, let’s collect people’s heights and weights and plot them on the x and y axes respectively. One way to think about this is that you are trying to find an equation that uses heights to predict weights. Said a different way, you are trying to maximize \mathbb{P}(Y\vert X). We’ll come back to this idea in the next post.
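To make the "prediction" view concrete, here's a minimal least-squares sketch (assuming numpy; the height/weight numbers are made up purely for illustration):

```python
import numpy as np

# Made-up (height_cm, weight_kg) pairs, just for illustration
heights = np.array([150, 160, 165, 170, 180, 190], dtype=float)
weights = np.array([52,  60,  63,  68,  77,  85], dtype=float)

# Fit weight ~ slope*height + intercept by least squares
X = np.column_stack([heights, np.ones_like(heights)])
(slope, intercept), *_ = np.linalg.lstsq(X, weights, rcond=None)

print(slope, intercept)            # the fitted line's coefficients
print(slope * 175 + intercept)     # predicted weight for a 175 cm person (made-up data!)
```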

Another way to think about this is that every time you observe a person, you collect two pieces of information about them: their height AND their weight. Here, INSTEAD of thinking about using height to PREDICT weight, we’re just simultaneously measuring two attributes of a person.

In this example, we see how we can think about either using people’s height to predict their weight, or just think about measuring people’s heights and weights simultaneously and understanding how those two values are related.

In the next post, we’ll say a little more about the assumptions that linear regression makes and why they make sense! Stay tuned!

Rank-Nullity Theorem and Redundancy in a Matrix

I want to talk about some intuition behind the Rank-Nullity Theorem. In particular, let’s talk about why, even though the null space and column space of an m\times n matrix live in different ambient dimensions (\mathbb{R}^n and \mathbb{R}^m respectively) it still makes sense why there is a connection between them. As a reminder, here’s the Rank-Nullity Theorem.

Theorem: If an m\times n matrix has a column space of dimension k, then it has a null space of dimension n-k.

The rank of a matrix tells you how many columns in the matrix are actually contributing to the dimension of the column space. For example, the matrix A=\begin{pmatrix}1&2&3\\2&4&6\end{pmatrix} has a 1-dimensional column space. Since A has two rows, the dimension of the column space could be at most 2. Since the second and third columns are just multiples of the first, though, neither of them provides any new information. In other words, those columns are redundant.

Now let’s examine the redundancy of the columns of A by looking at its null space. First, let’s note that we can write \begin{pmatrix}1&2&3\\2&4&6\end{pmatrix} \begin{pmatrix}1\\1\\-1\end{pmatrix}=\begin{pmatrix}0\\0\end{pmatrix}. This also means that we can write \begin{pmatrix}3\\6\end{pmatrix}=\begin{pmatrix}1\\2\end{pmatrix}+\begin{pmatrix}2\\4\end{pmatrix} since multiplying a matrix by a vector is just taking a weighted sum of the columns of the matrix. Then you can just move one vector to the other side of the equals sign.

This means that finding a nonzero vector in the null space of a matrix is equivalent to writing some of the columns of the matrix as a linear combination of the other columns. In other words, not all of these columns are needed to tell you everything you wanted to know about the matrix.
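Here's a tiny numpy check of that redundancy story (my own sketch, not part of the argument above):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6]])

# A nonzero null-space vector...
v = np.array([1, 1, -1])
print(A @ v)                          # [0 0]

# ...is the same information as a redundant column:
print(A[:, 2], A[:, 0] + A[:, 1])     # [3 6] and [3 6]: column 3 = column 1 + column 2
```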

I hope this helped you see how redundancy in a matrix can be seen through a not fully spanning column space AND a non-trivial null space!

The Fundamental Theorem of Calculus

In this post, I want to talk about a way to think about the Fundamental Theorem of Calculus that will hopefully give more intuition for it. As a reminder, the Fundamental Theorem of Calculus describes the connection between differentiation and integration, which are the two main operations in calculus. This theorem comes in two parts: one that says what happens when you differentiate an integral, and one that says what happens when you integrate a derivative. Let’s state the two parts now.

First Fundamental Theorem of Calculus: \frac{d}{dx}\int_{a}^{x} f(t) dt = f(x)

Second Fundamental Theorem of Calculus: \int_{a}^{b} f'(t) dt = f(b)-f(a).

Let’s start by talking about the First Fundamental Theorem of Calculus. In this, we are differentiating an integral. To understand what this is saying, let’s break it down into parts. The integral is calculating how much of some quantity has accumulated by a certain point in time (this could be any quantity, like distance traveled, water in a container,…). Then, the derivative is asking about how much the total quantity is changing at a particular moment. For example, how much is the amount of water in the container increasing right this second?

The First Fundamental Theorem of Calculus says that the answer to that question is just equal to the function f evaluated at the upper bound of the integral, or f(x).

Let's think about why this makes sense. The integral is keeping track of how much total has accumulated up until a certain point. Then, the derivative comes along and says "how much are you changing right this second?" The answer to that question should just be the current value of the function f at the time t=x, or f(x). You would at least expect the rate at which the area under a function increases to be proportional to the height of the function, and the First Fundamental Theorem of Calculus says that the proportionality constant is equal to 1.

The Second Fundamental Theorem of Calculus says that if you want to find the total effect that a function’s derivative has over a region [a,b], you just have to compute the difference in the function’s values at those two points. This makes sense because if someone asks what the total effect of the slope of a mountain was in a region, you would just tell them how high you climbed in that region, or the difference between the height of where you ended and where you started.
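If you like numerical sanity checks, here's a rough sketch of both parts (assuming numpy, with f(t)=t^2 on the interval [0,2]):

```python
import numpy as np

f  = lambda t: t**2     # the integrand in Part 1
df = lambda t: 2*t      # its derivative, the integrand in Part 2
a  = 0.0

def integral(g, lower, upper, n=10_001):
    """Approximate the integral of g from lower to upper with the trapezoid rule."""
    t = np.linspace(lower, upper, n)
    y = g(t)
    dt = t[1] - t[0]
    return dt * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)

# Part 1: the derivative (at x) of the accumulated area from a to x should be f(x)
x, h = 1.5, 1e-4
numerical_derivative = (integral(f, a, x + h) - integral(f, a, x - h)) / (2 * h)
print(numerical_derivative, f(x))        # both ~ 2.25

# Part 2: integrating f' from a to b should give f(b) - f(a)
b = 2.0
print(integral(df, a, b), f(b) - f(a))   # both ~ 4.0
```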

Alright, now let’s talk about the fun stuff. Both parts of the Fundamental Theorem of Calculus give connections between differentiation and integration. Even though it’s not too hard to understand some intuition behind why each part of the theorem is true, there’s actually something pretty cool going on here! Differentiation is really asking a local question about a function. It’s asking, “How steep is the function at this particular point?” Whereas integration is really asking a global question about a function. It’s asking “In this region, how much have we accumulated by a certain point?”

It’s interesting that these two topics are related because it means that the Fundamental Theorem of Calculus acts as a way to connect the local behavior of a function to the global behavior of that function!

The concept of local/global statements about an object comes up a lot in Differential Geometry, which studies curves and surfaces. A lot of theorems in Differential Geometry talk about which properties of a surface you can measure from the curve/surface (local) (these are called intrinsic properties) and from observing the curve/surface from a distance (global) (these are called extrinsic properties).

So, the Fundamental Theorem of Calculus is kind of a little peek into how local and global properties of functions are related to each other!

Rubik’s Cube and Graphs

Let's talk about a way to visualize the orientations of the Rubik's cube as a graph (the kind with vertices and edges, like the ones shown below). Then, we'll use this representation to talk about what happens when you repeat a fixed set of moves over and over again on the Rubik's cube.

The dots labeled with letters are called nodes and the lines connecting the nodes are called edges.

In the case of the Rubik's cube, each node of the graph represents an orientation of the cube, and an edge connecting two nodes represents the move that takes one orientation to the other. It might be easier to imagine two directed edges (edges that specify at which node they start and end) between each pair of connected nodes, where the move on one of the edges undoes what the move on the edge in the opposite direction does.

Now, let’s just start picturing what this massive graph looks like. First of all, the graph has over 43 quintillion nodes, since there are roughly that many orientations of the Rubik’s cube. To simplify our graph a little, let’s not allow turns that directly affect any of the center pieces. Let’s also only consider edges that correspond to quarter turns.

At each orientation of the cube, you have 12 different options for how to move to another orientation (since there are six faces you can rotate and each of them can be rotated in two different directions).

So, imagine building this graph by starting at some node and following edges. Out of each node, there are 12 edges. HANG ON… If every move always led to a brand-new orientation, this graph would have infinitely many nodes! And that would correspond to an infinite number of Rubik's cube orientations. There ARE a lot of Rubik's cube orientations, but there are definitely NOT infinitely many Rubik's cube orientations. So, what does this tell us? It tells us that there must be closed loops within our graph! So, even if you think you're getting farther and farther away from the correct solution, if you follow some path along the graph, you can only ever get so far away from the node corresponding to a solved cube. Another way of saying that is that given some starting orientation and some ending orientation (our target), there must be more than one way of getting from the start to the end!

What else does the loopiness of this graph tell us? Let's consider a finite set of moves that we repeat over and over again. For example, in Rubik's cube notation, let's consider the combination of moves RU' (each letter corresponds to a different twist of the cube, and the prime symbol means turn that face in the opposite direction). Whereas before, each edge of the graph corresponded to a single twist of the cube, now each edge represents multiple twists. But, here's the crazy thing… this graph will still have loops in it no matter what finite set of moves you repeat! It has to, since the graph has finitely many nodes!

This is true because repeating the fixed set of moves keeps producing orientations, and there are only finitely many of them, so eventually you must land on an orientation you have already visited. And since every move can be undone, the first orientation you revisit has to be the one you started from. Since the set of moves you're repeating is fixed, you'll never jump around the graph without eventually getting back to where you started.
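If you like, you can see this "repeat and eventually return" fact in code, because a fixed move sequence is just a permutation of a finite set of states. Here's a tiny sketch with a made-up permutation standing in for a move sequence (the real cube permutation is way bigger, but the idea is identical):

```python
# A made-up permutation standing in for "apply the fixed move sequence once":
# position i receives the piece that was at position perm[i].
perm = (2, 0, 1, 4, 3, 6, 7, 5)

def apply(p, state):
    return tuple(state[i] for i in p)

state = start = tuple(range(len(perm)))
count = 0
while True:
    state = apply(perm, state)
    count += 1
    if state == start:
        break

print(count)   # 6: back to the start after 6 repetitions (lcm of the cycle lengths 3, 2, 3)
```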

This is so cool! Given any fixed set of twists of the cube, you know that you can repeat them and eventually get back to where you started!

Polynomials and Synthetic Division

In this post, I want to talk about synthetic polynomial division and why the final value you get is BOTH the y value of the function at a particular point AND the remainder when you perform the division. We’ll use one running example with coefficients a_i, and even though our example will be with a cubic polynomial, the same idea generalizes to polynomials of any degree.

Let's let f(x) = a_3x^3+a_2x^2+a_1x+a_0, and let's let g(x)=x-c. We want to compute \frac{f(x)}{g(x)}. To do this, we'll set up synthetic division as follows.

Inside the box, we put the value that makes the denominator zero. One of the assumptions of synthetic division is that the denominator is linear with leading coefficient 1. So, if c is inside the box, we know that we are dividing by x-c.

To the right of the box, we put the coefficients of the polynomial in question (in decreasing order of exponent). If there were a term missing as the powers decreased (like if the polynomial were x^2+1 (no x term)), we would put a zero in that position.

Now, let’s do some synthetic division. This is a combination of two steps: add and multiply. In the first step, we’ll just be bringing down the first constant, but that’s the same as adding zero to the first constant. Here’s the result if you carry out the synthetic division in this example.

The last term in the last column above the horizontal line (which is a bit hard to read in the image) is c^3a_3.

The expression in the last column of the synthetic division below the blue line always gives the remainder of the polynomial division. However, we can ALSO see that this expression (a_0+ca_1+c^2a_2+c^3a_3) is the same as f(c), or the function f evaluated at x=c.

Hang on, that's pretty cool. The remainder of the polynomial division by (x-c) is the same as the y-value of the function at x=c.

Now let’s talk about why the final term being zero when plugging in c corresponds to (x-c) being a factor of f(x). If f(c)=0, then we can write f(x) as a product of two polynomials, f(x)=(x-c)\cdot h(x), where h(x) is some other polynomial (Note that if you evaluate f(c), you’ll get (c-c)\cdot h(c)=0\cdot h(c)=0).

Even if the term you're dividing by is not a factor of f(x), we see that synthetic division still gives you the y-value at that point. Thinking in terms of remainders, this also makes a lot of sense. The remainder after division can be thought of as a measure of how far the denominator was from being a factor of the numerator. If, at x=c, the y-value of the function was 4, then the numerator was 4 more than a multiple of the denominator. In other words, if you subtract 4 from the numerator, it will be divisible by the denominator with no remainder. And this will have created a polynomial which IS a multiple of the denominator.

Let’s look at an example to make this a little more concrete. Let’s let f(x)=x^3-1, and let’s let g(x)=x-2. Then, the synthetic division will look like this.

The final value in the bottom right is 7, so this should be the y-value of the function at x=2. We can check that f(2)=2^3-1=7. Also, if the remainder is 7, then we should be able to subtract 7 and obtain a multiple of (x-2). Let's see if that works.

x^3-1-7=x^3-8=(x-2)(x^2+2x+4). Indeed, (x-2) is a factor of that new polynomial!
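Synthetic division is really the same bookkeeping as Horner's method, so it's easy to check this example in code. Here's a little sketch of my own (the function name is made up for illustration):

```python
def synthetic_division(coeffs, c):
    """Divide a polynomial (coefficients in decreasing order of degree) by (x - c).
    Returns (quotient_coefficients, remainder)."""
    quotient = [coeffs[0]]
    for a in coeffs[1:]:
        quotient.append(a + c * quotient[-1])   # multiply down, then add across
    return quotient[:-1], quotient[-1]

# f(x) = x^3 - 1 divided by (x - 2)
q, r = synthetic_division([1, 0, 0, -1], 2)
print(q, r)   # [1, 2, 4] and 7: quotient x^2 + 2x + 4, remainder 7 = f(2)
```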

I hope this helped you see a connection between zeros and factors of polynomials! And how the output of synthetic division relates to the numerator function’s value at a particular point!


Rational Root Theorem

This post is a follow-up to the last post about the Fundamental Theorem of Algebra. As a recap, the Fundamental Theorem of Algebra says that if you have a polynomial of degree n with coefficients in \mathbb{C}, then the polynomial will have n roots in \mathbb{C} (counted with multiplicity).

As we talked about last time, the Fundamental Theorem of Algebra is an existence proof because for a polynomial of degree n whose coefficients are in \mathbb{C}, it says that you’ll be able to find n values which, when plugged into the polynomial, make the polynomial equal zero. In other words, you’re guaranteeing the existence of the values that will make the polynomial equal zero.

Existence proofs are cool and all, but the Fundamental Theorem of Algebra doesn’t tell us anything about how to find the x-values that make the polynomial equal zero. Wouldn’t it be nice if there were a way to know where these zeros were?

It turns out that if we restrict ourselves to looking for rational zeros of a polynomial, then we can say exactly what those kinds of zeros would have to look like. Now, let’s state the Rational Root Theorem.

Rational Root Theorem: If a polynomial f(x) = d_0x^n+d_1x^{n-1}+\ldots+d_{n-1}x+d_n with integer coefficients has any rational zeros, then they will have to occur at x=\frac{p}{q}, where p is some (possibly negative) factor of the constant term d_n and q is some factor of the leading coefficient d_0.

To prove the Rational Root Theorem, we’re going to write a polynomial as a product of terms, which the Fundamental Theorem of Algebra guarantees we can do.

Proof: Let f(x) = (a_1+b_1x)(a_2+b_2x)\ldots(a_n+b_nx), where the "\ldots" includes the product of the other linear terms. First, let's find the zeros of this function. To do that, we want to solve (a_1+b_1x)(a_2+b_2x)\ldots(a_n+b_nx) = 0. Note that the a_i and b_i won't all be nice integers if some of the factors don't yield rational roots. But even that gives you information, too! For example, there might be a polynomial with five roots, only three of which are rational.

Before we solve for the values of x that make this function equal zero, let’s just think about what the highest degree term and lowest degree term will look like. We don’t need to think about the terms that have degree 0<k<n. The highest degree term comes from multiplying all of the terms that have an x in them. And the lowest degree term comes from multiplying all of the terms that are just constants. This means that the highest degree term will be (b_1)(b_2)\ldots (b_n)x^n. And the lowest degree term will be (a_1)(a_2)\ldots (a_n).

Let’s remember that we’re trying to solve (a_1+b_1x)(a_2+b_2x)\ldots(a_n+b_nx) = 0. Since the left-hand side is a product of n terms that equals zero, the whole left-hand side will equal zero when any of these terms individually equals zero. So, this function has zeros when x\in\{-\frac{a_1}{b_1},-\frac{a_2}{b_2},\ldots,-\frac{a_n}{b_n}\}

Okay, we're so close! We know that if our function has a rational zero, it will definitely be in the set A=\{-\frac{a_1}{b_1},-\frac{a_2}{b_2},\ldots,-\frac{a_n}{b_n}\}, even though not all of these quotients have to be rational. Each a_i is a factor of the constant term a_1a_2\ldots a_n, and each b_i is a factor of the leading coefficient b_1b_2\ldots b_n. So, if we search the set of quotients \{\frac{p_i}{q_i}\}, where p_i is a factor of the constant term and q_i is a factor of the leading coefficient, we will have checked every value in A, and therefore every possible rational zero.

This theorem says that IF a function has rational zeros, then here’s how to find them. But there’s no guarantee that a function will have rational zeros to begin with.
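Here's a small sketch (my own, just for illustration) of how you'd use the theorem in practice: list every candidate \pm\frac{p}{q} and test each one. It assumes integer coefficients with a nonzero constant term.

```python
from fractions import Fraction

def divisors(n):
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """coeffs in decreasing order of degree; integer coefficients, nonzero constant term."""
    leading, constant = coeffs[0], coeffs[-1]
    candidates = {Fraction(sign * p, q)
                  for p in divisors(constant)
                  for q in divisors(leading)
                  for sign in (1, -1)}
    def f(x):
        total = Fraction(0)
        for a in coeffs:
            total = total * x + a   # Horner evaluation
        return total
    return sorted(c for c in candidates if f(c) == 0)

# 2x^3 - 3x^2 - 11x + 6 has rational roots -2, 1/2, and 3
print(rational_roots([2, -3, -11, 6]))   # [Fraction(-2, 1), Fraction(1, 2), Fraction(3, 1)]
```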

So, to recap: The Fundamental Theorem of Algebra states the existence of n complex-valued zeros for a polynomial of degree n, but is kind of like "good luck!" when it comes to actually finding these zeros. Then the Rational Root Theorem comes along and says, "hey, if you don't mind only looking for rational zeros of this function, I can help you out!" It says that any rational root of your polynomial can be written as the quotient of two integers, where the numerator is a factor of the constant term and the denominator is a factor of the leading coefficient.

I hope this helped you see a connection between the Fundamental Theorem of Algebra and the Rational Root Theorem!
