Connection between the Normal Equations and Weak Solutions

I want to connect the normal equations, which come up in statistics, to weak solutions, which come up in PDEs. Let’s start with the normal equations. These arise when you want to solve a linear system A\vec{x}=\vec{b}, where A is m\times n for m>n. When you have more equations than unknowns, the system is called “overdetermined.” If your system has no exact solution (which is frequently the case for overdetermined systems), then the best you can do is minimize \|A\vec{x}-b\|^2. This is done by formulating the normal equations A^TA\vec{x}=A^T\vec{b} and solving for \vec{x}. This approach has a lot of issues, but I just want to use it to show a connection to another area of math.

We can rearrange the normal equations and rewrite them as A^T(A\vec{x}-\vec{b})=0. Since A\vec{x}=\vec{b} was the system we wanted to solve originally, A\vec{x}-\vec{b} is the residual vector, or how far \vec{x} is from solving the system exactly. Since A^T(A\vec{x}-\vec{b})=0, this means that the residual A\vec{x}-\vec{b} is orthogonal to every column of A. If we write this in the language of bilinear forms, we get that (A[:,i],A\vec{x}-\vec{b}) = 0. This bilinear form is symmetric, so we can also write this as (A\vec{x}-\vec{b},A[:,i]) = 0.

Now, let’s talk about what a weak solution is. These usually come up when differential equations have solutions that might not be as differentiable as the equation itself requires. The idea behind weak solutions is to remove a bit of the strict requirement that a function be smooth enough, as long as it satisfies the other aspects of the PDE. If you wanted to solve a PDE of the form \mathcal{L}u=f for some differential operator \mathcal{L}, the weak formulation instead seeks to solve (\mathcal{L}u,v) = (f,v), where (\cdot,\cdot) is a bilinear form and v is a sufficiently smooth test function.

Using properties of bilinear forms, we can rewrite this equation as (\mathcal{L}u-f,v)=0.

If we use the example of the linear system we started with, this says that (A\vec{x}-\vec{b},\vec{v})=0 for all vectors \vec{v}. This is sort of a weak formulation of the linear system A\vec{x}=\vec{b}.

This looks identical to the result in the normal equations that (A\vec{x}-\vec{b},A[:,i]) = 0., except that the normal equations impose that particular vectors, the columns of A must be orthogonal to residual A\vec{x}-b, rather than just a general vector \vec{v}.

This means that, in some ways, you can think of the least squares formulation of an overdetermined linear system as a weak formulation. In PDEs, weak formulations where the test functions are the same as the elements you’re using to approximate your solution (for least squares those are the columns of A) are called Galerkin finite element methods.

So that means there’s a nice relationship between Galerkin finite element methods and the least square formulation of an overdetermined linear system!

Leave a comment