I want to connect the normal equations, which come up in statistics, to weak solutions, which come up in PDEs. Let’s start with the normal equations. These arise when you want to solve a linear system $Ax = b$, where $A$ is $m \times n$ for $m > n$. When you have more equations than unknowns, the system is called “overdetermined.” If your system has no exact solution (which is frequently the case for overdetermined systems), then the best you can do is minimize $\|Ax - b\|_2$. This is done by formulating the normal equations $A^T A x = A^T b$ and solving for $x$. This approach has a lot of issues in practice (for one, forming $A^T A$ squares the condition number of the problem), but I just want to use it to show a connection to another area of math.
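Here is a minimal sketch of that recipe in Python with NumPy. The data (a small line-fitting problem) and the variable names are made up for illustration. It solves the normal equations $A^T A x = A^T b$ directly and compares the answer against `np.linalg.lstsq`, which does not form $A^T A$ explicitly.

```python
import numpy as np

# Hypothetical overdetermined system: 5 equations, 2 unknowns (fit a line).
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
A = np.column_stack([np.ones_like(t), t])  # shape (5, 2), so m = 5 > n = 2
b = np.array([1.1, 1.9, 3.2, 3.8, 5.1])    # made-up right-hand side

# Normal equations: (A^T A) x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Reference answer from NumPy's least squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print("normal equations:", x_normal)
print("lstsq:           ", x_lstsq)
print("agree:", np.allclose(x_normal, x_lstsq))
```

For a small, well-conditioned problem like this one the two answers agree; for badly conditioned matrices the normal equations can lose accuracy, which is one of the issues mentioned above.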
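We can rearrange the normal equations and rewrite them as $A^T(Ax - b) = 0$. Since $Ax = b$ was the system we wanted to solve originally, $Ax - b$ is the residual vector, or how far $x$ is from solving the system exactly. Since $A^T(Ax - b) = 0$, this means that the residual $Ax - b$ is orthogonal to every column of $A$. If we write this in the language of bilinear forms, with $\langle x, y \rangle = x^T y$, we get that $\langle a_i, Ax - b \rangle = 0$ for each column $a_i$ of $A$.
This bilinear form is symmetric, so we can also write this as $\langle Ax - b, a_i \rangle = 0$.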
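The orthogonality claim is easy to check numerically. The sketch below uses a random matrix and right-hand side (arbitrary choices of mine, not anything specific), computes the least squares solution, and takes the inner product of the residual with every column of $A$ at once via `A.T @ r`.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))  # 6 equations, 3 unknowns
b = rng.standard_normal(6)

# Least squares solution and its residual r = Ax - b
x, *_ = np.linalg.lstsq(A, b, rcond=None)
r = A @ x - b

# Entry i of A^T r is the inner product <a_i, r> with column a_i
inner = A.T @ r

print("residual norm:", np.linalg.norm(r))      # not zero: no exact solution
print("max |<a_i, r>|:", np.max(np.abs(inner)))  # zero up to rounding error
```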
Now, let’s talk about what a weak solution is. These usually come up when differential equations have solutions that might not be as differentiable as the equation itself requires. The idea behind weak solutions is to relax the strict smoothness requirement on the solution while keeping the other aspects of the PDE. If you wanted to solve a PDE of the form $Lu = f$ for some differential operator $L$, the weak formulation instead seeks to solve $\langle Lu, v \rangle = \langle f, v \rangle$ for every $v$, where $\langle \cdot, \cdot \rangle$ is a bilinear form and $v$ is a sufficiently smooth test function.
Using properties of bilinear forms (linearity in the first argument), we can rewrite this equation as $\langle Lu - f, v \rangle = 0$.
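To make this concrete, here is a standard textbook example (my choice of illustration, assuming $L = -\frac{d^2}{dx^2}$ on $(0,1)$, boundary conditions $u(0) = u(1) = 0$, and the $L^2$ inner product as the bilinear form). For a test function $v$ that vanishes at both endpoints, integrating by parts gives

$$
\langle Lu, v \rangle = -\int_0^1 u''(x)\, v(x)\, dx = \int_0^1 u'(x)\, v'(x)\, dx - \Big[ u'(x)\, v(x) \Big]_0^1 = \int_0^1 u'(x)\, v'(x)\, dx,
$$

so the weak formulation asks for

$$
\int_0^1 u'(x)\, v'(x)\, dx = \int_0^1 f(x)\, v(x)\, dx \quad \text{for every such } v.
$$

Only first derivatives of $u$ appear here, even though the original equation needs two. In the general notation, this is again the statement $\langle Lu - f, v \rangle = 0$ for every test function $v$.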
If we use the example of the linear system we started with, this says that $\langle Ax - b, v \rangle = 0$ for all vectors $v$. This is sort of a weak formulation of the linear system $Ax = b$.
This looks identical to the result from the normal equations that $\langle Ax - b, a_i \rangle = 0$, except that the normal equations only require particular vectors, the columns $a_i$ of $A$, to be orthogonal to the residual $Ax - b$, rather than a general vector $v$.
This means that, in some ways, you can think of the least squares formulation of an overdetermined linear system as a weak formulation. In PDEs, weak formulations where the test functions are the same as the elements you’re using to approximate your solution (for least squares those are the columns of $A$) are called Galerkin finite element methods.
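The difference between testing against the columns of $A$ and testing against a general vector can be seen in a small experiment. This is only a sketch on random data of my choosing: the least squares residual is orthogonal to any combination of the columns of $A$, but not to an arbitrary vector. The residual itself is the simplest counterexample, since its inner product with itself is its squared norm.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

x, *_ = np.linalg.lstsq(A, b, rcond=None)
r = A @ x - b  # least squares residual

# Test vector inside the span of the columns of A: v = A c
c = rng.standard_normal(3)
v_in_span = A @ c

# Test vector outside that span: the residual itself
v_general = r

in_span = abs(v_in_span @ r)   # zero up to rounding error
general = abs(v_general @ r)   # equals ||r||^2, which is not zero here

print("|<r, v>| for v in the column space:", in_span)
print("|<r, v>| for v = r:                ", general)
```

So least squares satisfies the weak formulation only for test vectors drawn from the column space of $A$; asking for orthogonality to every vector would force the residual to be zero, meaning $Ax = b$ exactly.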
So that means there’s a nice relationship between Galerkin finite element methods and the least squares formulation of an overdetermined linear system!

















