The Normal Equation
What is Linear Regression?
Linear regression is a model-based machine learning algorithm that predicts a target numerical value by computing a weighted sum of the input features, plus a constant called the bias term. Examples of problems that could be addressed with linear regression include:
- Predicting life satisfaction based on GDP per capita.
- Predicting housing prices based on features like square footage and the number of bedrooms.
- Predicting crop yields based on factors like rainfall and fertilizer use.
- Predicting a company's revenue next year based on previous performance metrics.
Linear Regression is represented mathematically by:

$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

where $\hat{y}$ is the predicted value, $n$ is the number of features, $x_i$ is the $i$-th feature value, and $\theta_j$ is the $j$-th model parameter, including the bias term $\theta_0$.
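As a quick illustration (the numbers below are made up), the prediction is just the dot product of the parameter vector with the feature vector, where a leading $x_0 = 1$ accounts for the bias term:

```python
import numpy as np

theta = np.array([4.0, 3.0, -2.0])   # [bias, theta_1, theta_2] (hypothetical values)
x = np.array([1.0, 2.5, 0.7])        # [x_0 = 1, feature 1, feature 2]

y_hat = theta @ x                    # 4 + 3*2.5 - 2*0.7 = 10.1
print(y_hat)
```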
Finding Optimal Parameters
The goal of training a linear regression model is to find the values of the parameter vector $\theta$ that minimize the difference between the model's predictions and the actual target values in the training data.
Normal Equation
The Normal Equation is a mathematical formula that computes these optimal parameter values directly, i.e. the values that minimize the difference between the model's predictions and the actual target values in the training data.
We'd like to minimize the least-squares cost:

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right)^2$$

where $x^{(i)}$ is the $i$-th sample (from a set of $m$ samples) and $y^{(i)}$ is the $i$-th expected result. To proceed, we'll represent the problem in matrix notation; this is natural, since we essentially have a system of linear equations here.
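Before switching to matrix notation, here is this cost written directly in NumPy (a sketch; the function and variable names are my own):

```python
import numpy as np

def least_squares_cost(theta: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    """J(theta) = 1/2 * sum_i (theta^T x^(i) - y^(i))^2, with one sample per row of X."""
    residuals = X @ theta - y          # vector of theta^T x^(i) - y^(i)
    return 0.5 * float(residuals @ residuals)
```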
The regression coefficients we're looking for are the vector:

$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$$
Given a training set, define the design matrix $X$ to be the $m$-by-$n$ matrix (actually $m$-by-$(n+1)$, if we include the intercept term) that contains the training examples' input values in its rows:

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}$$
Also, let $\vec{y}$ be the $m$-dimensional vector containing all the target values from the training set:

$$\vec{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$
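In NumPy terms (a sketch with made-up numbers), the design matrix and target vector look like this; `add_dummy_feature` prepends the $x_0 = 1$ column so the intercept is included:

```python
import numpy as np
from sklearn.preprocessing import add_dummy_feature

# Three training samples (m = 3) with two features each (n = 2).
X_raw = np.array([[2.0, 1.0],
                  [3.0, 0.5],
                  [1.5, 2.0]])
X = add_dummy_feature(X_raw)   # shape (m, n + 1); each row is (x^(i))^T with x_0 = 1

y = np.array([5.0, 6.0, 4.0])  # the m-dimensional vector of target values
```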
Now, since the prediction for the $i$-th sample is $(x^{(i)})^T \theta$, we can easily verify that

$$X\theta - \vec{y} = \begin{bmatrix} (x^{(1)})^T\theta - y^{(1)} \\ \vdots \\ (x^{(m)})^T\theta - y^{(m)} \end{bmatrix}$$
Thus, using the fact that for a vector $z$, we have that $z^T z = \sum_i z_i^2$:

$$\frac{1}{2}(X\theta - \vec{y})^T (X\theta - \vec{y}) = \frac{1}{2}\sum_{i=1}^{m} \left( (x^{(i)})^T\theta - y^{(i)} \right)^2 = J(\theta)$$
Finally, to minimize $J$, let's find its derivative with respect to $\theta$. Hence,

$$\begin{aligned}
\nabla_\theta J(\theta) &= \nabla_\theta\, \frac{1}{2}(X\theta - \vec{y})^T (X\theta - \vec{y}) \\
&= \frac{1}{2}\nabla_\theta \left( (X\theta)^T X\theta - (X\theta)^T \vec{y} - \vec{y}^T (X\theta) + \vec{y}^T \vec{y} \right) \\
&= \frac{1}{2}\nabla_\theta \left( \theta^T (X^T X)\theta - \vec{y}^T (X\theta) - \vec{y}^T (X\theta) + \vec{y}^T \vec{y} \right) \\
&= \frac{1}{2}\nabla_\theta \left( \theta^T (X^T X)\theta - 2 (X^T \vec{y})^T \theta + \vec{y}^T \vec{y} \right) \\
&= \frac{1}{2}\left( 2 X^T X \theta - 2 X^T \vec{y} \right) \\
&= X^T X \theta - X^T \vec{y}
\end{aligned}$$
(This derivation follows Andrew Ng's Stanford CS229 notes.)
In the third step, we used the fact that $a^T b = b^T a$, and in the fifth step used the facts $\nabla_x b^T x = b$ and $\nabla_x x^T A x = 2Ax$ for a symmetric matrix $A$ (for more details, see Section 4.3 of "Linear Algebra Review and Reference").
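As a quick numerical sanity check of this gradient formula (a sketch with randomly generated data, not from the original notes), the analytic gradient $X^T X\theta - X^T \vec{y}$ can be compared against a finite-difference approximation of $J$:

```python
import numpy as np

def cost(theta, X, y):
    r = X @ theta - y
    return 0.5 * (r @ r)

def analytic_grad(theta, X, y):
    # The result derived above: grad J(theta) = X^T X theta - X^T y
    return X.T @ X @ theta - X.T @ y

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.normal(size=5)
theta = rng.normal(size=3)

# Central finite differences, one coordinate of theta at a time.
eps = 1e-6
numeric_grad = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(numeric_grad, analytic_grad(theta, X, y)))  # expect True
```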
To minimize $J$, we set its derivatives to zero, and obtain the normal equations:

$$X^T X \theta = X^T \vec{y}$$
Thus, the value of $\theta$ that minimizes $J(\theta)$ is given in closed form by the equation

$$\theta = (X^T X)^{-1} X^T \vec{y}$$
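As a tiny worked example (the numbers are made up for illustration): take a single feature with three samples $x = 1, 2, 3$ and targets $y = 1, 2, 3$. With a bias column of ones,

$$X = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \qquad
X^T X = \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}, \qquad
X^T \vec{y} = \begin{bmatrix} 6 \\ 14 \end{bmatrix}, \qquad
\theta = (X^T X)^{-1} X^T \vec{y} = \begin{bmatrix} 0 \\ 1 \end{bmatrix},$$

so the fitted model is $\hat{y} = 0 + 1 \cdot x$, as expected since the targets lie exactly on that line.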
Key points
- Purpose: The Normal Equation provides a direct, closed-form solution to calculate the best parameters (weights) for a linear regression model.
The Normal Equation implemented in Python with NumPy and scikit-learn:
```python
from sklearn.preprocessing import add_dummy_feature
import numpy as np

def normal_equation(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    # Prepend the x_0 = 1 column so the bias term is estimated along with the weights.
    X_b = add_dummy_feature(X)
    # theta = (X^T X)^{-1} X^T y
    theta_best = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
    return theta_best
```
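A quick usage sketch (the synthetic data and parameter values are my own choices): fit noisy data generated from $y = 4 + 3x$ and compare against scikit-learn's `LinearRegression`, which should recover essentially the same parameters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))                     # one feature in [0, 2)
y = 4 + 3 * X[:, 0] + rng.normal(scale=0.5, size=100)

theta = normal_equation(X, y)                    # roughly [4, 3]: [bias, weight]
lin_reg = LinearRegression().fit(X, y)

print(theta)
print(lin_reg.intercept_, lin_reg.coef_)         # should match theta closely
```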
- Cost Function: It works by finding the parameter values that minimize the Mean Squared Error (MSE) cost function. The MSE measures the average squared difference between the model's predictions and the actual target values (see the formula below).
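For reference (this formula is standard, though not spelled out in the original notes), the MSE can be written as

$$\mathrm{MSE}(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( (x^{(i)})^T \theta - y^{(i)} \right)^2 = \frac{2}{m} J(\theta),$$

so it differs from the least-squares cost $J(\theta)$ above only by the constant factor $2/m$ and is minimized by the same $\theta$.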
Advantages
- Fast and efficient for smaller datasets with a limited number of features.
- Direct solution without needing iterative optimization algorithms like gradient descent.
Disadvantages
- Can become computationally very expensive when the number of features is very large, since it requires inverting the $(n+1) \times (n+1)$ matrix $X^T X$, which typically costs on the order of $O(n^{2.4})$ to $O(n^3)$ operations depending on the implementation.
- Does not handle cases where $X^T X$ is singular, which happens when certain features are redundant (linearly dependent) or when the number of features exceeds the number of training instances (see the pseudoinverse sketch below).
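In those singular cases `np.linalg.inv` fails; one common workaround (a sketch, not from the original text) is to use the Moore-Penrose pseudoinverse instead:

```python
import numpy as np
from sklearn.preprocessing import add_dummy_feature

def normal_equation_pinv(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    # np.linalg.pinv is computed via the SVD, so it still yields a (minimum-norm)
    # least-squares solution when X_b.T @ X_b is singular, e.g. with redundant
    # features or more features than training instances.
    X_b = add_dummy_feature(X)
    return np.linalg.pinv(X_b) @ y
```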
Alternative to the Normal Equation: For datasets with many features, or when computational efficiency is critical, iterative optimization methods such as gradient descent are used instead to find the optimal parameters.
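A minimal batch gradient descent sketch (the learning rate and iteration count are arbitrary choices of mine), using the gradient $X^T(X\theta - \vec{y})$ derived above:

```python
import numpy as np
from sklearn.preprocessing import add_dummy_feature

def gradient_descent(X: np.ndarray, y: np.ndarray,
                     learning_rate: float = 0.1, n_iterations: int = 1000) -> np.ndarray:
    """Iteratively minimize the least-squares cost instead of inverting X^T X."""
    X_b = add_dummy_feature(X)
    m, n = X_b.shape
    theta = np.zeros(n)
    for _ in range(n_iterations):
        # Gradient derived above (X^T (X theta - y)), divided by m so the step
        # size does not depend on the number of training samples.
        gradient = X_b.T @ (X_b @ theta - y) / m
        theta -= learning_rate * gradient
    return theta
```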