How do L1 and L2 regularization prevent overfitting?

Induraj
Feb 22, 2023


L1 regularization and L2 regularization are commonly used techniques in machine learning and deep learning when a model shows signs of overfitting.

Easy way to remember the formula for interviews?

→ L1 is called Lasso regression | L2 is called Ridge regression.

→ L1 adds the absolute value of the weights, |w|, to the cost function, while L2 adds the square of the weights, w² (written out below).
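Written out (a minimal sketch of the two cost functions, with λ as the regularization strength and Loss(w) as the unregularized loss):

```latex
J_{\text{L1}}(w) = \text{Loss}(w) + \lambda \sum_{i=1}^{n} |w_i|
\qquad
J_{\text{L2}}(w) = \text{Loss}(w) + \lambda \sum_{i=1}^{n} w_i^2
```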

What is overfitting?

If a model performs well on the training data but fails to generalize to new, unseen examples, we call this scenario overfitting.

In simple words: if there is a large gap between the training loss and the validation loss, we can say that the model is overfitting.

In general, how do we avoid overfitting?

Several techniques exist: dropout, adaptive regularization, early stopping, training with more data, trying different architectures, and using L1 or L2 regularization.

Here we will discuss how L1 and L2 regularization work to prevent overfitting!

How L1 and L2 regularization prevent overfitting:

L1 and L2 regularization are techniques used in machine learning to prevent overfitting. Overfitting is a phenomenon where the model learns the noise in the data instead of the underlying patterns, which leads to poor performance on new, unseen data.

L1 regularization, also known as Lasso regularization, adds a penalty term to the cost function of the model that is proportional to the absolute value of the weights. This penalty term encourages the model to select a smaller subset of features that are important for making predictions. In other words, L1 regularization can be used for feature selection. By reducing the number of features, the model becomes less complex and less likely to overfit.
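To see this feature-selection effect in practice, here is a minimal sketch using scikit-learn (the synthetic data and the alpha value are arbitrary illustrative choices, not recommendations):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 100 samples, 20 features, only 5 of which
# actually carry signal.
X, y = make_regression(
    n_samples=100, n_features=20, n_informative=5,
    noise=10.0, random_state=0,
)

# Lasso = linear regression + L1 penalty (alpha is the strength).
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# The L1 penalty drives many coefficients exactly to zero,
# effectively performing feature selection.
print("non-zero coefficients:", np.sum(lasso.coef_ != 0))
```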

L2 regularization, also known as Ridge regularization, adds a penalty term to the cost function of the model that is proportional to the square of the weights. This penalty term encourages the model to distribute the weights evenly across all the features. By doing so, L2 regularization can prevent the model from becoming too dependent on any one feature and reduce overfitting.
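A matching sketch for L2, again with scikit-learn on the same synthetic setup (alpha is an arbitrary choice):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Same synthetic setup as the Lasso example above.
X, y = make_regression(
    n_samples=100, n_features=20, n_informative=5,
    noise=10.0, random_state=0,
)

# Ridge = linear regression + L2 penalty (alpha is the strength).
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)

# Unlike Lasso, Ridge shrinks weights toward zero but rarely to
# exactly zero: the weights stay spread across all the features.
print("coefficients exactly zero:", np.sum(ridge.coef_ == 0))
print("largest |coefficient|:", np.max(np.abs(ridge.coef_)))
```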

In summary, L1 and L2 regularization can prevent overfitting by reducing the complexity of the model and distributing the weights more evenly across all the features.
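One way to see why the two behave differently: the gradient of the L1 term is λ·sign(w), a constant-size pull toward zero that can zero small weights out entirely, while the gradient of the L2 term is 2λ·w, a pull proportional to the weight that shrinks large weights hardest but never reaches exactly zero. A minimal NumPy sketch with toy values (lam and w are arbitrary):

```python
import numpy as np

lam = 0.1                         # regularization strength (toy value)
w = np.array([3.0, 0.5, -0.01])   # toy weight vector

# L1 penalty and its (sub)gradient: a constant-size pull toward 0,
# regardless of how big or small each weight is.
l1_penalty = lam * np.sum(np.abs(w))
l1_grad = lam * np.sign(w)

# L2 penalty and its gradient: a pull proportional to the weight,
# so large weights are shrunk hardest but never hit exactly zero.
l2_penalty = lam * np.sum(w ** 2)
l2_grad = 2 * lam * w

print("L1 penalty:", l1_penalty, "gradient:", l1_grad)
print("L2 penalty:", l2_penalty, "gradient:", l2_grad)
```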
