How to derive B0 and B1 in Linear Regression - Part 2

Induraj
5 min read · Aug 9, 2020

Other related articles:

Which Metrics in Regression matter the most? MSE|RMSE|MAE|R2|Adj R2- Advantages/Disadvantages (click here)

What are B0 and B1? These model parameters are sometimes referred to as theta0 and theta1. Basically, B0 represents the intercept and B1 represents the slope of the regression line.

We all know that the regression line is given by Y=B0+B1.X

To understand how Y is expressed as a function of X through these model parameters, and how the best-fit line is selected, this post walks through the derivation of the formulas for B0 and B1 step by step.

Consider a problem like the one shown below, where the best regression line is found with B0 = 19.969 and B1 = 0.00776. How did the algorithm arrive at these values of B0 and B1, giving a line that fits better than any other line that could be drawn for the problem at hand?

Example of a best-fit line fitted between the predictor and the target variable

So what are the formulas for B0 and B1 that give this best-fit line? And do these formulas alone give the best-fit line?

Before we dive into the mathematics to derive the formulas for B0 and B1, let's first discuss the assumptions made in linear regression.

The assumptions made in linear regression

a. The variables follow a linear trend, i.e., the regression curve is linear in form.

b. The error terms are independent and centered, following a Gaussian distribution with a mean of zero and a variance of sigma².

c. Because the errors are independent and Gaussian (as stated above), the response variable Y also follows a Gaussian distribution with the corresponding parameters; i.e., since the errors are independent random variables, the responses are also independent random variables.
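Put together, these assumptions amount to the standard simple linear regression model, written here in the usual notation:

Y_i = B_0 + B_1 X_i + \varepsilon_i, \qquad \varepsilon_i \overset{iid}{\sim} N(0, \sigma^2), \qquad i = 1, \dots, n

so that each response Y_i is itself Gaussian, with mean B_0 + B_1 X_i and variance \sigma^2.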

Diving into deriving formulas:

In linear regression, we try to find the best-fit line [Y = B0 + B1.X]. The parameters B0 and B1 are chosen in such a way that the line represents the trend with the least error. So the main objective is to find the values of B0 and B1 for which the error introduced when the regression line is fitted is as small as possible.

So we can represent this objective mathematically, in standard least-squares notation, as follows:
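\min_{B_0, B_1} S(B_0, B_1), \qquad S(B_0, B_1) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2

Here \hat{Y}_i = B_0 + B_1 X_i is the predicted value and S is the sum of squared errors.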

In the above equation, S is a convex function, so by iterating through different values of B0 and B1 we can search for the combination of B0 and B1 that leads to the smallest error when the regression line is fitted. Note: we are minimizing the error S, so we are looking for the values of B0 and B1 at which this minimum is attained.

So the above objective can be written, as in the equations below, as a function of the squared distances between the actual y and the predicted y.
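In expanded form (numbered here to match the equation references used later in the post):

S(B_0, B_1) = \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2 \qquad \text{(eq.1)}

S(B_0, B_1) = \sum_{i=1}^{n} \left( Y_i - B_0 - B_1 X_i \right)^2 \qquad \text{(eq.2)}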

In differential calculus, a function attains its minimum at a point where the slope (the derivative) is zero. So here we look for the point where the convex function S attains its minimum value; hence, in the equations below, we set the derivatives equal to zero.
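That is, at the minimum of S both partial derivatives vanish:

\frac{\partial S}{\partial B_0} = 0, \qquad \frac{\partial S}{\partial B_1} = 0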

Differentiating eq.1 with respect to B0, we get
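\frac{\partial S}{\partial B_0} = -2 \sum_{i=1}^{n} \left( Y_i - B_0 - B_1 X_i \right) = 0 \;\Rightarrow\; \sum_{i=1}^{n} \left( Y_i - B_0 - B_1 X_i \right) = 0 \qquad \text{(eq.3)}

(The factor of -2 can be dropped, since the right-hand side is zero.)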

Similarly, differentiating eq.2 with respect to B1, we get
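\frac{\partial S}{\partial B_1} = -2 \sum_{i=1}^{n} X_i \left( Y_i - B_0 - B_1 X_i \right) = 0 \;\Rightarrow\; \sum_{i=1}^{n} X_i \left( Y_i - B_0 - B_1 X_i \right) = 0 \qquad \text{(eq.4)}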

Writing eq.3 and eq.4 in terms of Yi, we get
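\sum_{i=1}^{n} Y_i = n B_0 + B_1 \sum_{i=1}^{n} X_i \qquad \text{(eq.5)}

\sum_{i=1}^{n} X_i Y_i = B_0 \sum_{i=1}^{n} X_i + B_1 \sum_{i=1}^{n} X_i^2 \qquad \text{(eq.6)}

These are the so-called normal equations of simple linear regression.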

Since we are interested in finding B0 and B1, we derive them from eq.5 and eq.6 as shown below.

The formula for B0
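Dividing eq.5 by n, with \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i and \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i denoting the sample means, gives

\bar{Y} = B_0 + B_1 \bar{X} \;\Rightarrow\; B_0 = \bar{Y} - B_1 \bar{X}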

Now that we have the formula for B0, we are left with finding only B1, which we derive from eq.6 as shown below.
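Substituting B_0 = \bar{Y} - B_1 \bar{X} into eq.6, and using \sum_{i=1}^{n} X_i = n \bar{X}:

\sum_{i=1}^{n} X_i Y_i = \left( \bar{Y} - B_1 \bar{X} \right) n \bar{X} + B_1 \sum_{i=1}^{n} X_i^2 = n \bar{X} \bar{Y} - n B_1 \bar{X}^2 + B_1 \sum_{i=1}^{n} X_i^2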

As you will note, we are just multiplying and dividing by "n" wherever required:
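Collecting the B_1 terms and solving:

B_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}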

We have thus derived a formula for B1. Note that B1 is nothing but the covariance of X and Y divided by the variance of X:
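B_1 = \frac{\sum_{i=1}^{n} \left( X_i - \bar{X} \right)\left( Y_i - \bar{Y} \right)}{\sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2} = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}

(The numerator and denominator are the same quantities as before: expanding the products and using \sum X_i = n\bar{X} and \sum Y_i = n\bar{Y} recovers the previous expression.)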

These, then, are the formulas for B0 and B1. Alternatively, as discussed earlier, a gradient descent (or stochastic gradient descent) algorithm can act on the convex function S: it iterates toward the global optimum and finds the values of B0 and B1 corresponding to the point where the global minimum is reached, so that the fitted regression line has the least possible error.
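As a quick sanity check on the derivation, here is a minimal Python sketch, using small made-up numbers rather than the data behind the figure above, that computes B0 and B1 with the closed-form formulas and compares them with numpy's built-in least-squares fit:

import numpy as np

# Hypothetical example data (not the data from the figure above)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.1])

x_bar, y_bar = x.mean(), y.mean()

# Closed-form estimates derived above:
# B1 = Cov(x, y) / Var(x),  B0 = y_bar - B1 * x_bar
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

# Cross-check against numpy's least-squares fit of a degree-1 polynomial
b1_np, b0_np = np.polyfit(x, y, deg=1)

print(f"closed form: B0 = {b0:.4f}, B1 = {b1:.4f}")
print(f"np.polyfit : B0 = {b0_np:.4f}, B1 = {b1_np:.4f}")

Both lines should print the same values, since np.polyfit minimizes exactly the same sum of squared errors S.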
