The formula at the end is very simple, but it is an important part of how neural networks learn: gradient descent, used together with the backpropagation algorithm.
I am talking about "Linear Regression". I don't find this term intuitive enough, which is why I haven't used it in the title. Let's break it down: it defines the relationship between two variables, an independent variable (x) and a dependent variable (y).
Dataset
Try out the full dataset from here.
| xi (experience in months) | yi (salary in thousands) |
|---|---|
| 18.290 | 16.522 |
| 17.023 | 11.666 |
| 26.344 | 23.167 |
| 19.106 | 20.877 |
| 27.743 | 23.166 |
| 31.671 | 32.966 |
| 14.186 | 15.294 |
| 29.933 | 33.159 |
| 32.841 | 32.033 |
| 26.874 | 32.348 |
So you will have two types of values: the observed value y (the actual value, from the labeled data) and the predicted value ŷ. Let's assume there is a linear relationship between the feature variable (x) and the predicted variable ŷ, given as ŷ = w·x + b, where w is the weight (slope) and b is the bias (intercept).
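To make this concrete, here is a minimal Python sketch of the linear model on the dataset above; the starting values of w and b are arbitrary guesses, not fitted parameters, and the function name `predict` is my own choice:

```python
# Dataset from the table above.
xs = [18.290, 17.023, 26.344, 19.106, 27.743,
      31.671, 14.186, 29.933, 32.841, 26.874]  # experience in months
ys = [16.522, 11.666, 23.167, 20.877, 23.166,
      32.966, 15.294, 33.159, 32.033, 32.348]  # salary in thousands

def predict(x, w, b):
    """Predicted salary for experience x, using the line y_hat = w*x + b."""
    return w * x + b

w, b = 1.0, 0.0  # arbitrary starting parameters, not the best fit
predictions = [predict(x, w, b) for x in xs]
print(predictions[0])  # → 18.29
```

With w = 1 and b = 0 the prediction is simply x itself, which is clearly not the best line; the rest of the article is about choosing w and b properly.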
When you predict a value, you can't be sure it will be the same as the actual value y; there will be some error. Let's calculate the mean squared error for the whole dataset using the formula below.
Mean Squared Error: a fundamental formula, just the square of the difference between the actual and predicted values, averaged over the whole dataset:

E = (1/n) · Σᵢ (yᵢ − ŷᵢ)²
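The MSE formula can be sketched in Python as follows (the function name `mse` is my own choice):

```python
def mse(xs, ys, w, b):
    """Mean squared error of the line y_hat = w*x + b over the dataset."""
    n = len(xs)
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / n

# A line that passes exactly through every point gives zero error:
print(mse([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], 2.0, 0.0))  # → 0.0
```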
One thing you would notice here is that the error E is a quadratic function of the parameters. Substituting ŷᵢ = w·xᵢ + b:

E(w, b) = (1/n) · Σᵢ (yᵢ − (w·xᵢ + b))²
We must select the two parameters, w and b, in such a way that our error is minimized. This is crucial because minimizing the error gives us the best-fitting line, one that accurately predicts the trend for any given independent variable. Consequently, our predictions will closely match the actual values.
To find the minimum of a quadratic function, first note that our parabola opens in the positive Y-direction: whatever arbitrary values you choose for w and b, E is always non-negative, because it is an average of squares.
For this type of curve, the minimum is the point where the slope is 0. Unfortunately, to calculate where the slope is 0, we need the equation of the curve: we take its derivative with respect to w and set it equal to 0 to solve for the w value. Remember, our curve lies in the E-versus-w graph. For the MSE above, the derivative works out to:

dE/dw = −(2/n) · Σᵢ xᵢ · (yᵢ − (w·xᵢ + b))
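That derivative, the slope of the error curve with respect to w, can be sketched in Python (the function name `grad_w` is my own choice):

```python
def grad_w(xs, ys, w, b):
    """Slope of the MSE curve with respect to w: dE/dw."""
    n = len(xs)
    return (-2.0 / n) * sum(x * (y - (w * x + b)) for x, y in zip(xs, ys))

# At the bottom of the parabola (a perfect fit), the slope is zero.
slope = grad_w([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], 2.0, 0.0)
print(slope == 0)  # → True
```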
There is one more way to find the minimum value: first take any arbitrary value of w and calculate the slope. If the slope is positive, decrease w by a certain factor; if the slope is negative, increase w by a certain factor. We will call this factor the learning rate (α). This gives the update rule:

w ← w − α · (dE/dw)
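Putting the pieces together, here is a minimal gradient-descent sketch; the learning rate, step count, and starting point are arbitrary choices of mine, not values from the article, and b is updated alongside w using the analogous derivative:

```python
def gradient_descent(xs, ys, lr=0.001, steps=10000):
    """Fit y_hat = w*x + b by repeatedly stepping against the slope."""
    w, b = 0.0, 0.0  # arbitrary starting point
    n = len(xs)
    for _ in range(steps):
        # Slopes of the MSE with respect to w and b.
        dw = (-2.0 / n) * sum(x * (y - (w * x + b)) for x, y in zip(xs, ys))
        db = (-2.0 / n) * sum(y - (w * x + b) for x, y in zip(xs, ys))
        # Positive slope -> decrease the parameter; negative slope -> increase it.
        w -= lr * dw
        b -= lr * db
    return w, b
```

For example, on data that lies exactly on the line y = 2x, this recovers w ≈ 2 and b ≈ 0; too large a learning rate makes the steps overshoot and diverge, too small a one makes convergence very slow.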
You can gain insight into this formula on this website: gradient-descent-visualiser