*THIS IS A TWO-PART SERIES. PART 1 (this one) WILL BE ALL ABOUT THE THEORETICAL UNDERSTANDING OF THE LOGISTIC REGRESSION ALGORITHM. PART 2 WILL BE COVERING THE CODING PART, WHERE WE WILL BE IMPLEMENTING LOGISTIC REGRESSION IN PYTHON*
My previous posts covered a lot of Machine Learning algorithms, but most of them were regression algorithms, i.e. they predicted continuous numbers. This time we shift gears a little and move over to predicting discrete classes.
This time we will discuss the working of the logistic regression algorithm.
Where regression algorithms predict continuous values, e.g. the price of a particular good from a given set of features, Logistic Regression gives you class-based predictions, e.g. whether a certain person is obese or not, or has diabetes or not.
Such algorithms can be trained to predict multiple classes from the given data. However, for the sake of simplicity, we will only discuss binary classification in this tutorial.
Logistic Regression runs quite close to Linear Regression. If you are not familiar with Linear Regression, you can read about it here.
Linear Regression is all about fitting the straight line. Logistic Regression, however, bends things a little (pun intended 😉, you’ll get it in a while 😅).
The initial calculations are the same as those of Linear Regression: we have a set of input features, denoted by X, and some randomly initialized weights associated with them, denoted here by θ.
The dot product of these two quantities forms the equation of a straight line. We can still fit this straight line to our binary classes, and here’s what it looks like.
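As a minimal sketch of that dot product, the snippet below uses NumPy with made-up feature values. The names `X_b` and `linear_output`, the toy numbers, and the extra column of ones (so that the first weight acts as the intercept) are assumptions for illustration, not something taken from the original figures.

```python
import numpy as np

# Hypothetical toy data: 4 samples with 2 features each (values are made up).
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0]])

# Assumption: prepend a column of ones so the first weight acts as the intercept.
X_b = np.hstack([np.ones((X.shape[0], 1)), X])

# Randomly initialized weights (theta), one per column of X_b.
rng = np.random.default_rng(seed=0)
theta = rng.normal(size=X_b.shape[1])

# Dot product of features and weights: one linear output per sample.
linear_output = X_b @ theta
print(linear_output.shape)  # (4,)
```

Each entry of `linear_output` is just `theta[0] + theta[1] * x1 + theta[2] * x2` for one sample, which is the straight-line (or plane) equation from linear regression.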
Since we have only 2 classes, which will be denoted by 0 and 1 (Remember!! A computer doesn’t understand text, so all string inputs have to be encoded in numeric format), this is what our outputs look like. Now if we run linear regression here and fit a straight line, we might get something reasonable.
That looks good enough: any data point to the left of the line is 0 (red) and any to the right is 1 (green), with a marginal error of course. BUT WAIT!! What if our data isn’t so well distributed? What if we have outliers? Let’s see what effect that would have.
Now we have a problem. Due to the far-lying data in class 1, the line leans more towards the right and many points of class 1 are now misclassified.
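To make the outlier effect concrete, here is a small sketch on made-up 1D data. It fits a straight line with `np.polyfit` and reports the x value where the line crosses 0.5, which we treat as the decision boundary. The data, the helper name `boundary_from_line`, and the 0.5 cut-off on the line are all assumptions for illustration.

```python
import numpy as np

def boundary_from_line(x, y):
    """Fit y = a*x + b by least squares and return the x where the line crosses 0.5."""
    a, b = np.polyfit(x, y, deg=1)  # coefficients come back highest degree first
    return (0.5 - b) / a

# Hypothetical, well-behaved data: class 0 on the left, class 1 on the right.
x_clean = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y_clean = np.array([0, 0, 0, 1, 1, 1], dtype=float)

# The same data plus three far-lying class 1 points.
x_outliers = np.append(x_clean, [20.0, 25.0, 30.0])
y_outliers = np.append(y_clean, [1.0, 1.0, 1.0])

boundary_clean = boundary_from_line(x_clean, y_clean)
boundary_outliers = boundary_from_line(x_outliers, y_outliers)

print(f"boundary without outliers: {boundary_clean:.2f}")
print(f"boundary with outliers:    {boundary_outliers:.2f}")
```

Without the outliers the boundary sits midway between the two groups. With them, it moves to the right of x = 4, so that class 1 point now lands on the class 0 side of the line.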
The good news is, there is a solution to this. To classify this data correctly, all we have to do is…
So we have to add some sort of transformation which would change this straight line into a more…..bendy…. one. This is where we introduce the sigmoid function. Mathematically, the sigmoid function is represented by the following equation:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
This function has the following shape:
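If you want to see the shape in numbers rather than in a plot, a minimal NumPy sketch of the standard logistic sigmoid could look like this. The function name `sigmoid` and the sample inputs are our own choices.

```python
import numpy as np

def sigmoid(x):
    """Squash any real number (or array of numbers) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of exactly 0 maps to 0.5.
values = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(values))
```

The outputs rise steadily from near 0 to near 1, which is the S-shaped curve described above.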
This function is confined between 0 and 1 and, intuitively speaking, it is much better suited to fitting our binary data. The only difference between linear regression and logistic regression is the additional step of applying this sigmoid transformation. So now our data will go through the following steps:
The final output variable z is what our model will predict.
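Put together, the two steps can be sketched as a single function. This is only an illustration: the name `predict_proba`, the toy inputs, and the convention that the first column of `X` is all ones (for the intercept) are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_proba(X, theta):
    """Linear step followed by the sigmoid; returns z, one value in (0, 1) per sample."""
    linear_output = X @ theta    # same calculation as in linear regression
    z = sigmoid(linear_output)   # the extra step that logistic regression adds
    return z

# Hypothetical inputs: the first column is the constant 1 for the intercept.
X = np.array([[1.0, 0.5],
              [1.0, 2.0],
              [1.0, 4.0]])
theta = np.array([-2.0, 1.0])

z = predict_proba(X, theta)
print(z)
```

The middle sample has a linear output of exactly 0, so its `z` is 0.5. The other two fall on either side of it.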
Like every Machine Learning model, we initialize our model with random weights, so for the curve to fit properly we carry out the same steps as in linear regression:
1. Calculate the output.
2. Calculate the loss (a measure of how far our predicted outputs are from the actual outputs) using the respective loss function.
3. Update the weights using the update equation to minimize the loss.
4. Repeat steps 1–3 until the set number of iterations is completed.
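The steps above can be sketched as a short training loop. Treat this as a rough illustration, not the final implementation. It assumes the log loss (binary cross-entropy) as the loss function and plain gradient descent as the update rule. The function names, the `learning_rate` and `n_iterations` values, and the toy data are all made up for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_loss(y, z, eps=1e-12):
    """Binary cross-entropy; z is clipped so that log(0) never occurs."""
    z = np.clip(z, eps, 1.0 - eps)
    return -np.mean(y * np.log(z) + (1.0 - y) * np.log(1.0 - z))

def train(X, y, learning_rate=0.1, n_iterations=2000, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.01, size=X.shape[1])  # small random initial weights
    losses = []
    for _ in range(n_iterations):                    # step 4: repeat
        z = sigmoid(X @ theta)                       # step 1: calculate the output
        losses.append(log_loss(y, z))                # step 2: calculate the loss
        gradient = X.T @ (z - y) / len(y)
        theta = theta - learning_rate * gradient     # step 3: update the weights
    return theta, losses

# Hypothetical 1D data with an intercept column of ones.
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])

theta, losses = train(X, y)
print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

The loss is recorded at every iteration so you can check that it is going down. If it is not, the learning rate is usually the first thing to look at.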
Let’s take a look at this loss function and the update equations.
For linear regression, we used the good ol’ mean squared error function. However, that will not work well here: our sigmoid function introduces non-linearity into the system, which makes the mean squared error non-convex in the weights, so gradient descent is no longer guaranteed to reach the global minimum.
I’ll skip the nitty-gritty details of the actual loss function and jump straight to the equation.
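The equation itself is not reproduced in this text, so the sketch below assumes the standard choice for logistic regression, the log loss (binary cross-entropy): the negative mean of `y * log(z) + (1 - y) * log(1 - z)`, where `y` is the true label and `z` the sigmoid output. The function name, the `eps` clipping, and the sample values are our own additions.

```python
import numpy as np

def binary_cross_entropy(y, z, eps=1e-12):
    """Log loss for true labels y (0 or 1) and predicted values z in (0, 1)."""
    z = np.clip(z, eps, 1.0 - eps)  # keep log() away from exactly 0 or 1
    return -np.mean(y * np.log(z) + (1.0 - y) * np.log(1.0 - z))

y = np.array([0.0, 0.0, 1.0, 1.0])
mostly_right = np.array([0.1, 0.2, 0.8, 0.9])  # predictions close to the labels
mostly_wrong = np.array([0.9, 0.8, 0.2, 0.1])  # predictions far from the labels

loss_right = binary_cross_entropy(y, mostly_right)
loss_wrong = binary_cross_entropy(y, mostly_wrong)
print(loss_right, loss_wrong)
```

Predictions that are confidently wrong are punished far more heavily than predictions that are roughly right, which is the behaviour we want from a classification loss.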
To find the minimum of this loss function, we take its derivative w.r.t. each weight. Surprisingly, the derivative turns out to have exactly the same form as that of linear regression; the only change is that the predicted output now passes through the sigmoid.
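You don't have to take that claim on faith. Assuming the log loss as the loss function, the sketch below compares the closed-form gradient `X.T @ (z - y) / m` with a finite-difference estimate on random data. This has the same shape as the mean squared error gradient in linear regression, up to a constant factor that depends on convention. All names and data here are made up for the check.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(theta, X, y):
    z = sigmoid(X @ theta)
    return -np.mean(y * np.log(z) + (1 - y) * np.log(1 - z))

def analytic_gradient(theta, X, y):
    """(prediction - target) times the features, averaged over samples."""
    z = sigmoid(X @ theta)
    return X.T @ (z - y) / len(y)

def numerical_gradient(theta, X, y, h=1e-6):
    """Central finite differences, one weight at a time."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = h
        grad[j] = (loss(theta + step, X, y) - loss(theta - step, X, y)) / (2 * h)
    return grad

# Hypothetical random data: 20 samples, an intercept column plus 2 features.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = rng.integers(0, 2, size=20).astype(float)
theta = rng.normal(size=3)

max_diff = np.max(np.abs(analytic_gradient(theta, X, y) - numerical_gradient(theta, X, y)))
print(max_diff)
```

If the two gradients agree to several decimal places, the closed-form expression is a correct derivative of the loss.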
The update equations, for all the weights used in our model, remain the same.
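For reference, a standard way to write this update is shown below. It assumes plain gradient descent with a learning rate α (a symbol not defined elsewhere in this post), m training samples, and z as the sigmoid output for each sample.

```latex
\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( z^{(i)} - y^{(i)} \right) x_j^{(i)}
```

Here x_j^{(i)} is the j-th feature of the i-th sample and y^{(i)} is its true label. All weights θ_j are updated simultaneously in each iteration.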
All we have to do after this is repeat the steps shown above until we have a reasonable fit.
This looks much better. For a fit like this, we can simply apply a rule: values greater than 0.5 are assigned class 1, and values below 0.5 are assigned class 0.
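A minimal sketch of that rule is shown here. The function name `to_class_labels` is hypothetical, and sending a value of exactly 0.5 to class 0 is our own assumption, since the rule above does not say what to do in that case.

```python
import numpy as np

def to_class_labels(z, threshold=0.5):
    """Turn sigmoid outputs into hard 0/1 labels."""
    # Assumption: a value exactly equal to the threshold is assigned class 0.
    return (z > threshold).astype(int)

z = np.array([0.03, 0.45, 0.5, 0.62, 0.97])
labels = to_class_labels(z)
print(labels)  # [0 0 0 1 1]
```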
That is it, that is the complete logistic regression algorithm explained.
Part 2 of this tutorial will be about the Python implementation of logistic regression.