Tuesday, April 26, 2022

Linear Regression

Introduction:

This blog covers the basic fundamentals of linear regression. It explains the mathematics behind linear regression using the simple linear equation Y = mX + c. The blog is split into two parts: (1) simple linear regression and (2) multivariate linear regression.

First, it helps to understand where linear regression is useful.
For simple single-variable linear regression:
  1. Predicting an employee's salary based on experience.
  2. Estimating a vehicle's top speed based on its market value.
  3. Estimating power consumption based on the number of family members, and many more.
For multivariate linear regression:
  1. Predicting house prices using carpet area, location, and so on.
  2. Estimating the future population using parameters of current lifestyle.
  3. Projecting company growth based on expenses, market conditions, and many more.
Let's start sequentially: first simple linear regression, then multivariate linear regression.

1. Simple Linear Regression.

Simple linear regression predicts the output from a single input feature using two coefficients. The basic equation for linear regression is:

Y = mX + c

where m and c are coefficients, Y is the estimated output, and X is the input value.

Here, m and c are learned parameters that are updated during training.

The core functionality of simple linear regression is divided into two parts:
  • calculate the mean, variance, and covariance of the input feature.
  • estimate the coefficients.

Calculate mean, variance, and covariance of the input feature:

We need to find a total of three quantities (the variance and covariance here are left unnormalized, since the 1/n factors cancel when the slope is computed):
  1. mean:
    `\overline{x} = \frac{1}{n}\sum_{i=1}^{n} x_{i}`
  2. variance:
    `variance = \sum_{i=1}^{n} (x_{i} - \overline{x})^{2}`
  3. covariance:
    `covariance = \sum_{i=1}^{n} (x_{i} - \overline{x}) (y_{i} - \overline{y})`

The above equations can be written as Python functions as follows:

  1. mean:
    # arithmetic mean of a list of values
    mu = lambda x: sum(x) / float(len(x))
  2. variance:
    # unnormalized variance: sum of squared deviations from the mean
    var = lambda x, mean: sum([(val - mean) ** 2 for val in x])
  3. covariance:
    def calculate_covariance(x, y, mean_x, mean_y):
        # unnormalized covariance: sum of products of paired deviations
        covariance = 0
        for xi, yi in zip(x, y):
            covariance += (xi - mean_x) * (yi - mean_y)
        return covariance
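
For example, a quick sanity check of these helpers on a tiny made-up dataset (the values are arbitrary, chosen only for illustration):

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]   # generated from y = 2x + 1
mean_x, mean_y = mu(x), mu(y)
print(mean_x, mean_y)                              # 3.0 7.0
print(var(x, mean_x))                              # 10.0
print(calculate_covariance(x, y, mean_x, mean_y))  # 20.0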

Estimate coefficients:

We can find the coefficients m and c using the above functions. In line-equation terminology, m is the slope of the line and c is the bias or intercept, the point where the line crosses the y-axis.

The slope can be calculated as follows:

`slope = \frac{covariance}{variance_{x}}`

Bias can then be found easily: the fitted line passes through the point of means, so substituting into Y = slope*X + bias and rearranging gives bias = mean_y - slope*mean_x.

The Python implementation to estimate the slope and bias values can be written as follows:

def estimate_coefficient(covariance, variance_x, mean_x, mean_y):
    # slope from the (unnormalized) covariance and variance
    slope = covariance / variance_x
    # the fitted line passes through the point of means
    bias = mean_y - (slope * mean_x)
    return slope, bias

After getting the slope and bias values, estimation is performed using the same line equation (Y = mX + c): plug in the slope, bias, and feature value to get the prediction.
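
Putting the pieces together, a minimal end-to-end sketch, continuing the made-up dataset from above:

slope, bias = estimate_coefficient(
    calculate_covariance(x, y, mean_x, mean_y),
    var(x, mean_x),
    mean_x, mean_y,
)
print(slope, bias)       # 2.0 1.0
print(slope * 6 + bias)  # prediction for X = 6 -> 13.0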

2. Multivariate Linear Regression.

The multivariate linear regression model uses multiple features to produce a continuous prediction. For example, predicting the opening value of a specific stock using multiple features of the previous day, such as its closing price, opening price, average price, and more.

There are two parts to implement in multivariate linear regression:
  1. Estimator.
  2. Stochastic gradient descent.

1. Estimator

The estimator is a simple linear function: each feature is multiplied by its respective weight, the products are summed, and a bias value is added.

So, the estimator function can be expressed in mathematical form as follows:

`\hat{y} = b + \sum_{i=1}^{n} w_{i} x_{i}`

The above linear function can be written as a Python function as follows:

def estimation(x, w):
    # w[0] is the bias; w[1:] are the per-feature weights
    return w[0] + sum([wi * xi for wi, xi in zip(w[1:], x)])

where x and w are 1D arrays that contain the feature values and the weights (bias first), respectively.
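
For instance, with hypothetical weights (the numbers below are made up purely for illustration):

w = [0.5, 2.0, -1.0]     # bias 0.5, feature weights 2.0 and -1.0
x = [3.0, 4.0]           # two feature values
print(estimation(x, w))  # 0.5 + 2.0*3.0 + (-1.0)*4.0 = 2.5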

2. Stochastic gradient descent

The stochastic gradient descent method can be broken down into three steps: (1) run the estimation, (2) calculate the error, and (3) update the weights according to the generated error.

We can reuse the estimation function from the estimator. Its output is compared with the actual value to calculate the error, and the weights are then updated using the error multiplied by a learning rate.

The error is calculated as follows:

    `e = y_{pred} - y`

and the squared loss used to monitor training is `Err = e^{2}`.

To update the weights, each weight is moved against the gradient of the squared loss, scaled by a predefined learning rate (the constant factor of 2 from the derivative is absorbed into the learning rate):

    `w_{i} := w_{i} - lr * e * x_{i}`

    `b := b - lr * e`

The above update rules can be implemented as a Python function as follows:

def stochastic_gradient_descent(train_X, train_y, learning_rate, epochs):
    # one weight per feature, plus weights[0] for the bias
    weights = [0.0 for i in range(len(train_X[0]) + 1)]
    for epoch in range(epochs):
        sum_error = 0
        for X, Y in zip(train_X, train_y):
            pred = estimation(X, weights)
            error = pred - Y                 # signed error drives the updates
            sum_error += error ** 2          # squared loss, for reporting only
            weights[0] = weights[0] - learning_rate * error
            for i in range(len(X)):
                weights[i + 1] = weights[i + 1] - learning_rate * error * X[i]
        print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, learning_rate, sum_error))
    return weights
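
A minimal training run on a tiny made-up dataset (the values are invented for illustration; real features should usually be normalized so a small fixed learning rate behaves well):

train_X = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0]]
train_y = [8.0, 9.0, 16.0, 17.0]  # generated from y = 1 + 3*x1 + 2*x2
weights = stochastic_gradient_descent(train_X, train_y, learning_rate=0.01, epochs=1000)
print(weights)                          # should approach [1.0, 3.0, 2.0]
print(estimation([2.0, 2.0], weights))  # prediction for a new point, near 11.0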

Sunday, February 20, 2022

K-Nearest Neighbor

Introduction

There are many blogs about machine learning models, and about KNN (k-nearest neighbors) in particular, but most of them explain the theory using library code. This blog describes the KNN algorithm with from-scratch code snippets that support the written theory.


A question can be raised: is KNN used for classification or for regression? The answer is that KNN can be used as both a classification and a regression model.

KNN for classification

The Iris dataset has been used both for training the model and for testing.

There are two core functionalities to implement:

1. Calculate the distance between two data points.
2. Extract the k nearest data points by sorting the distances.

1. Calculate the distance between two data points:

There are multiple methods to calculate the distance between two points, such as Minkowski distance, Manhattan distance, Euclidean distance, cosine distance, Jaccard distance, and Hamming distance. This blog does not cover when to use each of them.

For simplicity, Euclidean distance is used here, as it is a very common distance calculation algorithm.

The Euclidean distance formula calculates the distance between two multidimensional points:

`D\left(p_{1},p_{2}\right) = \sqrt{\sum_{i=1}^{n} (p_{2,i} - p_{1,i})^{2}}`

where
n = number of features
`p_{1}, p_{2}` = the two selected data points

The Python code snippet is given below:

import numpy as np

def calculate_euclidean_distance(data_point_1, data_point_2):
    # accumulate squared differences across all features
    distance = 0.0
    for i in range(len(data_point_1)):
        distance += (data_point_1[i] - data_point_2[i]) ** 2
    return np.sqrt(distance)
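
For example, checking it against the 3-4-5 right triangle:

print(calculate_euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0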
    

2. Extract the k nearest data points by sorting the distances.

This functionality is very basic. The distance between the current point and every training data point is calculated, and the results are sorted. The top k points with the smallest distances are extracted so their training labels can be checked. The label with the maximum count is taken as the prediction result.

def get_k_nearest_points_index(user_point, neighbor_points, k):
    neighbor_distances = []
    for idx, neighbor_point in enumerate(neighbor_points):
        distance = calculate_euclidean_distance(user_point, neighbor_point)
        neighbor_distances.append((idx, distance))
    # sort ascending by distance and keep the k closest (index, distance) pairs
    neighbor_distances.sort(key=lambda n: n[1])
    return neighbor_distances[:k]
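
A quick check with made-up points:

points = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(get_k_nearest_points_index([0.9, 0.9], points, k=2))
# nearest first: [(1, ~0.141), (0, ~1.273)]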

After applying the above two functionalities, we have the indices of the k nearest training points. Their labels can be looked up through those indices, and the label with the maximum count is taken as the predicted class.

One question may be raised: what if the k nearest points all have different labels, so that no label has a maximum count?

Since the points are already sorted by distance, if all labels are distinct the first point's label is taken as the prediction, because it belongs to the nearest point.

def predict_for_classification(test_point, train_points, train_labels, k):
    k_nearest_point_indices = get_k_nearest_points_index(test_point,
                                                         train_points, k)
    # map neighbor indices to their labels, nearest first
    k_labels = [train_labels[idx] for idx, _ in k_nearest_point_indices]
    # majority vote; a tie resolves to the label seen first, i.e. the nearest
    return max(k_labels, key=k_labels.count)
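
A tiny classification example with made-up 2D points and string labels:

train_points = [[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9]]
train_labels = ['a', 'a', 'b', 'b']
print(predict_for_classification([1.1, 0.9], train_points, train_labels, k=3))  # a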

KNN for regression:

The KNN regression problem is simple once the classification case is understood. The main change is that instead of taking the label with the maximum count, the continuous label values of the neighbors are averaged.

Two steps are the same as above:

1. Calculate the distance between two data points.
2. Extract the k nearest data points by sorting the distances.

Then, the prediction for regression is as follows:
def predict_for_regression(test_point, train_points, train_labels, k):
    k_nearest_point_indices = get_k_nearest_points_index(test_point,
                                                         train_points, k)
    # average the continuous label values of the k nearest neighbors
    summation = 0.0
    for idx, _ in k_nearest_point_indices:
        summation += train_labels[idx]
    return summation / len(k_nearest_point_indices)
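
And a matching regression example with made-up values:

train_points = [[1.0], [2.0], [3.0], [10.0]]
train_labels = [10.0, 20.0, 30.0, 100.0]
print(predict_for_regression([2.1], train_points, train_labels, k=3))  # (20+30+10)/3 = 20.0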
    

Conclusion:

KNN is a very basic algorithm: it calculates distances and predicts the same behavior as its neighbors. It can be used for both classification and regression. The common functionalities are calculating distances and sorting them to extract the k nearest points, and several distance calculation algorithms were listed above. The difference between classification and regression is that classification takes the most frequent label among the nearest neighbors, while regression averages their continuous label values.
