Python Machine Learning Linear Regression

Machine Learning Linear Regression with Python

This is an amazing code example of Machine Learning with a real world application.

*Featured Code submitted and Created by Kuba Siekierzynski

Python

"""

Machine Learning with Python

Linear regression

The code is the first of the series inspired by the awesome Andrew Ng's video course on machine learning:

http://www.coursera.org/learn/machine-learning

This code shows an implementation of linear regression, the initial machine learning algorithm to predict continuous values.

"""

# The problem to be solved:

# We have trucks located in different cities and each truck brings a profit or loss. We have the historical data and determined that the profit depends on the city's population. We want to find this relation.

import numpy as np

print('Welcome to Machine Learning with Python!')

print('Lesson 1: Linear regression')

print('\n'+40*'=')

# data contains the city population (in 10,000s) in the first column

# and the profit/loss (in 10,000$) in the second columns

# the data was rescaled to save on calculations and resources consumption

# Based on the first entry, a truck in a city of population of 61,101 brought a profit of $175,920

data =\

[

[6.1101,17.592],

[5.5277,9.1302],

[8.5186,13.662],

[7.0032,11.854],

[5.8598,6.8233],

[8.3829,11.886],

[7.4764,4.3483],

[8.5781,12],

[6.4862,6.5987],

[5.0546,3.8166],

[5.7107,3.2522],

[14.164,15.505],

[5.734,3.1551],

[8.4084,7.2258],

[5.6407,0.71618],

[5.3794,3.5129],

[6.3654,5.3048],

[5.1301,0.56077],

[6.4296,3.6518],

[7.0708,5.3893],

[6.1891,3.1386],

[20.27,21.767],

[5.4901,4.263],

[6.3261,5.1875],

[5.5649,3.0825],

[18.945,22.638],

[12.828,13.501],

[10.957,7.0467],

[13.176,14.692],

[22.203,24.147],

[5.2524,-1.22],

[6.5894,5.9966],

[9.2482,12.134],

[5.8918,1.8495],

[8.2111,6.5426],

[7.9334,4.5623],

[8.0959,4.1164],

[5.6063,3.3928],

[12.836,10.117],

[6.3534,5.4974],

[5.4069,0.55657],

[6.8825,3.9115],

[11.708,5.3854],

[5.7737,2.4406],

[7.8247,6.7318],

[7.0931,1.0463],

[5.0702,5.1337],

[5.8014,1.844],

[11.7,8.0043],

[5.5416,1.0179],

[7.5402,6.7504],

[5.3077,1.8396],

[7.4239,4.2885],

[7.6031,4.9981],

[6.3328,1.4233],

[6.3589,-1.4211],

[6.2742,2.4756],

[5.6397,4.6042],

[9.3102,3.9624],

[9.4536,5.4141],

[8.8254,5.1694],

[5.1793,-0.74279],

[21.279,17.929],

[14.908,12.054],

[18.959,17.054],

[7.2182,4.8852],

[8.2951,5.7442],

[10.236,7.7754],

[5.4994,1.0173],

[20.341,20.992],

[10.136,6.6799],

[7.3345,4.0259],

[6.0062,1.2784],

[7.2259,3.3411],

[5.0269,-2.6807],

[6.5479,0.29678],

[7.5386,3.8845],

[5.0365,5.7014],

[10.274,6.7526],

[5.1077,2.0576],

[5.7292,0.47953],

[5.1884,0.20421],

[6.3557,0.67861],

[9.7687,7.5435],

[6.5159,5.3436],

[8.5172,4.2415],

[9.1802,6.7981],

[6.002,0.92695],

[5.5204,0.152],

[5.0594,2.8214],

[5.7077,1.8451],

[7.6366,4.2959],

[5.8707,7.2029],

[5.3054,1.9869],

[8.2934,0.14454],

[13.394,9.0551],

[5.4369,0.61705]

]

# We want to make a model able to predict the profit/loss, based on a given population. In order to do some machine learning, the data has to be of a matrix type.

# X matrix will hold city population

X = np.matrix(data)[:,0]

# y matrix will hold the profit/loss information

y = np.matrix(data)[:,1]

'''

Basically, we are looking for a function f(x) returning the _output_ value y based on its _input_ x. We assume a linear y = ax + b dependence, but it as well might have been a polynominal or any other function. So, we are looking for such a and b values that give us a function that will somehow reflect the profit based on the population. Like this:

predicted_profit = a * city_population + b

A quick look at the data shows that it is impossible to find a line which would cross all the datapoints. So, we want to have the best possible fit. How do we measure the quality of it? The best possible fit is such that makes the smallest prediction error on the whole dataset. The single error is calculated as the square of the difference between the real and predicted value, so the total error will simply be the sum of all single ones.

We thus need a so-called cost function which would return the average error of a given f(x) when trying to explain the datapoints and make predictions. In order to make things quicker, we will look for a vector 'theta', containing the 'a' and 'b' (or more, for more complicated models - theta0, theta1, theta2,...) parameters.

'''

print('\nLooking for y=a*x+b function (a,b=theta)')

# function J calculates the cost under a given set of theta parameters

def J(X, y, theta):

theta = np.matrix(theta).T # we need a transposed matrix theta

m = len(y) # m is the number of datapoints

predictions = X * theta # stores the outputs predicted by f(x) with a given theta as parameter vector

sqError = np.power((predictions-y),[2]) # a matrix of squared errors between predictions and real values

return 1/(2*m) * sum(sqError) # the value of the cost function J

# the transformation below adds a column of ones to the left of the X matrix, for calculation reasons

dataX = np.matrix(data)[:,0:1]

X = np.ones((len(dataX),2))

X[:,1:] = dataX

# let's check the cost if we would assume theta at two different values

print('\nChecking two example cases of theta:')

for t in [0,0], [-1,2]:

print('Assuming theta vector at {}, the cost would be {:.2f}'.format(t, J(X, y, t).item())) # 32.073, 54.242

'''

Now, how to find the optimal theta vector for our model to predict with the smallest possible error?

Assuming that J is a cost function, this is an optimization problem - we need to find the minimum of J.

We will use a technique called gradient descent - we will initialize theta at all-zeros and gradually move along the J curve updating all thetas (simultaneously) by small fractions. If J increases - we are going the wrong way, if it decreases - we are moving along this way.

'''

# gradient descent function will iteratively update theta by a small fraction alpha (also called the learning rate) for a number of iterations

def gradient(X, y, alpha, theta, iters):

J_history = np.zeros(iters) # will store historical values of J for each iteration

m = len(y) # m is the number of datapoints

theta = np.matrix(theta).T # theta has to be transposed again

for i in range(iters):

h0 = X * theta # zero hypothesis for each datapoint

delta = (1 / m) * (X.T * h0 - X.T * y) # the gradient descent

theta = theta - alpha * delta # update theta by learning rate times gradient

J_history[i] = J(X, y, theta.T) # save the J of a particular iteration, it should drop in the next

return J_history, theta # return the history of J plus the optimal theta

print('\n'+40*'=')

# we have the function ready, let's do some machine learning!

theta = np.matrix([np.random.random(),np.random.random()]) # we initialize theta at random values

alpha = 0.01 # learning rate - if too low, the algorithm will not converge, if too high, it can "explode"

iters = 2000 # number of iterations - reduce if "Time limit exceeded"

print('\n== Model summary ==\nLearning rate: {}\nIterations: {}\nInitial theta: {}\nInitial J: {:.2f}\n'.format(alpha, iters, theta, J(X,y,theta).item()))

print('Training the model... ')

# this actually trains our model and finds the optimal theta value

J_history, theta_min = gradient(X, y, alpha, theta, iters)

print('Done.')

print('\nFinal theta: {}\nFinal J: {:.2f}'.format(theta_min.T, J(X,y,theta_min.T).item()))

'''

Now that we have the model trained, we can use it to predict the profit/loss

Usually, since we want to solve a real problem, we define our function to accept real numbers, not rescaled ones. However, we have to remember, that the model itself is trained on rescaled data, so we have to provide it.

'''

# This function will calculate the predicted profit

def predict_profit(population):

pop = population / 10000

return [1, pop] * theta_min * 10000

# Now, let's check for a random city

p = 50000 + 100000 * np.random.random()

print('\n'+40*'=')

print('\nBased on learned data, predicted profit for a city of population of {:,.0f} is ${:,.2f}.\n'.format(p, predict_profit(p).item()))

# For the business decision, it would also be good to know what is the minimal population of a city to start the profitable business (predicted value is at least positive)

p_min = -theta_min[0].item() / theta_min[1].item() * 10000

print('In order for the business to be profitable, it has to be started in a city with population greater than {:,.0f}.'.format(p_min))

print('\n'+40*'=')

print('\nNOTE: The code initializes the model with different theta each time, thus the model predicts different minimal viable population at each runtime.')