# Machine learning with Python — univariate linear regression (Andrew Ng's homework)

## I First, import the required libraries

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
matplotlib.rcParams['font.sans-serif'] = ['SimHei']  # Use the SimHei font so matplotlib can render Chinese characters in axis labels and titles
```

## II Import the data file

```python
df = pd.read_csv('C:/Users/Administrator/AppData/Local/Temp/Temp2_machine-learning-ex1.zip/machine-learning-ex1/ex1/ex1data1.txt', header=None)  # Import the file; the columns are named 0 and 1 by default
df.rename(columns={0: 'city A population size', 1: 'city A Snack bar profit'}, inplace=True)  # Rename the columns
```

First, compute descriptive (summary) statistics to get a preliminary understanding of the data:

```python
df.describe()
```
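As a quick illustration of what `describe()` reports, here is a minimal sketch on a small hypothetical DataFrame (made-up values, not the ex1 data):

```python
import pandas as pd

# Hypothetical data standing in for the two ex1 columns
df = pd.DataFrame({'population': [6.1, 5.5, 8.5, 7.0],
                   'profit': [17.6, 9.1, 13.7, 12.0]})

# describe() reports count, mean, std, min, quartiles, and max per numeric column
print(df.describe())
```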

## III Visualize the data with a scatter plot

```python
plt.scatter(df['city A population size'], df['city A Snack bar profit'])

# Set the chart title and label the axes
plt.title('According to the urban population, the profit of snack bars in the city is predicted')
plt.xlabel('Urban population')
plt.ylabel('Profit of urban snack bar')

plt.show()
```

Now gradient descent is used to fit the linear regression, i.e. to minimize the cost function.

## IV Define the cost function

The cost function is

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

where $h_\theta(x) = X\theta$ is the hypothesis and $m$ is the number of samples. In code:

```python
def computeCost(x, y, theta):  # theta is the parameter vector
    inner = np.sum(np.power((x * theta - y), 2))
    return inner / (2 * len(x))
```
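A quick hand-checkable sanity test of this cost function, on tiny made-up data (not the ex1 set): when the line fits the points exactly the cost is 0.

```python
import numpy as np

def computeCost(x, y, theta):
    inner = np.sum(np.power((x * theta - y), 2))
    return inner / (2 * len(x))

# Hypothetical data: three points lying exactly on the line y = x
X = np.matrix([[1.0, 1.0],
               [1.0, 2.0],
               [1.0, 3.0]])
y = np.matrix([[1.0], [2.0], [3.0]])

# With theta = [0, 1]^T the fit is exact, so the cost is 0
print(computeCost(X, y, np.matrix([[0.0], [1.0]])))  # 0.0

# With theta = [0, 0]^T the cost is (1 + 4 + 9) / (2 * 3)
print(computeCost(X, y, np.matrix([[0.0], [0.0]])))  # ≈ 2.3333
```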

Notes:
1. `np.power(a, 2)` squares array `a` element by element.
2. `np.sum(a)` sums the elements of array `a`.
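For example:

```python
import numpy as np

a = np.array([1, 2, 3])
print(np.power(a, 2))  # [1 4 9] — each element squared
print(np.sum(a))       # 6 — sum of all elements
```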

Because theta is a 2×1 vector while X is a len(x)×1 vector, the product is not defined. To make the multiplication work, insert a column of ones before column 0 of X, so that X becomes a len(x)×2 matrix. Then X * theta is the hypothesis function.
The code is as follows:

```python
df.insert(0, 'ones', 1)  # Add a column named 'ones', filled with 1s, before column 0 of df
```

## V Define the input and output variables and the parameters

```python
x = df.loc[:, ['ones', 'city A population size']]
y = df.loc[:, 'city A Snack bar profit']
X = np.matrix(x.values)
y = np.matrix(y.values)
theta = np.matrix([0, 0])
```

Why use `np.matrix`? Because for `np.matrix` objects, matrix multiplication is simply `X * Y`, whereas for plain arrays it is `np.dot(X, Y)`. When we defined the cost function above, we wrote `x * theta` rather than `np.dot(x, theta)`.
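The difference can be seen in a small sketch:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# For plain ndarrays, * is element-wise; matrix multiplication needs np.dot:
print(A * B)         # [[ 5 12] [21 32]] — element-wise product
print(np.dot(A, B))  # [[19 22] [43 50]] — matrix product

# For np.matrix objects, * already means matrix multiplication:
Am, Bm = np.matrix(A), np.matrix(B)
print(Am * Bm)       # same result as np.dot(A, B)
```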

`x.values` returns an ndarray.

## VI Check the dimensions of the input and output variables and the parameter vector

```python
X.shape
y.shape
theta.shape
```

The shapes of `y` and `theta` do not line up with `X` for the product `X * theta - y`, so they need to be adjusted:

```python
y = y.T          # transpose
theta = theta.T
```

## VII Define the gradient descent algorithm

First, calculate the initial cost function

```python
computeCost(X, y, theta)  # 32.072733877455676
```

```python
def gradientDescent(x, y, theta, alpha, iters):  # alpha is the learning rate, iters the number of iterations
    temp = np.matrix(np.zeros(theta.shape))  # initialization
    cost = np.zeros(iters)  # initialization
    for i in range(iters):
        temp = theta - ((alpha / len(x)) * (x * theta - y).T * x).T  # vectorized update
        theta = temp  # simultaneous update of all parameters
        cost[i] = computeCost(x, y, theta)

    return theta, cost
```
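`np.matrix` is deprecated in recent NumPy releases. For reference, here is a sketch of the same batch gradient descent written with plain ndarrays and the `@` operator, run on hypothetical noise-free data (the function name and data are illustrative, not from ex1):

```python
import numpy as np

def gradient_descent_nd(X, y, theta, alpha, iters):
    """Batch gradient descent with plain ndarrays.
    X: (m, n) design matrix, y: (m,) targets, theta: (n,) parameters."""
    m = len(X)
    cost = np.zeros(iters)
    for i in range(iters):
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))  # vectorized update
        cost[i] = np.sum((X @ theta - y) ** 2) / (2 * m)       # track the cost
    return theta, cost

# Hypothetical data generated from y = 1 + 2x, with a column of ones prepended
X = np.column_stack([np.ones(50), np.linspace(0, 5, 50)])
y = 1.0 + 2.0 * X[:, 1]

theta, cost = gradient_descent_nd(X, y, np.zeros(2), alpha=0.05, iters=5000)
print(theta)               # close to [1, 2]
print(cost[-1] < cost[0])  # True — the cost decreases over the iterations
```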

Note: `np.zeros()` generates an array filled with zeros.

## VIII Initialize the learning rate and the number of iterations

```python
alpha = 0.01
iters = 1000  # 1000 iterations are used here
```

## IX Call the gradient descent algorithm

```python
finally_theta, cost = gradientDescent(X, y, theta, alpha, iters)
```

The recorded cost values show that the cost function keeps decreasing as the number of iterations grows. Now compute the cost of the trained model with the fitted parameters:

```python
computeCost(X, y, finally_theta)  # 4.515955503078912
```
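As a sanity check, gradient descent can be compared against the closed-form normal-equation solution $\theta = (X^T X)^{-1} X^T y$. A sketch on hypothetical synthetic data (the machine-specific ex1 path above is not reused here, and the coefficient values are made up):

```python
import numpy as np

# Hypothetical data: y = -3.9 + 1.19x plus Gaussian noise
rng = np.random.default_rng(0)
x1 = rng.uniform(5, 22, size=97)
X = np.column_stack([np.ones_like(x1), x1])
y = -3.9 + 1.19 * x1 + rng.normal(0, 0.5, size=97)

# Normal equation, solved as a linear system rather than via an explicit inverse
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_ne)  # roughly recovers the true coefficients [-3.9, 1.19]
```

Unlike gradient descent, the normal equation needs no learning rate or iteration count, but it becomes expensive when the number of features is large.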

## X Plot and visualize the linear model

```python
x = np.linspace(df['city A population size'].min(), df['city A population size'].max(), 100)  # 100 evenly spaced values between the min and max of the population column
f = finally_theta[0, 0] + finally_theta[1, 0] * x  # the fitted line
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, f, 'r', label='forecast')
ax.scatter(df['city A population size'], df['city A Snack bar profit'], label='Training set')
ax.legend(loc=2)  # Show the legend (upper-left corner) explaining each curve
ax.set_xlabel('city A population size')
ax.set_ylabel('city A Snack bar profit')
ax.set_title('According to the city A Population forecast city A Snack bar profit')
plt.show()
```

Supplementary notes:
1. `np.linspace()` returns evenly spaced numbers over a specified interval.
2. Note the difference between the two kinds of indexing used above.
3. `plt.subplots()` creates a figure with subplots.
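A small example of `np.linspace`:

```python
import numpy as np

# 5 evenly spaced values from 0 to 1, endpoints included
print(np.linspace(0, 1, 5))  # [0.   0.25 0.5  0.75 1.  ]
```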

## XI Plot the cost function curve

```python
x = np.arange(1000)
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, cost, 'r')
ax.set_xlabel('Number of iterations')
ax.set_ylabel('Cost function value')
plt.show()
```

The resulting curve shows the cost decreasing steadily as the iterations proceed.