# Machine learning with Python — univariate linear regression (Andrew Ng's homework)

## I First, import the required libraries

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
matplotlib.rcParams['font.sans-serif'] = ['SimHei']  # Use the SimHei font so matplotlib can render Chinese characters in axis labels and titles
```

## II Import the data file

```python
df = pd.read_csv('C:/Users/Administrator/AppData/Local/Temp/Temp2_machine-learning-ex1.zip/machine-learning-ex1/ex1/ex1data1.txt', header=None)  # Import the file; the columns are named 0 and 1 by default
df.rename(columns={0: 'city A population size', 1: 'city A Snack bar profit'}, inplace=True)  # Rename the columns
```

First, compute descriptive (summary) statistics to get a preliminary understanding of the data:

```python
df.describe()
```
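As a quick illustration of what `describe()` reports, here is a minimal sketch on a small hypothetical DataFrame (made-up values, not the ex1 data):

```python
import pandas as pd

# Hypothetical data standing in for the two ex1 columns
df = pd.DataFrame({'population': [6.1, 5.5, 8.5, 7.0],
                   'profit': [17.6, 9.1, 13.7, 12.0]})

# describe() reports count, mean, std, min, quartiles, and max per numeric column
print(df.describe())
```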

## III Visualize the data with a scatter plot

```python
plt.scatter(df['city A population size'], df['city A Snack bar profit'])

# Set the chart title and label the axes
plt.title('According to the urban population, the profit of snack bars in the city is predicted')
plt.xlabel('Urban population')
plt.ylabel('Profit of urban snack bar')

plt.show()
```

Now gradient descent is used to fit the linear regression, i.e. to minimize the cost function.

## IV Define the cost function

The cost function is

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

where $h_\theta(x) = X\theta$ is the hypothesis and $m$ is the number of samples. In code:

```python
def computeCost(x, y, theta):  # theta is the parameter vector
    inner = np.sum(np.power((x * theta - y), 2))
    return inner / (2 * len(x))
```
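A quick hand-checkable sanity test of this cost function, on tiny made-up data (not the ex1 set): when the line fits the points exactly the cost is 0.

```python
import numpy as np

def computeCost(x, y, theta):
    inner = np.sum(np.power((x * theta - y), 2))
    return inner / (2 * len(x))

# Hypothetical data: three points lying exactly on the line y = x
X = np.matrix([[1.0, 1.0],
               [1.0, 2.0],
               [1.0, 3.0]])
y = np.matrix([[1.0], [2.0], [3.0]])

# With theta = [0, 1]^T the fit is exact, so the cost is 0
print(computeCost(X, y, np.matrix([[0.0], [1.0]])))  # 0.0

# With theta = [0, 0]^T the cost is (1 + 4 + 9) / (2 * 3)
print(computeCost(X, y, np.matrix([[0.0], [0.0]])))  # ≈ 2.3333
```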

Notes:
1. `np.power(a, 2)` squares array `a` element by element.
2. `np.sum(a)` sums the elements of array `a`.
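For example:

```python
import numpy as np

a = np.array([1, 2, 3])
print(np.power(a, 2))  # [1 4 9] — each element squared
print(np.sum(a))       # 6 — sum of all elements
```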

Because theta is a 2×1 vector while X is a len(x)×1 vector, the product is not defined. To make the multiplication work, insert a column of ones before column 0 of X, so that X becomes a len(x)×2 matrix. Then X * theta is the hypothesis function.
The code is as follows:

```python
df.insert(0, 'ones', 1)  # Add a column named 'ones', filled with 1s, before column 0 of df
```

## V Define the input and output variables and the parameters

```python
x = df.loc[:, ['ones', 'city A population size']]
y = df.loc[:, 'city A Snack bar profit']
X = np.matrix(x.values)
y = np.matrix(y.values)
theta = np.matrix([0, 0])
```

Why use `np.matrix`? Because for `np.matrix` objects, matrix multiplication is simply `X * Y`, whereas for plain arrays it is `np.dot(X, Y)`. When we defined the cost function above, we wrote `x * theta` rather than `np.dot(x, theta)`.
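The difference can be seen in a small sketch:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# For plain ndarrays, * is element-wise; matrix multiplication needs np.dot:
print(A * B)         # [[ 5 12] [21 32]] — element-wise product
print(np.dot(A, B))  # [[19 22] [43 50]] — matrix product

# For np.matrix objects, * already means matrix multiplication:
Am, Bm = np.matrix(A), np.matrix(B)
print(Am * Bm)       # same result as np.dot(A, B)
```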

`x.values` returns an ndarray.

## VI Check the dimensions of the input and output variables and the parameter vector

```python
X.shape
y.shape
theta.shape
```

The shapes of `y` and `theta` do not line up with `X` for the product `X * theta - y`, so they need to be adjusted:

```python
y = y.T          # transpose
theta = theta.T
```

## VII Define the gradient descent algorithm

First, calculate the initial cost function

```python
computeCost(X, y, theta)  # 32.072733877455676
```

```python
def gradientDescent(x, y, theta, alpha, iters):  # alpha is the learning rate, iters the number of iterations
    temp = np.matrix(np.zeros(theta.shape))  # initialization
    cost = np.zeros(iters)  # initialization
    for i in range(iters):
        temp = theta - ((alpha / len(x)) * (x * theta - y).T * x).T  # vectorized update
        theta = temp  # simultaneous update of all parameters
        cost[i] = computeCost(x, y, theta)

    return theta, cost
```
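`np.matrix` is deprecated in recent NumPy releases. For reference, here is a sketch of the same batch gradient descent written with plain ndarrays and the `@` operator, run on hypothetical noise-free data (the function name and data are illustrative, not from ex1):

```python
import numpy as np

def gradient_descent_nd(X, y, theta, alpha, iters):
    """Batch gradient descent with plain ndarrays.
    X: (m, n) design matrix, y: (m,) targets, theta: (n,) parameters."""
    m = len(X)
    cost = np.zeros(iters)
    for i in range(iters):
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))  # vectorized update
        cost[i] = np.sum((X @ theta - y) ** 2) / (2 * m)       # track the cost
    return theta, cost

# Hypothetical data generated from y = 1 + 2x, with a column of ones prepended
X = np.column_stack([np.ones(50), np.linspace(0, 5, 50)])
y = 1.0 + 2.0 * X[:, 1]

theta, cost = gradient_descent_nd(X, y, np.zeros(2), alpha=0.05, iters=5000)
print(theta)               # close to [1, 2]
print(cost[-1] < cost[0])  # True — the cost decreases over the iterations
```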

Note: `np.zeros()` generates an array filled with zeros.

## VIII Initialize the learning rate and the number of iterations

```python
alpha = 0.01
iters = 1000  # 1000 iterations are used here
```

## IX Call the gradient descent algorithm

```python
finally_theta, cost = gradientDescent(X, y, theta, alpha, iters)
```

The recorded cost values show that the cost function keeps decreasing as the number of iterations grows. Now compute the cost of the trained model with the fitted parameters:

```python
computeCost(X, y, finally_theta)  # 4.515955503078912
```
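As a sanity check, gradient descent can be compared against the closed-form normal-equation solution $\theta = (X^T X)^{-1} X^T y$. A sketch on hypothetical synthetic data (the machine-specific ex1 path above is not reused here, and the coefficient values are made up):

```python
import numpy as np

# Hypothetical data: y = -3.9 + 1.19x plus Gaussian noise
rng = np.random.default_rng(0)
x1 = rng.uniform(5, 22, size=97)
X = np.column_stack([np.ones_like(x1), x1])
y = -3.9 + 1.19 * x1 + rng.normal(0, 0.5, size=97)

# Normal equation, solved as a linear system rather than via an explicit inverse
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_ne)  # roughly recovers the true coefficients [-3.9, 1.19]
```

Unlike gradient descent, the normal equation needs no learning rate or iteration count, but it becomes expensive when the number of features is large.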

## X Plot and visualize the linear model

```python
x = np.linspace(df['city A population size'].min(), df['city A population size'].max(), 100)  # 100 evenly spaced values between the min and max of the population column
f = finally_theta[0, 0] + finally_theta[1, 0] * x  # the fitted line
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, f, 'r', label='forecast')
ax.scatter(df['city A population size'], df['city A Snack bar profit'], label='Training set')
ax.legend(loc=2)  # Show the legend (upper-left corner) explaining each curve
ax.set_xlabel('city A population size')
ax.set_ylabel('city A Snack bar profit')
ax.set_title('According to the city A Population forecast city A Snack bar profit')
plt.show()
```

Supplementary notes:
1. `np.linspace()` returns evenly spaced numbers over a specified interval.
2. Note the difference between the two kinds of indexing used above.
3. `plt.subplots()` creates a figure with subplots.
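A small example of `np.linspace`:

```python
import numpy as np

# 5 evenly spaced values from 0 to 1, endpoints included
print(np.linspace(0, 1, 5))  # [0.   0.25 0.5  0.75 1.  ]
```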

## XI Plot the cost function curve

```python
x = np.arange(1000)
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, cost, 'r')
ax.set_xlabel('Number of iterations')
ax.set_ylabel('Cost function value')
plt.show()
```

The resulting curve shows the cost decreasing steadily as the iterations proceed.