# 1. Basic process of logistic regression

## 1. Logistic regression

Logistic regression learns patterns from the multiple feature values of a group of samples, builds a model (the learned model is abstracted as a function f(x)), and uses this model to predict the results of other samples. Logistic regression turns those predictions into class labels.
Logistic regression steps (a single-update sketch follows this list):
(1) Normalize the values of x
(2) Compute the weighted sum z = w1x1 + w2x2 + ... + wixi
(3) Apply the activation function a = σ(z)
(4) Compute the loss function L = -y·log(a) - (1-y)·log(1-a)
(5) Perform gradient descent: w = w - α·dw
(6) Use the trained weight parameters w to make predictions
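The sketch below walks through steps (2)-(5) for one gradient-descent update in NumPy; the sample values and variable names are illustrative only.

```
import numpy as np

x = np.array([[0.2, 0.7], [0.9, 0.1]])  # two samples with two normalized features
y = np.array([0, 1])                    # true labels
w = np.zeros(2)                         # initial weights

z = x @ w                               # weighted sum z = w1*x1 + w2*x2
a = 1 / (1 + np.exp(-z))                # sigmoid activation
loss = np.mean(-y * np.log(a) - (1 - y) * np.log(1 - a))  # cross-entropy loss
dw = (a - y) @ x / len(y)               # gradient of the loss with respect to w
w -= 0.01 * dw                          # one descent step with learning rate 0.01
```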

## 2. Activation function

Set a weight wi for each feature; the computed z = f(x) can take any real value. To map the result z into the range (0, 1) for classification, use the sigmoid activation function: a = 1/(1 + e^(-z)). The result is the predicted value.
Characteristics of an activation function: continuous and differentiable, and it applies a nonlinear transformation.
Other common activation functions include tanh, which maps z into (-1, 1), and ReLU, which outputs z when z > 0 and 0 otherwise.
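A minimal sketch of the three activation functions mentioned above (the function names are illustrative):

```
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))   # maps z into (0, 1)

def tanh(z):
    return np.tanh(z)             # maps z into (-1, 1)

def relu(z):
    return np.maximum(0, z)       # outputs z when z > 0, else 0

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))  # approximately [0.119 0.5 0.881]
print(tanh(z))     # approximately [-0.964 0. 0.964]
print(relu(z))     # [0. 0. 2.]
```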

## 3. Loss function

Functions of the loss function (see the example after this list):
(1) It establishes a relationship between the real value and the predicted value: L = -y·log(a) - (1-y)·log(1-a)
(2) It measures the gap between the real value and the predicted value; for example, when y = 0 the loss reduces to L = -log(1-a), so the gap can be read off directly from a
(3) The weights w and b can be updated from the gap between a and y
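Evaluating L for both label values shows how the loss grows as the prediction a moves away from the true label y (a small sketch with illustrative values):

```
import numpy as np

def cross_entropy(y, a):
    return -y * np.log(a) - (1 - y) * np.log(1 - a)

# When y = 0, L = -log(1 - a): small when a is near 0, large when a is near 1
print(cross_entropy(0, 0.1))  # ~0.105 (good prediction)
print(cross_entropy(0, 0.9))  # ~2.303 (bad prediction)

# When y = 1, L = -log(a): small when a is near 1
print(cross_entropy(1, 0.9))  # ~0.105
```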

## 4. Gradient descent and learning rate

The loss function is essentially a function of L in terms of w; gradient descent iteratively finds the w that minimizes the loss.
The learning rate α controls the step size of each weight update, as the sketch below shows.
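A minimal sketch of gradient descent on a toy loss L(w) = (w - 3)^2 (illustrative, not the logistic loss) shows how the learning rate sets the step size:

```
def grad(w):
    return 2 * (w - 3)  # derivative of L(w) = (w - 3)**2

for lr in (0.01, 0.1):
    w = 0.0
    for _ in range(100):
        w -= lr * grad(w)  # w = w - alpha * dw
    # with the same number of iterations, the larger step size gets closer to w = 3
    print(lr, round(w, 4))
```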

## 5. Normalization

(1) Min-max normalization: x' = (x - xmin) / (xmax - xmin)
(2) Z-score standardization: x' = (x - mean) / standard deviation
There are many normalization formulas. Their purpose is to accelerate the convergence of the model and keep training from taking detours. Both formulas are sketched in NumPy below.
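Both normalization formulas applied column-wise with NumPy (a sketch over a small illustrative array):

```
import numpy as np

x = np.array([[5.1, 3.5], [4.9, 3.0], [7.0, 3.2]])

# (1) Min-max normalization: rescales each feature into [0, 1]
x_minmax = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

# (2) Z-score standardization: zero mean and unit standard deviation per feature
x_zscore = (x - x.mean(axis=0)) / x.std(axis=0)

print(x_minmax)
print(x_zscore)
```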

# 2. Iris classification in practice

Project description: classify irises with logistic regression based on statistics of the sepal length and width features
Data features: sepal length, sepal width
Class labels: 0 - setosa, 1 - versicolor, 2 - virginica
Step 1: import numpy, matplotlib, and the iris dataset loader

```
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris  # used to load the iris dataset below
```

Step 2: visualize and analyze the data

```
iris = load_iris()
iris.feature_names  # names of the four features
iris.target_names   # names of the three classes
iris.data           # the feature data itself
iris.target         # the label of each row of data
# Take 100 samples and the first two feature columns: sepal length and width
x = iris.data[0:100, 0:2]
y = iris.target[0:100]
# Split out the first two classes, 0 and 1
samples_0 = x[y == 0, :]  # samples with y == 0
samples_1 = x[y == 1, :]
# Scatter-plot visualization
plt.scatter(samples_0[:, 0], samples_0[:, 1], marker='o', color='r')
plt.scatter(samples_1[:, 0], samples_1[:, 1], marker='x', color='b')
plt.xlabel('X')
plt.ylabel('Y')
```

Step 3: split the data into 80 training samples and 20 test samples

```
x_train = np.vstack([x[:40, :], x[60:100, :]])  # first 40 of class 0 plus last 40 of class 1
y_train = np.concatenate([y[:40], y[60:100]])
x_test = x[40:60, :]  # the remaining 10 samples of each class
y_test = y[40:60]
```
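Because the split is done by index rather than by shuffling, a quick sanity check (sketch) confirms that both classes appear in the training and test sets:

```
print(x_train.shape, x_test.shape)                # (80, 2) (20, 2)
print(np.bincount(y_train), np.bincount(y_test))  # 40/40 and 10/10 per class
```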

Step 4: implement the logistic regression algorithm

```
class Logistic_Regression():
    def __init__(self):
        self.w = None

    def sigmoid(self, z):
        a = 1 / (1 + np.exp(-z))
        return a

    def output(self, x):
        z = np.dot(self.w, x.T)
        a = self.sigmoid(z)
        return a

    def compute_loss(self, x, y):
        num_train = x.shape[0]  # number of training samples
        a = self.output(x)
        loss = np.sum(-y * np.log(a) - (1 - y) * np.log(1 - a)) / num_train
        dw = np.dot((a - y), x) / num_train  # gradient of the loss w.r.t. w
        return loss, dw

    def train(self, x, y, learning_rate=0.01, num_iterations=10000):
        num_train, num_features = x.shape
        self.w = 0.001 * np.random.randn(1, num_features)  # small random init
        loss = []
        for i in range(num_iterations):
            error, dw = self.compute_loss(x, y)
            loss.append(error)
            self.w -= learning_rate * dw  # gradient-descent update
            if i % 200 == 0:
                print('steps:[%d/%d], loss:%f' % (i, num_iterations, error))
        return loss

    def predict(self, x):
        a = self.output(x)
        y_pred = np.where(a >= 0.5, 1, 0)
        return y_pred
```

Step 5: create an lr instance and train the model

```
lr = Logistic_Regression()
loss = lr.train(x_train, y_train)
plt.plot(loss)
# Decision boundary visualization
plt.scatter(samples_0[:, 0], samples_0[:, 1], marker='o', color='r')
plt.scatter(samples_1[:, 0], samples_1[:, 1], marker='x', color='b')
plt.xlabel('x')
plt.ylabel('y')
x1 = np.arange(4, 7.5, 0.05)
# The boundary is where sigmoid(z) = 0.5, i.e. z = x1*w1 + x2*w2 = 0
x2 = -(lr.w[0, 0] * x1) / lr.w[0, 1]
plt.plot(x1, x2, '-', color='black')
```

Step 6: prediction on the test set

```
num_test = x_test.shape[0]  # number of test samples
prediction = lr.predict(x_test)
accuracy = np.sum(prediction == y_test) / num_test
print('the accuracy of prediction is:', accuracy)
```
