The basic process of logistic regression and a hands-on iris project

1, Basic process of logistic regression

1. Logistic regression

Learn patterns from the feature values of a set of samples, build a model (the learned model is abstracted as a formula f(x)), and use that model to predict the results for new samples. Logistic regression turns those predictions into classifications.
Steps of logistic regression (implemented from scratch in Part 2, Step 4):
(1) Normalize the values of x
(2) Compute the weighted sum z = w1x1 + w2x2 + ... + wixi
(3) Apply the activation function a = σ(z)
(4) Compute the loss function L = -y·log(a) - (1-y)·log(1-a)
(5) Perform gradient descent: w = w - α·dw
(6) Use the trained weight parameters w to make predictions

2. Activation function

Each feature xi is assigned a weight wi, and the resulting weighted sum z can take any real value. To squash z into the range (0, 1) so that it can be read as a class prediction, apply the sigmoid activation function: a = 1/(1 + e^(-z)). This gives the predicted value.
Characteristics of an activation function: continuous, differentiable, and a nonlinear transformation.
Other common activation functions include tanh, whose output lies in (-1, 1), and ReLU, which outputs 0 when z < 0 and z otherwise.
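
As a quick illustration (a minimal NumPy sketch, not part of the project code below), sigmoid squashes any real-valued z into (0, 1):

import numpy as np

def sigmoid(z):
    # Map any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(sigmoid(z))  # approx. [0.0067 0.2689 0.5 0.7311 0.9933]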

3. Loss function

What the loss function does:
(1) It relates the true value to the predicted value: L = -y·log(a) - (1-y)·log(1-a)
(2) It measures the gap between the true value and the predicted value. For example, when y = 0 the loss reduces to L = -log(1-a), so the closer the prediction a is to 0, the smaller the loss (see the sketch after this list).
(3) The gap between a and y can be propagated back to update the weights w and b
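
A minimal sketch with made-up prediction values, showing how the loss grows as the prediction a moves away from the label y:

import numpy as np

def cross_entropy(y, a):
    # L = -y*log(a) - (1-y)*log(1-a)
    return -y * np.log(a) - (1 - y) * np.log(1 - a)

print(cross_entropy(1, 0.9))  # approx. 0.105: y=1, confident and correct
print(cross_entropy(1, 0.1))  # approx. 2.303: y=1, confident and wrong
print(cross_entropy(0, 0.1))  # approx. 0.105: y=0, confident and correct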

4. Gradient descent and learning rate

The loss function is, in essence, a function of L with respect to the weights w; gradient descent iteratively searches for the w that minimizes it (a single update step is sketched below).
The learning rate α controls the step size of each weight update.
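
A minimal sketch of one update step on a single sample (all numbers are made up for illustration). For sigmoid plus cross-entropy, the gradient simplifies to dw = (a - y)·x, which is exactly the formula the class in Part 2, Step 4 uses:

import numpy as np

x = np.array([0.5, 1.5])   # normalized features (made up)
y = 1                      # true label
w = np.array([0.1, -0.2])  # current weights (made up)
alpha = 0.1                # learning rate

z = np.dot(w, x)           # weighted sum
a = 1 / (1 + np.exp(-z))   # sigmoid prediction
dw = (a - y) * x           # gradient of the loss with respect to w
w = w - alpha * dw         # step against the gradient
print(w)                   # the weights move so that a gets closer to y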

5. Normalization

(1) Min-max normalization: x' = (x - x_min) / (x_max - x_min)
(2) Z-score standardization: x' = (x - mean) / standard deviation
There are many normalization formulas; their purpose is to speed up model convergence and keep training from taking detours. Both formulas are sketched below.
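
A quick NumPy sketch of both formulas on made-up values:

import numpy as np

x = np.array([4.6, 5.0, 5.4, 6.9, 7.0])  # e.g. sepal lengths in cm (made up)

x_minmax = (x - x.min()) / (x.max() - x.min())  # (1) min-max: values land in [0, 1]
x_zscore = (x - x.mean()) / x.std()             # (2) z-score: zero mean, unit std

print(x_minmax)
print(x_zscore)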

2, Hands-on iris project

Project description: classify irises with logistic regression, based on statistics of sepal length and width
Data features: sepal length, sepal width
Class labels: 0 - Iris setosa, 1 - Iris versicolor, 2 - Iris virginica
Step 1: import numpy, matplotlib and dataset

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

Step 2: visualize and analyze the data

iris = load_iris()
iris.feature_names  # names of the features
iris.target_names   # names of the classes
iris.data           # the feature data itself
iris.target         # the label of each row
## Take the first 100 samples and the first two feature columns: sepal length and width
x = iris.data[0:100, 0:2]
y = iris.target[0:100]
## Split the samples by class: labels 0 and 1
samples_0 = x[y==0, :]  # samples with y = 0
samples_1 = x[y==1, :]  # samples with y = 1
# Scatter-plot visualization
plt.scatter(samples_0[:,0], samples_0[:,1], marker='o', color='r')
plt.scatter(samples_1[:,0], samples_1[:,1], marker='x', color='b')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()


Step 3: split the data into 80 training samples and 20 test samples

x_train = np.vstack([x[:40,:], x[60:100,:]])  # rows 0-39 (class 0) and rows 60-99 (class 1)
y_train = np.concatenate([y[:40], y[60:100]])
x_test = x[40:60,:]  # rows 40-59: the last 10 of class 0 and the first 10 of class 1
y_test = y[40:60]

Step 4: implementation of the logistic regression algorithm

class Logistic_Regression():
    def __init__(self):
        self.w = None  # weight vector, shape (1, num_features); no bias term is used

    def sigmoid(self, z):
        # Squash z into (0, 1)
        a = 1 / (1 + np.exp(-z))
        return a

    def output(self, x):
        # Forward pass: weighted sum followed by sigmoid, shape (1, num_samples)
        z = np.dot(self.w, x.T)
        a = self.sigmoid(z)
        return a

    def compute_loss(self, x, y):
        # Mean cross-entropy loss and its gradient dL/dw = (a - y)x / n
        num_train = x.shape[0]
        a = self.output(x)
        loss = np.sum(-y*np.log(a) - (1-y)*np.log(1-a)) / num_train
        dw = np.dot((a - y), x) / num_train
        return loss, dw

    def train(self, x, y, learning_rate=0.01, num_iterations=10000):
        num_train, num_features = x.shape
        self.w = 0.001 * np.random.randn(1, num_features)  # small random init
        loss = []
        for i in range(num_iterations):
            error, dw = self.compute_loss(x, y)
            loss.append(error)
            self.w -= learning_rate * dw  # gradient-descent update
            if i % 200 == 0:
                print('steps:[%d/%d], loss:%f' % (i, num_iterations, error))
        return loss

    def predict(self, x):
        # Threshold the sigmoid output at 0.5
        a = self.output(x)
        y_pred = np.where(a >= 0.5, 1, 0)
        return y_pred

Step 5: create lr instance and train model

lr = Logistic_Regression()
loss = lr.train(x_train, y_train)
plt.plot(loss)  # the loss curve should fall and then flatten out
plt.show()
## Decision-boundary visualization
plt.scatter(samples_0[:,0], samples_0[:,1], marker='o', color='r')
plt.scatter(samples_1[:,0], samples_1[:,1], marker='x', color='b')
plt.xlabel('x')
plt.ylabel('y')
x1 = np.arange(4, 7.5, 0.05)
# The boundary is where sigmoid(z) = 0.5, i.e. z = w1*x1 + w2*x2 = 0,
# so x2 = -w1*x1 / w2 (with no bias term, the line passes through the origin)
x2 = (-lr.w[0][0]*x1) / lr.w[0][1]
plt.plot(x1, x2, '-', color='black')
plt.show()

Step 6: prediction on test set

num_test = x_test.shape[0]
prediction = lr.predict(x_test)
accuracy = np.sum(prediction == y_test) / num_test
print('the accuracy of prediction is:', accuracy)
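
As an optional cross-check (not part of the original walkthrough), scikit-learn's built-in LogisticRegression can be fit on the same split. Note that it includes a bias term, so its decision boundary and accuracy may differ slightly from the from-scratch version:

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
clf.fit(x_train, y_train)
print('sklearn accuracy:', clf.score(x_test, y_test))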
