[Deep learning experiment] Part 2: analysis and prediction of the factors influencing fiscal revenue

Relevant knowledge

Let $X^{(0)}=\{X^{(0)}(i),\ i=1,2,\dots,n\}$ be a nonnegative monotone original data sequence. To build a grey prediction model, first accumulate $X^{(0)}$ once to obtain $X^{(1)}=\{X^{(1)}(k),\ k=1,2,\dots,n\}$. The following first-order linear differential equation can then be established for $X^{(1)}$:

$$\frac{dX^{(1)}}{dt}+aX^{(1)}=u$$

This equation is the GM(1,1) model.
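As a small illustration of the accumulation step, the sketch below builds the once-accumulated (1-AGO) sequence with `np.cumsum` and recovers the original sequence by differencing. The sample values are made up for illustration and are not taken from the experiment data.

```python
import numpy as np

# Hypothetical nonnegative sequence X^(0); not from the experiment data
x0 = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

# 1-AGO: X^(1)(k) is the sum of the first k values of X^(0)
x1 = np.cumsum(x0)

# Inverse accumulation: X^(0)(1) = X^(1)(1), X^(0)(k) = X^(1)(k) - X^(1)(k-1)
restored = np.concatenate(([x1[0]], np.diff(x1)))

print(x1)        # accumulated sequence
print(restored)  # restored sequence, equal to x0
```

The grey model is fitted to the smoother accumulated sequence, and forecasts are mapped back to the original scale with the differencing step shown in the last lines.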

By solving the differential equation, the prediction model is as follows:

$$\hat{X}^{(1)}(k+1)=\left[\hat{X}^{(1)}(0)-\frac{\hat{u}}{\hat{a}}\right]e^{-\hat{a}k}+\frac{\hat{u}}{\hat{a}}$$

Because the GM(1,1) model yields the once-accumulated sequence, the model output $\hat{X}^{(1)}(k+1)$ is restored to $\hat{X}^{(0)}(k+1)$; that is, the grey prediction model of $X^{(0)}$ is:

$$\hat{X}^{(0)}(k+1)=\left(e^{-\hat{a}}-1\right)\left[\hat{X}^{(0)}(n)-\frac{\hat{u}}{\hat{a}}\right]e^{-\hat{a}k}$$
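The restoration step can be checked numerically. The sketch below evaluates the time-response function of the accumulated sequence and differences consecutive values, under the assumption that the initial value $X^{(1)}(0)$ is set to the first observation, as the gm() function later in this post does with x0[0]. The values of a, u and the first observation are hypothetical.

```python
import numpy as np

a, u = -0.1, 10.0   # hypothetical development coefficient and grey input
x_first = 12.0      # hypothetical first observation, used as the initial value


def x1_hat(k):
    """Time-response function: fitted accumulated value at step k + 1."""
    return (x_first - u / a) * np.exp(-a * k) + u / a


def x0_hat(k):
    """Restored value at step k + 1: difference of consecutive accumulated values."""
    return x1_hat(k) - x1_hat(k - 1)


k = np.arange(1, 6)
# The same quantity written as a difference of two exponentials (the form used in gm())
closed_form = (x_first - u / a) * (np.exp(-a * k) - np.exp(-a * (k - 1)))
print(np.allclose(x0_hat(k), closed_form))
```

Writing the restored value as a difference of two accumulated values makes it easy to see why the constant term u/a drops out of the final forecast.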

The accuracy grades of the posterior error test are listed in the following table:
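A minimal sketch of how the posterior error ratio C and the small error probability P might be mapped to an accuracy grade. The thresholds are the ones commonly cited in grey prediction references; only the boundary C < 0.35 for a good fit is confirmed later in this post, so the other cut-offs and the function name `accuracy_grade` should be read as assumptions.

```python
def accuracy_grade(C, P):
    """Map posterior error ratio C and small error probability P to a grade.

    Thresholds are commonly cited values, assumed here for illustration.
    """
    if P > 0.95 and C < 0.35:
        return 'good'
    if P > 0.80 and C < 0.50:
        return 'qualified'
    if P > 0.70 and C < 0.65:
        return 'barely qualified'
    return 'unqualified'


# Hypothetical inputs
print(accuracy_grade(C=0.30, P=0.97))
print(accuracy_grade(C=0.60, P=0.75))
```

A smaller C and a larger P both indicate a better fit, so a model has to pass both conditions to reach a given grade.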

Task steps

First, download the data required for the experiment

Link: https://pan.baidu.com/s/1bEwkedw5wbAH3ztoGTXFJQ
Extraction code: ygtl

Download the archive and extract it locally

According to the official documentation, ttk, the themed-widget interface of Tkinter, has been part of the Python standard library since Python 3.1. My experimental environment is Python 3.0 under Windows 6 (a virtual environment under anaconda3), so the following step from the original report, which applies to an Ubuntu environment, is skipped here:

sudo apt-get update
sudo apt-get install python3-tk

Data exploration and analysis

Many factors affect fiscal revenue (Y). Based on how economic theory explains fiscal revenue and on practical observation, the following relevant factors are considered: the number of social employees, the total wages of on-the-job employees, the total retail sales of social consumer goods, the per capita disposable income of urban residents, the per capita consumption expenditure of urban residents, the investment in fixed assets of the whole society, the total population at the end of the year, the regional GDP, the output value of the primary industry, tax, the consumer price index, the output value ratio of the tertiary industry to the secondary industry, and the consumption level of residents.

Descriptive analysis

Open a terminal and install the packages this experiment depends on:

pip install pandas==0.23.4
pip install xlrd==1.2.0
pip install xlwt==1.3.0
pip install keras==2.2.4
pip install tensorflow==1.3.0
pip install matplotlib==2.0.2

First, a descriptive statistical analysis is carried out to get an overall picture of the data. The code is as follows; add it to the file gaikuo.py.

import numpy as np
import pandas as pd

inputfile = 'finance/data1.csv'
data = pd.read_csv(inputfile)
r = [data.min(), data.max(), data.mean(), data.std()]
r = pd.DataFrame(r, index=['Min', 'Max', 'Mean', 'STD']).T
r = np.round(r, 2)
print(r)

From the output above, the mean and standard deviation of fiscal revenue (y) are 618.08 and 609.25 respectively. This shows that the city's fiscal revenue differs greatly from year to year. After 2008, the city's annual fiscal revenue increased significantly.
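The same four statistics can also be read from `DataFrame.describe()`. A minimal sketch on a small made-up table (the column names and values are placeholders, not the experiment data):

```python
import pandas as pd

# Placeholder data standing in for finance/data1.csv
df = pd.DataFrame({'x1': [1.0, 2.0, 3.0, 4.0],
                   'y': [10.0, 20.0, 15.0, 35.0]})

# describe() returns count, mean, std, min, quartiles and max per column
summary = df.describe().T[['min', 'max', 'mean', 'std']].round(2)
print(summary)
```

Like `DataFrame.std()`, `describe()` reports the sample standard deviation, so both approaches give the same STD column.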

Correlation analysis

The correlation coefficient describes the relationship between quantitative variables and gives a preliminary indication of whether the dependent variable is linearly correlated with the explanatory variables.

Create a new Python file named correlation.py

import numpy as np
import pandas as pd

inputfile = 'finance/data1.csv'
data = pd.read_csv(inputfile)
print(np.round(data.corr(method='pearson'), 2))

The output is as follows:

It can be seen from the above that the linear relationship between consumer price index (x11) and fiscal revenue is not significant, and there is a negative correlation. Other variables have a high positive correlation with fiscal revenue.
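To make the coefficient concrete, the sketch below computes Pearson's r from its definition and compares it with `DataFrame.corr`. The two columns are synthetic and only stand in for a pair of variables from the data file.

```python
import numpy as np
import pandas as pd

# Synthetic columns, for illustration only
df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0, 5.0],
                   'y': [2.1, 3.9, 6.2, 7.8, 10.1]})

# Pearson r: sum of products of centered values, divided by the
# square root of the product of the two sums of squares
dx = df['x'] - df['x'].mean()
dy = df['y'] - df['y'].mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = df.corr(method='pearson').loc['x', 'y']
print(round(r_manual, 4), round(r_pandas, 4))
```

A value close to 1 or -1 indicates a strong linear relationship, while a value close to 0 indicates that no linear relationship is visible in the data.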

Model building

Fiscal revenue and forecast models of various types of revenue

Prediction model of fiscal revenue in a city

A combined prediction model of grey prediction and a neural network is established for the factors affecting fiscal revenue. Because Python and its extension libraries do not provide a grey prediction function, a custom grey prediction function gm() is written to predict the local fiscal revenue. The factors used to predict fiscal revenue are the number of social employees (x1), the total wages of on-the-job employees (x2), the total retail sales of social consumer goods (x3), the per capita disposable income of urban residents (x4), the per capita consumption expenditure of urban residents (x5) and the investment in fixed assets (x7). Their values for 2014 and 2015 are obtained from the grey prediction model implemented in Python.

Create a new Python file named 1-huise.py

# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd

inputfile = 'finance/data1.csv'  # Input data file
outputfile = 'finance/data1_GM11.xls'  # Path saved after grey prediction
data = pd.read_csv(inputfile)  # Read data
data.index = range(1994, 2014)
data.loc[2014] = None
data.loc[2015] = None


def gm(x0):  # Custom grey prediction function
	x1 = x0.cumsum()  # 1-AGO (once-accumulated) sequence
	z1 = (x1[:len(x1) - 1] + x1[1:]) / 2.0  # Mean sequence of adjacent accumulated values
	z1 = z1.reshape((len(z1), 1))
	B = np.append(-z1, np.ones_like(z1), axis=1)
	Yn = x0[1:].reshape((len(x0) - 1, 1))
	[[a], [b]] = np.dot(np.dot(np.linalg.inv(np.dot(B.T, B)), B.T), Yn)  # Least-squares estimate of the parameters
	f = lambda k: (x0[0] - b / a) * np.exp(-a * (k - 1)) - (x0[0] - b / a) * np.exp(-a * (k - 2))  # Restored value for step k
	delta = np.abs(x0 - np.array([f(i) for i in range(1, len(x0) + 1)]))  # Absolute residuals
	C = delta.std() / x0.std()  # Posterior error ratio
	P = 1.0 * (np.abs(delta - delta.mean()) < 0.6745 * x0.std()).sum() / len(x0)  # Small error probability
	return f, a, b, x0[0], C, P


l = ['x1', 'x2', 'x3', 'x4', 'x5', 'x7']
for i in l:
	f = gm(data[i][:len(data) - 2].values)[0]  # Fit on the 1994-2013 values only
	data.loc[2014, i] = f(len(data) - 1)  # 2014 forecast results
	data.loc[2015, i] = f(len(data))  # 2015 forecast results
	data[i] = data[i].round(2)  # Keep two decimal places
data[l + ['y']].to_excel(outputfile)  # Result output
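The parameter estimate inside gm() solves the normal equations with an explicit matrix inverse. An alternative sketch, assuming the same construction of B and Yn, uses `np.linalg.lstsq`, which avoids forming the inverse. The geometric test series below is synthetic.

```python
import numpy as np

# Synthetic series growing 10% per step
x0 = 100.0 * 1.1 ** np.arange(6)

x1 = x0.cumsum()                                  # 1-AGO sequence
z1 = ((x1[:-1] + x1[1:]) / 2.0).reshape(-1, 1)    # Means of adjacent accumulated values
B = np.append(-z1, np.ones_like(z1), axis=1)
Yn = x0[1:].reshape(-1, 1)

# Least-squares solution of B @ [a, b]^T = Yn
coef = np.linalg.lstsq(B, Yn, rcond=None)[0]
a, b = coef.ravel()

# Normal-equation solution, as written in gm()
a_ne, b_ne = np.dot(np.dot(np.linalg.inv(np.dot(B.T, B)), B.T), Yn).ravel()
print(a, b)
```

Both routes give the same parameters on well-conditioned data; the least-squares call is simply less sensitive to rounding when the columns of B differ greatly in scale.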

A neural network prediction model is established for the factors affecting fiscal revenue. Its parameters are set to an error accuracy of $10^{-7}$ and 10000 training iterations, with 6 nodes in the input layer, 12 nodes in the hidden layer and 1 node in the output layer.
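For orientation, the 6-12-1 architecture can be sketched as a plain NumPy forward pass. The random weights are placeholders and the ReLU activation on the hidden layer follows the Keras script in this section; this is an illustration of the layer shapes, not the trained model.

```python
import numpy as np

rng = np.random.RandomState(0)

# Randomly initialized weights: 6 inputs -> 12 hidden nodes -> 1 output
W1, b1 = rng.normal(size=(6, 12)), np.zeros(12)
W2, b2 = rng.normal(size=(12, 1)), np.zeros(1)


def forward(x):
    """Forward pass: linear layer, ReLU, linear output layer."""
    hidden = np.maximum(0.0, x @ W1 + b1)
    return hidden @ W2 + b2


x = rng.normal(size=(5, 6))   # 5 hypothetical standardized samples
y = forward(x)

# Weights plus biases of both layers
n_params = W1.size + b1.size + W2.size + b2.size
print(y.shape, n_params)
```

With only a couple of dozen training years per series, even a network this small has more trainable parameters than samples, which is worth keeping in mind when reading the fitted curves.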

Create a new Python file named 1-yuce.py

The neural network prediction code is as follows; add it to 1-yuce.py:

import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers.core import Dense, Activation

inputfile = 'finance/data1_GM11.xls'  # Output of the grey prediction step
outputfile = 'finance/revenue.xls'  # Where the predictions are saved
modelfile = 'finance/1-net.model'  # Where the model weights are saved
data = pd.read_excel(inputfile, index_col=0)  # The first column holds the years
feature = ['x1', 'x2', 'x3', 'x4', 'x5', 'x7']  # Input features

data_train = data.loc[range(1994, 2014)].copy()  # Train on 1994-2013
data_mean = data_train.mean()
data_std = data_train.std()
data_train = (data_train - data_mean) / data_std  # Zero-mean standardization
x_train = data_train[feature].values  # Feature matrix
y_train = data_train['y'].values  # Target vector

net = Sequential()
net.add(Dense(12, input_shape=(6,)))  # Hidden layer: 6 inputs, 12 nodes
net.add(Activation('relu'))
net.add(Dense(1, input_shape=(12,)))  # Output layer: 1 node
net.compile(loss='mean_squared_error', optimizer='adam')  # Compile the model
net.fit(x_train, y_train, epochs=10000, batch_size=16)  # Train the model for 10000 epochs
net.save_weights(modelfile)  # Save model parameters

# Standardize every year with the training statistics, predict, then undo the scaling of y
x = ((data[feature] - data_mean[feature]) / data_std[feature]).values
data['y_pred'] = net.predict(x) * data_std['y'] + data_mean['y']
data.to_excel(outputfile)
print(data)

# Plot the real and predicted values in separate subplots
p = data[['y', 'y_pred']].plot(subplots=True, style=['b-o', 'r-*'])
plt.show()

The results are as follows:

After zero-mean standardization, the data are fed into the three-layer neural network prediction model established for local fiscal revenue (6 nodes in the input layer, 12 nodes in the hidden layer and 1 node in the output layer), and the predicted value of the city's fiscal revenue in 2015 is 253.315 billion yuan.

The above figure shows the comparison between the real value and the predicted value of local fiscal revenue. It can be seen from the figure that the overall trend of the real value and the predicted value is the same, but there are still errors.
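The standardization applied before training, and the rescaling applied to the network output, can be sketched as follows. The numbers are made up; the sketch uses the sample standard deviation (ddof=1) because that is the default of the pandas `std()` call in the script.

```python
import numpy as np

# Hypothetical target values for the training years
y_train = np.array([50.0, 80.0, 120.0, 200.0])

mean, std = y_train.mean(), y_train.std(ddof=1)  # ddof=1 matches the pandas default

z = (y_train - mean) / std    # Zero-mean, unit-variance targets used for training
restored = z * std + mean     # The same rescaling that is applied to the network output

print(np.round(z, 3))
print(restored)
```

Because the 2014 and 2015 rows are scaled with the training mean and standard deviation, the prediction for those years is expressed in the same units as the historical revenue once it is rescaled.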

VAT forecasting model

The grey prediction model is established for the factors affecting value-added tax, and the posterior error ratio and prediction accuracy grade are obtained. Among them, the variables of the influencing factors of value-added tax include: total import value of goods (x1), industrial added value (x3) and the proportion of industrial added value in GDP (x5).

Create a new Python file named 2-huise.py

The code of the VAT grey prediction model is as follows; add it to 2-huise.py:

import numpy as np
import pandas as pd

inputfile = 'finance/data2.csv'
outputfile = 'finance/data2_GM11.xls'
data = pd.read_csv(inputfile)
data.index = range(1999, 2014)
data.loc[2014] = None
data.loc[2015] = None


def gm(x0):  # Custom grey prediction function
	x1 = x0.cumsum()  # 1-AGO sequence
	z1 = (x1[:len(x1) - 1] + x1[1:]) / 2.0  # Mean sequence of adjacent accumulated values
	z1 = z1.reshape((len(z1), 1))
	B = np.append(-z1, np.ones_like(z1), axis=1)
	Yn = x0[1:].reshape((len(x0) - 1, 1))
	[[a], [b]] = np.dot(np.dot(np.linalg.inv(np.dot(B.T, B)), B.T), Yn)  # Calculation parameters
	f = lambda k: (x0[0] - b / a) * np.exp(-a * (k - 1)) - (x0[0] - b / a) * np.exp(-a * (k - 2))  # Restore value
	delta = np.abs(x0 - np.array([f(i) for i in range(1, len(x0) + 1)]))
	C = delta.std() / x0.std()
	P = 1.0 * (np.abs(delta - delta.mean()) < 0.6745 * x0.std()).sum() / len(x0)
	return f, a, b, x0[0], C, P


l = ['x1', 'x3', 'x5']
for i in l:
	f = gm(data[i][:len(data) - 2].values)[0]  # Fit on the 1999-2013 values only
	data.loc[2014, i] = f(len(data) - 1)  # 2014 forecast
	data.loc[2015, i] = f(len(data))  # 2015 forecast
	data[i] = data[i].round(6)
data[l + ['y']].to_excel(outputfile)

A neural network prediction model (3 nodes in the input layer, 6 nodes in the hidden layer and 1 node in the output layer) is established for the factors affecting value-added tax. The number of training iterations is 10000 and the error accuracy is $10^{-7}$.

Create a new Python file named 2-yuce.py

The code of the prediction model is as follows; add it to 2-yuce.py:

import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers.core import Dense, Activation

inputfile = 'finance/data2_GM11.xls'  # Output of the grey prediction step
outputfile = 'finance/VAT.xls'  # Where the predictions are saved
modelfile = 'finance/2-net.model'  # Where the model weights are saved
data = pd.read_excel(inputfile, index_col=0)  # The first column holds the years
feature = ['x1', 'x3', 'x5']  # Input features

data_train = data.loc[range(1999, 2014)].copy()  # Train on 1999-2013
data_mean = data_train.mean()
data_std = data_train.std()
data_train = (data_train - data_mean) / data_std  # Zero-mean standardization
x_train = data_train[feature].values  # Feature matrix
y_train = data_train['y'].values  # Target vector

model = Sequential()
model.add(Dense(6, input_shape=(3,)))  # Hidden layer: 3 inputs, 6 nodes
model.add(Activation('relu'))
model.add(Dense(1, input_shape=(6,)))  # Output layer: 1 node
model.compile(loss='mean_squared_error', optimizer='adam')  # Compile the model
model.fit(x_train, y_train, epochs=10000, batch_size=16)  # Train the model for 10000 epochs
model.save_weights(modelfile)  # Save model parameters

# Standardize every year with the training statistics, predict, then undo the scaling of y
x = ((data[feature] - data_mean[feature]) / data_std[feature]).values
data['y_pred'] = model.predict(x) * data_std['y'] + data_mean['y']
data.to_excel(outputfile)
print(data)

# Plot the real and predicted values in separate subplots
p = data[['y', 'y_pred']].plot(subplots=True, style=['b-o', 'r-*'])
plt.show()

After zero-mean standardization, the data are substituted into the three-layer neural network prediction model established for value-added tax, and the predicted value of value-added tax in 2015 is 2685.403 million yuan. See the figure below for the relevant data.

The comparison between the real value of VAT and the predicted value is shown below

Business tax forecasting model

A grey prediction model is established for the factors affecting business tax, and the posterior error ratio and prediction accuracy grade are obtained. The variables affecting business tax include: the total social investment in fixed assets (x3), the urban retail price index (x4), the loss of state-owned and state-controlled industries above designated size (x6) and the proportion of total profits of construction enterprises (x8).

Create a new Python file named 3-huise.py

The code of the business tax grey prediction model is as follows; add it to 3-huise.py:

import numpy as np
import pandas as pd

inputfile = 'finance/data3.csv'
outputfile = 'finance/data3_GM11.xls'
data = pd.read_csv(inputfile)
data.index = range(1999, 2014)
data.loc[2014] = None
data.loc[2015] = None


def gm(x0):
	x1 = x0.cumsum()  # 1-AGO sequence
	z1 = (x1[:len(x1) - 1] + x1[1:]) / 2.0  # Mean sequence of adjacent accumulated values
	z1 = z1.reshape((len(z1), 1))
	B = np.append(-z1, np.ones_like(z1), axis=1)
	Yn = x0[1:].reshape((len(x0) - 1, 1))
	[[a], [b]] = np.dot(np.dot(np.linalg.inv(np.dot(B.T, B)), B.T), Yn)
	f = lambda k: (x0[0] - b / a) * np.exp(-a * (k - 1)) - (x0[0] - b / a) * np.exp(-a * (k - 2))  # Restore value
	delta = np.abs(x0 - np.array([f(i) for i in range(1, len(x0) + 1)]))
	C = delta.std() / x0.std()
	P = 1.0 * (np.abs(delta - delta.mean()) < 0.6745 * x0.std()).sum() / len(x0)
	return f, a, b, x0[0], C, P


l = ['x3', 'x4', 'x6', 'x8']
for i in l:
	f = gm(data[i][:len(data) - 2].values)[0]  # Fit on the 1999-2013 values only
	data.loc[2014, i] = f(len(data) - 1)  # 2014 forecast
	data.loc[2015, i] = f(len(data))  # 2015 forecast
	data[i] = data[i].round()
data[l + ['y']].to_excel(outputfile)
print(data[l + ['y']])

A neural network prediction model (4 nodes in the input layer, 8 nodes in the hidden layer and 1 node in the output layer) is established for the factors affecting business tax. The number of training iterations is 10000 and the error accuracy is $10^{-7}$.

Create a new Python file named 3-yuce.py

The code of the business tax neural network prediction model is as follows; add it to 3-yuce.py:

import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers.core import Dense, Activation

inputfile = 'finance/data3_GM11.xls'  # Output of the grey prediction step
outputfile = 'finance/sales_tax.xls'  # Where the predictions are saved
modelfile = 'finance/3-net.model'  # Where the model weights are saved
data = pd.read_excel(inputfile, index_col=0)  # The first column holds the years
feature = ['x3', 'x4', 'x6', 'x8']  # Input features

data_train = data.loc[range(1999, 2014)].copy()  # Train on 1999-2013
data_mean = data_train.mean()
data_std = data_train.std()
data_train = (data_train - data_mean) / data_std  # Zero-mean standardization
x_train = data_train[feature].values  # Feature matrix
y_train = data_train['y'].values  # Target vector

net = Sequential()
net.add(Dense(8, input_shape=(4,)))  # Hidden layer: 4 inputs, 8 nodes
net.add(Activation('relu'))
net.add(Dense(1, input_shape=(8,)))  # Output layer: 1 node
net.compile(loss='mean_squared_error', optimizer='adam')  # Compile the model
net.fit(x_train, y_train, epochs=10000, batch_size=16)  # Train the model for 10000 epochs
net.save_weights(modelfile)  # Save model parameters

# Standardize every year with the training statistics, predict, then undo the scaling of y
x = ((data[feature] - data_mean[feature]) / data_std[feature]).values
data['y_pred'] = net.predict(x) * data_std['y'] + data_mean['y']
data.to_excel(outputfile)
print(data)

# Plot the real and predicted values in separate subplots
data[['y', 'y_pred']].plot(subplots=True, style=['b-o', 'r-*'])
plt.show()

The results are as follows:

After zero-mean standardization, the data are substituted into the three-layer neural network prediction model established for business tax (4 nodes in the input layer, 8 nodes in the hidden layer and 1 node in the output layer), and the predicted value of business tax in 2015 is 22640.86 million yuan. The relevant data are shown in the figure.

The above figure shows the comparison between the real value of business tax and the predicted value. It can be seen that the overall trend of the real value and the predicted value is the same, but there is an error between the real value and the predicted value.
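One way to put a number on that error is the mean absolute percentage error (MAPE). This metric is not used in the original experiment; the function name `mape` and the sample values are purely illustrative.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


# Hypothetical real and predicted values
print(mape([100.0, 200.0, 400.0], [110.0, 190.0, 400.0]))
```

Applied to the y and y_pred columns of the historical years, a metric like this would make the fit of the different tax models directly comparable.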

Prediction model of enterprise income tax

A grey prediction model is established for the factors affecting enterprise income tax.

The variables affecting enterprise income tax (y) include the added value of the secondary industry (x1), the added value of the tertiary industry (x2), the total investment in fixed assets (x3), the retail price index of urban commodities (1978 = 100) (x4), the loss of state-owned and state-controlled industrial enterprises above designated size (x6), the total output value of the construction industry (x7), the retail sales of chain stores above designated size (x9) and the total local fiscal revenue (x10).

Create a new Python file named 4-huise.py

The code of the enterprise income tax grey prediction model is as follows; add it to 4-huise.py:

import numpy as np
import pandas as pd

inputfile = 'finance/data4.csv'
outputfile = 'finance/data4_GM11.xls'
data = pd.read_csv(inputfile)
data.index = range(2002, 2014)
data.loc[2014] = None
data.loc[2015] = None


def gm(x0):
	x1 = x0.cumsum()  # 1-AGO sequence
	z1 = (x1[:len(x1) - 1] + x1[1:]) / 2.0  # Mean sequence of adjacent accumulated values
	z1 = z1.reshape((len(z1), 1))
	B = np.append(-z1, np.ones_like(z1), axis=1)
	Yn = x0[1:].reshape((len(x0) - 1, 1))
	[[a], [b]] = np.dot(np.dot(np.linalg.inv(np.dot(B.T, B)), B.T), Yn)
	f = lambda k: (x0[0] - b / a) * np.exp(-a * (k - 1)) - (x0[0] - b / a) * np.exp(-a * (k - 2))  # Restore value
	delta = np.abs(x0 - np.array([f(i) for i in range(1, len(x0) + 1)]))
	C = delta.std() / x0.std()
	P = 1.0 * (np.abs(delta - delta.mean()) < 0.6745 * x0.std()).sum() / len(x0)
	return f, a, b, x0[0], C, P


l = ['x1', 'x2', 'x3', 'x4', 'x6', 'x7', 'x9', 'x10']
for i in l:
	f = gm(data[i][:len(data) - 2].values)[0]  # Fit on the 2002-2013 values only
	data.loc[2014, i] = f(len(data) - 1)  # 2014 forecast
	data.loc[2015, i] = f(len(data))  # 2015 forecast
	data[i] = data[i].round(2)
data[l + ['y']].to_excel(outputfile)
print(data[l + ['y']])

Create a new Python file named 4-yuce.py

Write the following code in 4-yuce.py to establish a neural network model of the factors affecting enterprise income tax:

import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers.core import Dense, Activation

inputfile = 'finance/data4_GM11.xls'  # Output of the grey prediction step
outputfile = 'finance/enterprise_income.xls'  # Where the predictions are saved
modelfile = 'finance/4-net.model'  # Where the model weights are saved
data = pd.read_excel(inputfile, index_col=0)  # The first column holds the years
feature = ['x1', 'x2', 'x3', 'x4', 'x6', 'x7', 'x9', 'x10']  # Input features

data_train = data.loc[range(2002, 2014)].copy()  # Train on 2002-2013
data_mean = data_train.mean()
data_std = data_train.std()
data_train = (data_train - data_mean) / data_std  # Zero-mean standardization
x_train = data_train[feature].values  # Feature matrix
y_train = data_train['y'].values  # Target vector

net = Sequential()
net.add(Dense(6, input_shape=(8,)))  # Hidden layer: 8 inputs, 6 nodes
net.add(Activation('relu'))
net.add(Dense(1, input_shape=(6,)))  # Output layer: 1 node
net.compile(loss='mean_squared_error', optimizer='adam')  # Compile the model
net.fit(x_train, y_train, epochs=10000, batch_size=16)  # Train the model for 10000 epochs
net.save_weights(modelfile)  # Save model parameters

# Standardize every year with the training statistics, predict, then undo the scaling of y
x = ((data[feature] - data_mean[feature]) / data_std[feature]).values
data['y_pred'] = net.predict(x) * data_std['y'] + data_mean['y']
data.to_excel(outputfile)
print(data)

# Plot the real and predicted values in separate subplots
data[['y', 'y_pred']].plot(subplots=True, style=['b-o', 'r-*'])
plt.show()

The results are as follows. The historical data and predicted values of enterprise income tax and its related factors are:

After zero-mean standardization, the data are substituted into the three-layer neural network prediction model established for enterprise income tax (8 nodes in the input layer, 6 nodes in the hidden layer and 1 node in the output layer), and the predicted value of enterprise income tax in 2015 is 17819.2 million yuan. The relevant data are shown in the figure above.

The comparison between the real value and the predicted value of enterprise income tax is shown below

From the comparison chart of the real and predicted values, we can conclude that the two follow the same overall trend.

Individual income tax prediction model

Create a new Python file named 5-huise.py

Add the following code to 5-huise.py to establish a grey prediction model for the factors influencing individual income tax:

import numpy as np
import pandas as pd

inputfile = 'finance/data5.csv'
outputfile = 'finance/data5_GM11.xls'
data = pd.read_csv(inputfile)
data.index = range(2000, 2014)

data.loc[2014] = None
data.loc[2015] = None
l = ['x1', 'x4', 'x5', 'x7']


def gm(x0):  # Custom grey prediction function
	x1 = x0.cumsum()  # 1-AGO sequence
	z1 = (x1[:len(x1) - 1] + x1[1:]) / 2.0  # Mean sequence of adjacent accumulated values
	z1 = z1.reshape((len(z1), 1))
	B = np.append(-z1, np.ones_like(z1), axis=1)
	Yn = x0[1:].reshape((len(x0) - 1, 1))
	[[a], [b]] = np.dot(np.dot(np.linalg.inv(np.dot(B.T, B)), B.T), Yn)  # Calculation parameters
	f = lambda k: (x0[0] - b / a) * np.exp(-a * (k - 1)) - (x0[0] - b / a) * np.exp(-a * (k - 2))  # Restore value
	delta = np.abs(x0 - np.array([f(i) for i in range(1, len(x0) + 1)]))
	C = delta.std() / x0.std()
	P = 1.0 * (np.abs(delta - delta.mean()) < 0.6745 * x0.std()).sum() / len(x0)
	return f, a, b, x0[0], C, P


for i in l:
	f = gm(data[i][:len(data) - 2].values)[0]  # Fit on the 2000-2013 values only
	data.loc[2014, i] = f(len(data) - 1)  # 2014 forecast
	data.loc[2015, i] = f(len(data))  # 2015 forecast
	data[i] = data[i].round()

data[l + ['y']].to_excel(outputfile)

Create a new Python file named 5-yuce.py

The code written in 5-yuce.py establishes a neural network prediction model for the factors influencing individual income tax. Its parameters are set to an error accuracy of $10^{-7}$ and 15000 training iterations, with 4 nodes in the input layer, 8 nodes in the hidden layer and 1 node in the output layer.

The code is as follows:

import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers.core import Dense, Activation

inputfile = 'finance/data5_GM11.xls'  # Output of the grey prediction step
outputfile = 'finance/personal_Income.xls'  # Where the predictions are saved
modelfile = 'finance/5-net.model'  # Where the model weights are saved
data = pd.read_excel(inputfile, index_col=0)  # The first column holds the years
feature = ['x1', 'x4', 'x5', 'x7']  # Input features

data_train = data.loc[range(2000, 2014)].copy()  # Train on 2000-2013
data_mean = data_train.mean()
data_std = data_train.std()
data_train = (data_train - data_mean) / data_std  # Zero-mean standardization
x_train = data_train[feature].values  # Feature matrix
y_train = data_train['y'].values  # Target vector

net = Sequential()
net.add(Dense(8, input_shape=(4,)))  # Hidden layer: 4 inputs, 8 nodes
net.add(Activation('relu'))
net.add(Dense(1, input_shape=(8,)))  # Output layer: 1 node
net.compile(loss='mean_squared_error', optimizer='adam')  # Compile the model
net.fit(x_train, y_train, epochs=15000, batch_size=16)  # Train the model for 15000 epochs
net.save_weights(modelfile)  # Save model parameters

# Standardize every year with the training statistics, predict, then undo the scaling of y
x = ((data[feature] - data_mean[feature]) / data_std[feature]).values
data['y_pred'] = net.predict(x) * data_std['y'] + data_mean['y']
data.to_excel(outputfile)
print(data)

# Plot the real and predicted values in separate subplots
data[['y', 'y_pred']].plot(subplots=True, style=['b-o', 'r-*'])
plt.show()

The results are as follows:

The historical data and predicted values of individual income tax and its related factors are:

After zero-mean standardization, the data are substituted into the three-layer neural network prediction model established for individual income tax (4 nodes in the input layer, 8 nodes in the hidden layer and 1 node in the output layer), and the predicted value of individual income tax in 2015 is 648785.25. See the above figure for the relevant data.

It can be seen from the above figure that the overall trend of the real value of personal income tax is the same as that of the predicted value, and the error is relatively small.

Revenue forecast model of government funds

Compared with 2006 and earlier years, the city's land transfer fee rose sharply in 2007, and this sharp rise directly affected the income of government funds. Therefore, to keep the data consistent, the grey prediction model is built on the income of government funds from 2007 to 2013.

Create a new Python file named 6-yuce.py

The code of the government fund revenue prediction model is as follows; add it to 6-yuce.py:

from __future__ import print_function
import numpy as np
import pandas as pd


def gm(x0):  # Custom grey prediction function
	x1 = x0.cumsum()  # 1-AGO sequence
	z1 = (x1[:len(x1) - 1] + x1[1:]) / 2.0  # Mean sequence of adjacent accumulated values
	z1 = z1.reshape((len(z1), 1))
	B = np.append(-z1, np.ones_like(z1), axis=1)
	Yn = x0[1:].reshape((len(x0) - 1, 1))
	[[a], [b]] = np.dot(np.dot(np.linalg.inv(np.dot(B.T, B)), B.T), Yn)  # Calculation parameters
	f = lambda k: (x0[0] - b / a) * np.exp(-a * (k - 1)) - (x0[0] - b / a) * np.exp(-a * (k - 2))  # Restore value
	delta = np.abs(x0 - np.array([f(i) for i in range(1, len(x0) + 1)]))
	C = delta.std() / x0.std()
	P = 1.0 * (np.abs(delta - delta.mean()) < 0.6745 * x0.std()).sum() / len(x0)
	return f, a, b, x0[0], C, P


x0 = np.array([3152063, 2213050, 4050122, 5265142, 5556619, 4772843, 9463330])
f, a, b, x00, C, P = gm(x0)
print('The forecast results for 2014 and 2015 are:\n%0.2f ten thousand yuan and %0.2f ten thousand yuan' % (f(8), f(9)))
print('The posterior error ratio is:%0.4f' % C)
p = pd.DataFrame(x0, columns=['y'], index=range(2007, 2014))
p.loc[2014] = None
p.loc[2015] = None
p['y_pred'] = [f(i) for i in range(1, 10)]
p.index = pd.to_datetime(p.index, format='%Y')
import matplotlib.pyplot as plt

p.plot(style=['b-o', 'r-*'], xticks=p.index)
plt.show()

The results are as follows:

The predicted results of 2014 and 2015 are 103870025600 yuan and 129297950700 yuan respectively.

The posterior error ratio of grey prediction is 0.2390, less than 0.35, and the prediction accuracy is good.

The comparison diagram between the real value and predicted value of grey prediction government funds is shown in the figure below

It can be seen from the above figure that the real values and the grey prediction values of government fund income follow the same overall trend, but because the amount of data is small, the model does not fit well and large errors remain.


Posted by Alecdude on Fri, 13 May 2022 00:36:23 +0300