2003031118 - Li Wei - Python data analysis May 1 holiday homework - MySQL installation and use

Project: midterm exam paper
Course class blog link: Level 20 Data Class (undergraduate)
Homework requirements link: Work requirements
Blog name: 2003031118 - Li Wei - Python data analysis May 1 holiday homework - MySQL installation and use
Requirements: Each question must have a title, code (inserted with the insert-code feature, not as a screenshot; if you do not know how to insert code, look it up), and a screenshot (of the running result only).

"Python Data Analysis" course mid-term computer exam questions

1. Analyze the relationships between the features of the population data from 1996 to 2015 (50 points for 1 question, 50 points in total)

Knowledge points tested: adjusting the commonly used pyplot drawing parameters; drawing subplots; saving and displaying figures; the purpose and drawing of scatter plots and line charts.

Statement of needs:

The population data has six features: year-end total population, male population, female population, urban population, rural population, and year. Plotting each feature against time makes it possible to analyze where the male-to-female ratio and the urban and rural populations are heading.
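
To make the two quantities mentioned above concrete, here is a minimal sketch of how the male-to-female ratio and the urban share could be computed once the data is in a numeric array. The numbers are invented for illustration and are not the real population figures; the column order is also an assumption.

```python
import numpy as np

# Made-up illustrative figures (NOT the real data), in units of 10,000 people.
# Assumed column order: year, total, male, female, urban, rural.
demo = np.array([
    [2013, 1000.0, 510.0, 490.0, 530.0, 470.0],
    [2014, 1010.0, 515.0, 495.0, 550.0, 460.0],
    [2015, 1020.0, 520.0, 500.0, 570.0, 450.0],
])

# Transposing lets each column be unpacked into its own 1-D array.
year, total, male, female, urban, rural = demo.T

sex_ratio = male / female * 100    # males per 100 females
urban_share = urban / total * 100  # urban population as a percentage of the total

for y, r, u in zip(year, sex_ratio, urban_share):
    print(f"{int(y)}: {r:.1f} males per 100 females, {u:.1f}% urban")
```

Plotting these derived series against the year is one way to read off a trend, rather than comparing the raw population curves by eye.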

Requirements:

(1) Read the population data using the NumPy library.

(2) Create a canvas and add subplots.

(3) Draw a scatter plot and a line chart on the two subplots respectively.

(4) Save and display the figure.

(5) Analyze the future trend of population change.
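
Requirements (1) to (4) can be sketched end to end on a small made-up array, which may help readers who do not have populations.npz. Everything here is an assumption for illustration: the file names, the array contents, and the use of the object-oriented Axes interface instead of the pyplot state machine.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-in for populations.npz: a year column plus two value columns.
years = np.arange(2000, 2005)
demo = np.column_stack([years, years * 2.0, years * 3.0])
names = np.array(["year", "series_a", "series_b"])

workdir = tempfile.mkdtemp()
npz_path = os.path.join(workdir, "demo.npz")
np.savez(npz_path, data=demo, feature_names=names)

# (1) Read the arrays back with NumPy.
with np.load(npz_path) as archive:
    values = archive["data"]
    labels = archive["feature_names"]

# (2) Create the canvas and two subplots (two rows, one column).
fig, (ax_scatter, ax_line) = plt.subplots(2, 1, figsize=(8, 8))

# (3) Scatter plot on the first subplot, line chart on the second.
for col in (1, 2):
    ax_scatter.scatter(values[:, 0], values[:, col], label=str(labels[col]))
    ax_line.plot(values[:, 0], values[:, col], label=str(labels[col]))
ax_scatter.legend()
ax_line.legend()
ax_line.set_xlabel(str(labels[0]))

# (4) Save before showing: once plt.show() returns, the figure may already be closed.
png_path = os.path.join(workdir, "demo.png")
fig.savefig(png_path)
# In an interactive session, plt.show() here would display the figure.
plt.close(fig)
```

Calling methods on each Axes object makes it explicit which subplot a command targets, which is easy to lose track of when several figures are open.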

 

import numpy as np
import matplotlib.pyplot as plt

# Read the population data with the NumPy library
data = np.load('D:/desktop/python midterm exam/populations.npz', allow_pickle=True)
print(data.files)  # list the arrays stored in the file
print(data['data'])
print(data['feature_names'])
plt.rcParams['font.sans-serif'] = 'SimHei'  # use a font that can display Chinese
plt.rcParams['axes.unicode_minus'] = False  # keep the minus sign rendering correctly with that font
name = data['feature_names']  # feature names, used as labels for the data
values = data['data']  # the data array itself

# Figure 1: scatter plots
fig1 = plt.figure(figsize=(12, 12))  # set the canvas size
ax1 = fig1.add_subplot(2, 1, 1)  # first subplot in a grid of two rows and one column
plt.scatter(values[0:20, 0], values[0:20, 1])  # optional: marker='8', color='red'
plt.ylabel('Total population (10,000 people)')
plt.legend(['year-end total'])  # pass a list; a bare string is split into single characters
plt.title('Scatter plots of the year-end total and each population group, 1996~2015')
ax2 = fig1.add_subplot(2, 1, 2)  # second subplot
plt.scatter(values[0:20, 0], values[0:20, 2])  # optional: marker='o', color='yellow'
plt.scatter(values[0:20, 0], values[0:20, 3])  # optional: marker='D', color='green'
plt.scatter(values[0:20, 0], values[0:20, 4])  # optional: marker='p', color='blue'
plt.scatter(values[0:20, 0], values[0:20, 5])  # optional: marker='s', color='purple'
plt.xlabel('time')
plt.ylabel('Total population (10,000 people)')
plt.xticks(values[0:20, 0])
plt.legend(['male', 'female', 'urban', 'rural'])
fig1.savefig('population_scatter.png')  # requirement (4); the file name is only an example

# Figure 2: line charts
fig2 = plt.figure(figsize=(12, 12))
ax3 = fig2.add_subplot(2, 1, 1)
plt.plot(values[0:20, 0], values[0:20, 1])  # optional: linestyle='-', color='r', marker='8'
plt.ylabel('Total population (10,000 people)')
plt.xticks(range(0, 20, 1), values[0:20, 0], rotation=45)  # rotation sets the tick label angle
plt.legend(['year-end total'])
plt.title('Line charts of the year-end total and each population group, 1996~2015')
ax4 = fig2.add_subplot(2, 1, 2)  # separate name, so the figure object is not overwritten
plt.plot(values[0:20, 0], values[0:20, 2])  # optional: 'y-'
plt.plot(values[0:20, 0], values[0:20, 3])  # optional: 'g-.'
plt.plot(values[0:20, 0], values[0:20, 4])  # optional: 'b-'
plt.plot(values[0:20, 0], values[0:20, 5])  # optional: 'p-'
plt.xlabel('time')
plt.ylabel('Total population (10,000 people)')
plt.xticks(values[0:20, 0])
plt.legend(['male', 'female', 'urban', 'rural'])
fig2.savefig('population_line.png')  # the file name is only an example

# Display the figures
plt.show()

 

run screenshot

 

 

2. Read and view the basic information of the P2P online loan master table (10 points for 1 question, 10 points in total)

Knowledge points tested: common data reading methods; common DataFrame attributes and methods; basic processing of time data; the principles and methods of grouping and aggregation; building pivot tables and cross tables.

Statement of needs:

The P2P loan master table mainly stores the basic information of online loan users. Exploring this basic information gives insight into the overall distribution of the data and how it is categorized, and helps uncover relationships within the data.

Requirements:

(1) Use the ndim and shape attributes and the memory_usage method to view the number of dimensions, the size, and the memory used, respectively.
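
As a quick illustration of what these three calls return, here is a sketch on a toy DataFrame. The column names and values are invented for this example and do not come from Training_Master.csv.

```python
import pandas as pd

# Toy table; the column names are invented for this sketch.
demo = pd.DataFrame({"Idx": [1, 2, 3], "amount": [10.5, 20.0, 7.25]})

print(demo.ndim)   # number of axes: 2 for any DataFrame
print(demo.shape)  # (rows, columns)

# memory_usage is a method: without parentheses only the bound method is printed.
per_column = demo.memory_usage()  # bytes per column, plus one entry for the index
print(per_column)
print(per_column.sum())           # total bytes
```

Note that ndim and shape are plain attributes, while memory_usage has to be called.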

 

import pandas as pd

master = pd.read_csv('D:/desktop/python midterm exam/Training_Master.csv', encoding='gbk')
print('Number of dimensions of the P2P loan master table:', master.ndim)
print('Shape of the P2P loan master table:', master.shape)
# memory_usage is a method and must be called; without () only the bound method is printed
print('Memory used by the P2P loan master table:\n', master.memory_usage())
print('Descriptive statistics of the P2P loan master table:\n', master.describe())

run screenshot

 

 

 

3. Extract the time information of the user information update table and the login information table (10 points for 1 question, 10 points in total)

Knowledge points tested: common data reading methods; common DataFrame attributes and methods; basic processing of time data; the principles and methods of grouping and aggregation; building pivot tables and cross tables.

Statement of needs:

The user information update table and the login information table both contain a large amount of time data. Extracting information from it deepens the understanding of the data and also makes it possible to explore how strongly this information correlates with the target. In addition, the differences between user login time, loan transaction time, and information update time can reflect the behavior of different P2P loan users.

Requirements:

(1) Use the to_datetime function to convert the time strings in the user information update table and the login information table.
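
The sketch below shows, on a toy table, what the conversion makes possible: date components and time differences. The values are invented; the column names mirror the ones used in this post, and the two derived column names are hypothetical.

```python
import pandas as pd

# Toy login table with invented values.
log = pd.DataFrame({
    "Idx": [1, 1, 2],
    "Listinginfo1": ["2014-03-05", "2014-03-05", "2014-02-26"],
    "LogInfo3": ["2014-02-20", "2014-03-01", "2014-02-25"],
})

# Convert both string columns to datetime64.
for col in ["Listinginfo1", "LogInfo3"]:
    log[col] = pd.to_datetime(log[col])

# Once the columns are datetime64, components and differences are available.
log["login_month"] = log["LogInfo3"].dt.month
log["days_before_listing"] = (log["Listinginfo1"] - log["LogInfo3"]).dt.days
print(log)
```

Subtracting two datetime columns yields a timedelta column, which is how the time-difference features described above could be built.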

 

import pandas as pd

LogInfo = pd.read_csv('D:/desktop/python midterm exam/Training_LogInfo.csv', encoding='gbk')
Userupdate = pd.read_csv('D:/desktop/python midterm exam/Training_Userupdate.csv', encoding='gbk')
# Convert the time strings to datetime
LogInfo['Listinginfo1'] = pd.to_datetime(LogInfo['Listinginfo1'])
LogInfo['LogInfo3'] = pd.to_datetime(LogInfo['LogInfo3'])
print('First 5 rows of the login information table after conversion:\n', LogInfo.head())
Userupdate['ListingInfo1'] = pd.to_datetime(Userupdate['ListingInfo1'])
Userupdate['UserupdateInfo2'] = pd.to_datetime(Userupdate['UserupdateInfo2'])
print('First 5 rows of the user information update table after conversion:\n', Userupdate.head())

run screenshot

 

 

4. Use the grouping aggregation method to further analyze the user information update table and login information table (30 points for 1 question, 30 points in total)

Knowledge points tested: common data reading methods; common DataFrame attributes and methods; basic processing of time data; the principles and methods of grouping and aggregation; building pivot tables and cross tables.

Statement of needs:

When analyzing the user information update table and the login information table, besides extracting information from the time values themselves, you can also group the rows by user number and analyze each group. This yields, for each user, the earliest and latest information update time, the earliest and latest login time, the number of information updates, and the number of logins.

Requirements:

(1) Use the groupby method to group the user information update table and the login information table.

(2) Use the agg method to obtain the earliest and latest update and login times after grouping.

(3) Use the size method to obtain the number of information updates and the number of logins in the grouped data.
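
The three steps can be sketched on a toy login table as follows. The values are invented for illustration, and computing min and max in a single agg call is just one possible way to write it.

```python
import pandas as pd

# Toy login table with invented values: user 1 logs in twice, user 2 three times.
log = pd.DataFrame({
    "Idx": [1, 1, 2, 2, 2],
    "LogInfo3": pd.to_datetime(
        ["2014-02-20", "2014-03-01", "2014-01-05", "2014-01-09", "2014-02-11"]
    ),
})

# (1) Group the time column by user number.
grouped = log.groupby("Idx")["LogInfo3"]

# (2) agg accepts a list of function names and returns one column per function.
span = grouped.agg(["min", "max"])

# (3) size counts the rows in each group, i.e. the number of logins per user.
counts = grouped.size()

print(span)
print(counts)
```

Passing function names as strings such as "min" and "max" avoids importing NumPy just for the aggregation.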

import pandas as pd

LogInfo = pd.read_csv('D:/desktop/python midterm exam/Training_LogInfo.csv', encoding='gbk')
Userupdate = pd.read_csv('D:/desktop/python midterm exam/Training_Userupdate.csv', encoding='gbk')
# Convert the time strings so that min/max compare dates rather than text
LogInfo['LogInfo3'] = pd.to_datetime(LogInfo['LogInfo3'])
Userupdate['UserupdateInfo2'] = pd.to_datetime(Userupdate['UserupdateInfo2'])
# Use groupby to group both tables by user number
LogGroup = LogInfo[['Idx', 'LogInfo3']].groupby(by='Idx')
UserGroup = Userupdate[['Idx', 'UserupdateInfo2']].groupby(by='Idx')
# Use agg to get the earliest and latest login and update times per user
print('The earliest login time after grouping is:\n', LogGroup.agg('min'))
print('The latest login time after grouping is:\n', LogGroup.agg('max'))
print('The earliest update time after grouping is:\n', UserGroup.agg('min'))
print('The latest update time after grouping is:\n', UserGroup.agg('max'))
# Use size to get the number of logins and information updates per user
print('The number of logins per user is:\n', LogGroup.size())
print('The number of information updates per user is:\n', UserGroup.size())

run screenshot

 

 

 

 

 

Posted by getmukesh on Mon, 02 May 2022 13:35:37 +0300