This article comes from the graduation project of vincentbao, a Lou+ student, who uses data analysis to present a complete picture of the world under COVID-19.
At the beginning of 2020, COVID-19 was spreading rapidly around the world. The World Health Organization initially used the provisional name 2019-nCoV for the novel coronavirus and later officially named the disease COVID-19.
The COVID-19 outbreak has since swept across the globe, and in many countries the numbers of confirmed cases and deaths are still rising. Although the United States is one of the world's most developed countries, it has fared particularly badly in this crisis, becoming the country with the largest number of confirmed cases and deaths.
Many people believe that Western countries were slow to respond because the public was indifferent to the risk and failed to assess and anticipate it properly. In reality, however, the virus has done more than threaten human lives: it has also dealt a huge blow to the economy. Global financial markets went through great turmoil, and U.S. stock trading was halted by market-wide circuit breakers several times in March 2020, something without precedent. Every country's economy has been hit to some degree.
The epidemic has also had a far-reaching effect on public opinion, and fighting it has become the dominant topic of public discourse. It has taught humanity many lessons about how we live and work, and it serves as a rare warning about risk.
From the perspective of data analysis and mining, this project therefore shows the epidemic's impact on society, public opinion, the economy and other areas. Presenting the data should help remind people to stay alert: the epidemic is not over yet, and collective prevention and control must continue.
The project analyzes the impact of COVID-19 on society in three main parts:
1. Visualize the current state of the epidemic worldwide to show how severe it is.
2. Analyze the epidemic's impact on news and public opinion.
3. Use China's domestic economic data to show the epidemic's impact on the economy and society.
Data acquisition
The first step is data collection, starting with the current state of the epidemic worldwide. We scrape the real-time epidemic page of DXY (Dingxiangyuan) to get the number of confirmed cases, cured cases and deaths for each country, and store them in a DataFrame.
```python
import json

import pandas as pd
import requests
from lxml import etree

url = 'https://3g.dxy.cn/newh5/view/pneumonia?scene=2&clicktime=1579582238&enterid=1579582238&from=timeline&isappinstalled=0'
response = requests.get(url)
response.encoding = 'utf-8'
page = etree.HTML(response.text, etree.HTMLParser())
# Select the <script> element that carries the per-country epidemic data
script_text = page.xpath('//*[@id="getListByCountryTypeService2true"]/text()')[0]
# The script wraps a JSON array in JavaScript; parse only the array itself
# (json understands true/false/null, unlike eval)
ncov_world = json.loads(script_text[script_text.index('['):script_text.rindex(']') + 1])

country, confirmed, lived, dead = [], [], [], []
for i in ncov_world:
    # Separate the country name, confirmed, cured and death counts
    country.append(i['provinceName'])
    confirmed.append(i['confirmedCount'])
    lived.append(i['curedCount'])
    dead.append(i['deadCount'])

# Store them in a DataFrame for later use
data_world = pd.DataFrame({
    'Country name': country,
    'Number of confirmed cases': confirmed,
    'Number of cured': lived,
    'death toll': dead,
})
data_world.head(5)
```
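Because the data sits inside a JavaScript assignment rather than a standalone JSON document, it helps to be able to test the parsing step offline, without hitting the live page. The sketch below assumes the script text has the form `try { window.getListByCountryTypeService2true = [...]}catch(e){}`; that wrapper, the helper name `parse_country_payload` and the sample string are illustrative assumptions, not taken from the original report.

```python
import json

import pandas as pd


def parse_country_payload(script_text):
    """Turn the text of the DXY <script> element into a per-country DataFrame.

    Assumes (hypothetically) that the text wraps a JSON array, e.g.
    'try { window.getListByCountryTypeService2true = [...]}catch(e){}'.
    """
    # Slice from the first '[' to the last ']' so the JavaScript wrapper is ignored
    records = json.loads(script_text[script_text.index('['):script_text.rindex(']') + 1])
    columns = {
        'provinceName': 'Country name',
        'confirmedCount': 'Number of confirmed cases',
        'curedCount': 'Number of cured',
        'deadCount': 'death toll',
    }
    # Keep only the four fields used later and give them the article's column labels
    return pd.DataFrame.from_records(records)[list(columns)].rename(columns=columns)


# Made-up sample payload; note the JSON literal false, which json.loads handles directly
sample = ('try { window.getListByCountryTypeService2true = '
          '[{"provinceName": "A", "confirmedCount": 10, "curedCount": 4, '
          '"deadCount": 1, "isNew": false}]}catch(e){}')
print(parse_country_payload(sample))
```

Slicing on the brackets instead of fixed character offsets keeps the parser working if the wrapper text around the array changes length.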
Next, we collect China's domestic economic data. Economic change is usually measured by GDP, and the National Bureau of Statistics provides detailed data for query: besides GDP, it offers cumulative values and value added for the various industries, in monthly, quarterly and annual views. Because GDP is normally published quarterly, economic data for each industry over the past 18 quarters is collected here for later use.
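Quarterly figures are seasonal, so a common way to compare them is against the same quarter one year earlier. A minimal sketch of holding 18 quarters of an indicator in pandas with a `PeriodIndex` and computing that year-on-year change; the quarter range (2016Q1 to 2020Q2) and the values are placeholders, not real National Bureau of Statistics figures.

```python
import pandas as pd

# Hypothetical example: 18 consecutive quarters, 2016Q1 through 2020Q2
quarters = pd.period_range('2016Q1', periods=18, freq='Q')
# Placeholder values standing in for an indicator queried from the NBS site
indicator = pd.Series(range(100, 118), index=quarters, name='indicator')

# Year-on-year change: compare each quarter with the same quarter 4 quarters earlier
yoy = indicator / indicator.shift(4) - 1
print(yoy.tail(4))
```

The first four quarters have no earlier counterpart, so their year-on-year values are NaN.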
Data preprocessing
The data sets then need some simple preprocessing.
The economic and industry data and the worldwide epidemic data were already tidied during collection: only the needed fields were kept, the labels were renamed, and there are no null values, so no further processing is required. The historical time-series epidemic data does need processing: we keep only each country's national totals, drop the sub-national (province or state) rows, build a time-series index, and check for null values. For the epidemic news data, only the news date, headline and content are kept.
```python
import pandas as pd

# data_area: the historical per-region epidemic records loaded earlier (not shown in this excerpt)
# Keep only country-level totals: in those rows provinceName equals countryName
data_area = data_area.loc[data_area['countryName'] == data_area['provinceName']]
data_area_times = data_area[['countryName', 'province_confirmedCount', 'province_curedCount',
                             'province_deadCount', 'updateTime']].copy()
# Build a time-series index from the update time of each record
data_area_times.index = pd.DatetimeIndex(data_area_times['updateTime'])
data_area_times = data_area_times.drop('updateTime', axis=1)
data_area_times.head(5)
data_area_times.isnull().any()  # check each column for null values
```
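Records may be updated several times a day, so the time index is irregular. One possible follow-up step, sketched here as an assumption rather than the author's code, is to reduce each country's cumulative counts to one value per day with `groupby` plus `resample`. The toy records are made up and only mimic the column names of `data_area_times`.

```python
import pandas as pd

# Toy records shaped like data_area_times (made-up numbers, for illustration only)
records = pd.DataFrame(
    {
        'countryName': ['A', 'A', 'A', 'B', 'B'],
        'province_confirmedCount': [10, 12, 20, 5, 7],
        'province_curedCount': [1, 1, 2, 0, 1],
        'province_deadCount': [0, 0, 1, 0, 0],
    },
    index=pd.DatetimeIndex(
        ['2020-03-01 08:00', '2020-03-01 20:00', '2020-03-02 09:00',
         '2020-03-01 10:00', '2020-03-02 11:00'],
        name='updateTime',
    ),
)

# Counts are cumulative, so the last record of each day is that day's total
daily = (
    records.groupby('countryName')['province_confirmedCount']
    .resample('D')
    .last()
    .unstack('countryName')  # one column per country, one row per day
)
print(daily)
```

Taking the last record of the day relies on the counts being cumulative; for well-behaved, non-decreasing data `.max()` would give the same result.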
Global epidemic visualization map
Below, the ten countries with the most confirmed cases are visualized. The bar chart makes it easy to compare their numbers of confirmed cases, cured cases and deaths.
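```python
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties

# myfont is not defined in this excerpt; the scraped country names may contain
# Chinese characters, so a CJK-capable font is assumed. The path is a placeholder.
myfont = FontProperties(fname='/path/to/cjk-font.ttf')

# Sort by number of confirmed cases
data_world = data_world.sort_values(by='Number of confirmed cases', ascending=False)
data_world_set = data_world[['Number of confirmed cases', 'Number of cured', 'death toll']]
data_world_set.index = data_world['Country name']
# Plot the top ten countries
data_world_set.head(10).plot(kind='bar', figsize=(15, 10))
plt.xlabel('Country name', fontproperties=myfont)
plt.xticks(fontproperties=myfont)
# When prop is given, legend ignores fontsize, so set the size on a copy of the font
legend_font = myfont.copy()
legend_font.set_size(30)
plt.legend(prop=legend_font)
plt.show()
```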
At the time of writing, the United States clearly had the most confirmed cases, more than 5 million in total, followed by Brazil and India.
During the epidemic, the major analysis websites published map visualizations of the outbreak. A map makes the global distribution of the epidemic much clearer, and being able to click a region to get its figures makes the visualization interactive. The same can be done in Python with the pyecharts module.
The color scale, ranging from red to blue, indicates the severity in each country, and clicking a country shows its cumulative number of confirmed cases. The map makes it clear that the epidemic has spread worldwide and that many countries have very large numbers of confirmed cases.
Follow-up sections:
- Epidemic growth
- News analysis of COVID-19
- Impact of the epidemic on various industries
- Summary
Due to limited space, the experimental report is not shown here in full; the complete report is available separately.
Shiyanlou has turned this experimental report into a new free course, "Visual analysis of COVID-19 data", where you can work through the experiment and write the report yourself.