Python web scraping in practice with requests + pyecharts: visualizing COVID-19 epidemic data in Python (source code included)

Foreword

Today I'll show you how to use Python to scrape COVID-19 epidemic data and visualize it. The full source code is included below, along with a few tips for anyone who needs it.

First of all, before scraping we should disguise ourselves as a browser as much as possible so we aren't recognized as a crawler. The basic step is to add a request header, but since many people scrape this kind of plain-text data, it's also worth rotating proxy IPs and randomizing the request headers when fetching the epidemic data.
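For example, here is a minimal sketch of rotating the User-Agent and routing through a proxy with requests; the User-Agent strings and the proxy address are placeholders you would replace with your own:

import random
import requests

# a small pool of User-Agent strings to rotate through (placeholders;
# substitute real browser UA strings of your own)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
]

# hypothetical proxy address; replace with a proxy you actually control
PROXIES = {"http": "http://127.0.0.1:8888", "https": "http://127.0.0.1:8888"}

def fetch(url):
    # pick a random User-Agent on every request
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return requests.get(url, headers=headers, proxies=PROXIES, timeout=10)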

Every time, before writing any crawler code, the first and most important step is to analyze the target web page.

Through testing we found that the crawling process is fairly slow. If you drive a real browser instead of calling requests directly (for example with Selenium), you can also speed things up by disabling images, JavaScript, and so on in Google Chrome.
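If you do take the browser-driven route, here is a minimal sketch with Selenium (which is not among the modules listed below; this is only an optional aside, using Chrome's standard content-settings preference keys):

from selenium import webdriver

# disable image and JavaScript loading to speed up page fetches
options = webdriver.ChromeOptions()
options.add_experimental_option(
    "prefs",
    {
        "profile.managed_default_content_settings.images": 2,      # 2 = block
        "profile.managed_default_content_settings.javascript": 2,  # 2 = block
    },
)
driver = webdriver.Chrome(options=options)
driver.get("https://voice.baidu.com/act/newpneumonia/newpneumonia")
html_source = driver.page_source
driver.quit()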

Development tools

Python version: 3.8

Related modules:

requests module

lxml module

openpyxl module

pandas module

pyecharts module

Environment setup

Install Python, add it to your PATH environment variable, and install the required modules with pip.
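For example, all of the modules listed above can be installed in one command:

pip install requests lxml openpyxl pandas pyecharts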

Approach

Open the page we want to crawl in a browser.
Press F12 to open the developer tools and find where the epidemic data we want lives.
Here the data we need is embedded directly in the page, inside a <script type="application/json"> tag.

Code

Epidemic crawler.py

import requests
from lxml import etree
import json
import openpyxl

# generic crawler: fetch the page
url = 'https://voice.baidu.com/act/newpneumonia/newpneumonia'
headers = {
    "User-Agent": "replace this with your own browser's User-Agent"
}
response = requests.get(url=url, headers=headers).text
# parse the HTML into an element tree so we can query it with XPath
html = etree.HTML(response)
# use XPath to grab the JSON data we located earlier and print it
json_text = html.xpath('//script[@type="application/json"]/text()')
json_text = json_text[0]
print(json_text)


# parse the JSON string with Python's standard library
result = json.loads(json_text)
print(result)
# printing the parsed object shows that the data we want sits under the
# value of the "component" key, so take that out
result = result["component"]
# print again to inspect the result
print(result)
# the current domestic data lives under the first component's caseList
result = result[0]['caseList']
print(result)


# create a workbook
wb = openpyxl.Workbook()
# grab the active worksheet
ws = wb.active
# set the sheet title
ws.title = "Domestic epidemic"
# write the header row
ws.append(["province", "cumulative confirmed cases", "deaths", "cured"])
# write one row per province
for line in result:
    line_name = [line["area"], line["confirmed"], line["died"], line["crued"]]
    # replace empty strings with 0; note that reassigning the loop
    # variable (for ele in line_name: ele = 0) would not modify the
    # list, so use a list comprehension instead
    line_name = [0 if ele == '' else ele for ele in line_name]
    ws.append(line_name)
# save to Excel; openpyxl always writes xlsx-format data, even when the
# file name ends in .xls (see the troubleshooting note below)
wb.save('./china.xls')
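The fetch above has no error handling; a slightly more defensive variant (a sketch using requests' standard retry machinery from urllib3) could replace the single requests.get call:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# retry transient server errors up to 3 times with exponential backoff
session = requests.Session()
retry = Retry(total=3, backoff_factor=1, status_forcelist=[500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retry))

resp = session.get(url, headers=headers, timeout=10)
resp.raise_for_status()  # fail fast if the request did not succeed
response = resp.text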

How to get the User-Agent

Open any page in Chrome, press F12, switch to the Network tab, refresh the page, click on any request, and copy the User-Agent value from the Request Headers section; paste it into the headers dictionary above.

A problem you may encounter: "Excel xlsx file; not supported". Two solutions follow.

Reason: xlrd versions after 1.2.0 no longer support the xlsx format, only xls.

Method one:

Uninstall the new version: pip uninstall xlrd

Install an old version: pip install xlrd==1.2.0 (or earlier)

Method Two:

Convert the Excel file to the xls format that xlrd supports (to be safe, open it and re-save it as .xls).
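A third option, a sketch assuming a pandas version with openpyxl support installed, is to skip xlrd entirely: save the workbook with the .xlsx extension that matches what openpyxl actually writes, and read it back with the openpyxl engine:

# in the crawler: save with the extension that matches the real format
wb.save('./china.xlsx')

# in the visualization script: read it back via the openpyxl engine
import pandas as pd
df = pd.read_excel('china.xlsx', engine='openpyxl')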

Epidemic data results

Visualization.py

# visualization part
import pandas as pd
from pyecharts.charts import Map, Page
from pyecharts import options as opts

# align wide (CJK) characters when printing DataFrames
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
# open the file written by the crawler (see the xlrd note above if this fails)
df = pd.read_excel('china.xls')
# pull out each column as a list, keyed by province
data2 = df['province']
data2_list = list(data2)
data3 = df['cumulative confirmed cases']
data3_list = list(data3)
data4 = df['deaths']
data4_list = list(data4)
data5 = df['cured']
data5_list = list(data5)

# a standalone map of cured cases (renders to render.html on its own)
c = (
    Map()
    .add("cured", [list(z) for z in zip(data2_list, data5_list)], "china")
    .set_global_opts(
        title_opts=opts.TitleOpts(),
        visualmap_opts=opts.VisualMapOpts(max_=200),
    )
)
c.render()

Cumulative = (
    Map()
    .add("cumulative confirmed cases", [list(z) for z in zip(data2_list, data3_list)], "china")
    .set_global_opts(
        title_opts=opts.TitleOpts(),
        visualmap_opts=opts.VisualMapOpts(max_=200),
    )
)

death = (
    Map()
    .add("deaths", [list(z) for z in zip(data2_list, data4_list)], "china")
    .set_global_opts(
        title_opts=opts.TitleOpts(),
        visualmap_opts=opts.VisualMapOpts(max_=200),
    )
)

cure = (
    Map()
    .add("cured", [list(z) for z in zip(data2_list, data5_list)], "china")
    .set_global_opts(
        title_opts=opts.TitleOpts(),
        visualmap_opts=opts.VisualMapOpts(max_=200),
    )
)

# combine the three maps on one draggable page
page = Page(layout=Page.DraggablePageLayout)
page.add(
    Cumulative,
    death,
    cure,
)
# generate the render.html file (this overwrites the one written by c.render())
page.render()
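With Page.DraggablePageLayout you can drag the charts into position in the browser and click "Save Config" to download a layout JSON; a sketch of baking that layout into a final page (assuming pyecharts 1.x, where Page.save_resize_html accepts these arguments) looks like this:

from pyecharts.charts import Page

# rebuild render.html using the layout saved from the browser
Page.save_resize_html(
    source="render.html",
    cfg_file="chart_config.json",
    dest="final_render.html",
)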

Epidemic data visualization results

Finally

To thank my readers, I'd like to share some of my favorite recent programming resources, as a way of giving back to every reader; I hope they help you.

They include practical Python tutorials suitable for beginners~

Come and grow together with Xiaoyu!

① 100+ Python PDFs (most of the mainstream and classic books)

② The Python standard library (the most complete Chinese edition)

③ Source code for crawler projects (forty or fifty interesting, classic hands-on projects with source code)

④ Videos covering Python basics, crawlers, web development, and big-data analysis (suitable for beginners)

⑤ A Python learning roadmap (say goodbye to unfocused learning)
