Crawling m3u8 video with Python (multithreading)

Excerpt from: https://www.cnblogs.com/python147/p/14511627.html

1. Introduction

The text and pictures in this article come from the Internet and are for learning and communication purposes only, with no commercial use. If you have any questions, please contact us so that they can be resolved.


When crawling videos, I found that most sites no longer put a direct mp4 or avi link in the web page. Instead the video is served through an m3u8 playlist (often described as "encrypted"), which splits it into many ts segment files that are played one after another.

This article shows how to download such an m3u8 video with a Python crawler.

2. Analyzing web pages

1. Movie video source

http://www.caisetv.com/

2. Find the m3u8 playlist and its directory

http://www.caisetv.com/dongzuopian/chaidanzhuanjia/0-1.html

 

On the page where the video plays, press F12 and inspect the network requests. The m3u8 playlist URL appears there:

https://xigua-cdn.haima-zuida.com/20210219/19948_fcbc225a/1000k/hls/index.m3u8

 

The ts entries listed in this file are the segments of the movie.

https://xigua-cdn.haima-zuida.com/20210219/19948_fcbc225a/1000k/hls/

Take the m3u8 link above, remove index.m3u8 to get this base URL, then append a ts name from the playlist, such as 075a34cccdd000000.ts, to get the link of one segment.

As follows:

https://xigua-cdn.haima-zuida.com/20210219/19948_fcbc225a/1000k/hls/075a34cccdd000000.ts
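The join described above can also be done with the standard library instead of manual string concatenation. This is only a minimal sketch, assuming the segment names in the playlist are relative to the playlist URL; `segment_url` is a hypothetical helper name and is not part of the original code.

```python
from urllib.parse import urljoin


def segment_url(playlist_url, segment_name):
    """Resolve a segment name against the URL of the playlist that lists it."""
    # urljoin drops the last path component (index.m3u8) and appends the name;
    # if segment_name is already an absolute URL it is returned unchanged.
    return urljoin(playlist_url, segment_name)


playlist = "https://xigua-cdn.haima-zuida.com/20210219/19948_fcbc225a/1000k/hls/index.m3u8"
print(segment_url(playlist, "075a34cccdd000000.ts"))
```

Using `urljoin` rather than `urlheader + name` also covers playlists that list full URLs, which plain concatenation would break.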

Downloading this single segment in a browser and opening it shows that it plays as an ordinary short clip.

So downloading all the ts segments and merging them in order gives the complete movie video!

3. Download ts

1. Download ts segmented video

The index.m3u8 file downloaded above contains the names of all the ts segments.
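To make the file layout concrete, here is a rough sketch of how the segment names can be pulled out of a playlist. The sample text is illustrative only: it is not the real index.m3u8 of this movie, the second segment name is made up, and `segment_names` is a hypothetical helper. It relies only on the fact that tag lines in an m3u8 playlist start with "#".

```python
# Illustrative playlist text (an assumption, not the site's actual file)
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
075a34cccdd000000.ts
#EXTINF:10.0,
075a34cccdd000001.ts
#EXT-X-ENDLIST
"""


def segment_names(playlist_text):
    """Return the non-tag, non-blank lines of an m3u8 playlist, in order."""
    names = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


print(segment_names(SAMPLE_PLAYLIST))
```

Skipping blank lines matters as much as skipping tags: an empty name would otherwise be turned into a request for the bare directory URL.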

 

 

Next, read this file with Python, extract each ts name, splice it onto the base URL, then download the segment and save it in a folder.

import os
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0',}

# Download one ts file and save it as name
def download(url, name):
    r = requests.get(url, headers=headers)
    with open(name, "wb") as code:
        code.write(r.content)

with open("index.m3u8", "r") as f:
    ts_list = f.readlines()

os.makedirs("cdzj2", exist_ok=True)  # the output folder must exist
urlheader = "https://xigua-cdn.haima-zuida.com/20210219/19948_fcbc225a/1000k/hls/"
count = 0
for i in ts_list:
    i = i.strip()
    # Lines starting with "#" are playlist tags, not ts names; skip blank lines too
    if i and not i.startswith("#"):
        # Zero-padded names keep the files in playback order when sorted by name
        download(urlheader + i, "cdzj2/%05d.ts" % count)
        count = count + 1
        print(count)

 

This downloads all the ts files, but fetching them one by one is very slow. Next, use multiple threads to speed up the download.

2. Download ts segments with multiple threads

import threading

for i in ts_list:
    i = i.strip()
    if i and not i.startswith("#"):
        # The last 7 characters (e.g. "0000.ts") carry the segment number
        n = i[-7:]
        # One thread per segment; all of them start almost at once
        threading.Thread(target=download, args=(urlheader + i, "cdzj2/" + n)).start()

 

With multiple threads, the ts files are downloaded to the local folder much faster.
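Starting one thread per segment can open hundreds of connections at the same time. As an alternative (not the method used in this article), a thread pool caps the number of simultaneous downloads and reports failures. The sketch below is a rough outline under assumptions: `download_all` and `fake_fetch` are hypothetical names, the example.com URLs are placeholders, and the stand-in fetch function replaces the real `download(url, name)` so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(jobs, fetch, max_workers=8):
    """Run fetch(url, path) for every (url, path) pair on at most max_workers threads.

    Returns a list of (url, error) pairs for the jobs that raised an exception.
    """
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url, path): url for url, path in jobs}
        for future, url in futures.items():
            try:
                future.result()  # re-raises any exception from the worker thread
            except Exception as exc:
                failed.append((url, exc))
    return failed


def fake_fetch(url, path):
    """Stand-in for download(url, name); fails for one URL to show error handling."""
    if url.endswith("bad.ts"):
        raise IOError("simulated failure")


# Placeholder jobs (hypothetical URLs and paths)
jobs = [("https://example.com/hls/%04d.ts" % i, "cdzj2/%04d.ts" % i) for i in range(3)]
jobs.append(("https://example.com/hls/bad.ts", "cdzj2/bad.ts"))
failed = download_all(jobs, fake_fetch, max_workers=2)
print([url for url, _ in failed])
```

In real use, `fetch` would be the `download` function defined earlier, and the returned list tells you which segments to retry before merging.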

4. Merge ts

Merge the files from cmd

copy /b   *.ts   new.mp4

Run this command in a cmd terminal, from the folder that contains the ts files. It concatenates the ts files in order of name and saves the result as new.mp4. Because the order comes from the file names, the names must sort in playback order.
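The copy /b command only exists on Windows. The same byte-for-byte concatenation can be sketched in Python so that it also works elsewhere. This is a minimal sketch under assumptions: `merge_ts` is a hypothetical helper, and it presumes the file names sort in playback order, just as the cmd command does.

```python
import os
import shutil


def merge_ts(ts_dir, output_path):
    """Concatenate every .ts file in ts_dir, sorted by name, into output_path.

    Returns the list of file names in the order they were merged.
    """
    names = sorted(n for n in os.listdir(ts_dir) if n.endswith(".ts"))
    with open(output_path, "wb") as out:
        for name in names:
            with open(os.path.join(ts_dir, name), "rb") as part:
                shutil.copyfileobj(part, out)  # stream, so large files are not loaded whole
    return names
```

A call such as `merge_ts("cdzj2", "new.mp4")` would play the role of the cmd command, with the folder and output names taken from the examples in this article.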

 

5. Summary

1. Analyze the page to find the m3u8 playlist
2. Download the ts segments with Python
3. Merge the ts segments in cmd and save the result as an mp4 file

6. My code (full)

# -*- coding:utf-8 -*-
import os
import shutil
import threading
import time

import requests

HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0'}


def download(url, name):
    """Download url and save the response body to the file name."""
    r = requests.get(url, headers=HEADERS, timeout=30)
    r.raise_for_status()  # fail instead of saving an error page as a segment
    with open(name, "wb") as code:
        code.write(r.content)


def read_ts_names(m3u8_local):
    """Return the ts entries of a local playlist, skipping tag lines and blank lines."""
    with open(m3u8_local, "r") as f:
        lines = [line.strip() for line in f]
    return [line for line in lines if line and not line.startswith("#")]


def m3u8_download_multi_thread(m3u8, mp4, thread_max=10):
    ts_in = mp4 + os.sep + "ts"
    os.makedirs(ts_in, exist_ok=True)  # mkdir -p ./a/b/c

    urlheader = os.path.dirname(m3u8)
    m3u8_local = mp4 + os.sep + "index.m3u8"

    download(m3u8, m3u8_local)

    # Multithreading
    for count, i in enumerate(read_ts_names(m3u8_local), 1):
        # Throttle: wait while thread_max threads (main thread included) are alive
        while thread_max <= len(threading.enumerate()):
            time.sleep(1)
        print("ts[%d]=%s" % (count, i))
        n = os.path.basename(i)
        threading.Thread(target=download, args=(urlheader + "/" + i, ts_in + os.sep + n)).start()

    # Wait for all child threads to end (only the main thread remains)
    while 1 < len(threading.enumerate()):
        time.sleep(1)


def m3u8_download_single_thread(m3u8, mp4):
    ts_in = mp4 + os.sep + "ts"
    os.makedirs(ts_in, exist_ok=True)

    urlheader = os.path.dirname(m3u8)
    m3u8_local = mp4 + os.sep + "index.m3u8"

    download(m3u8, m3u8_local)

    # Single thread
    for count, i in enumerate(read_ts_names(m3u8_local), 1):
        print("ts[%d]=%s" % (count, i))
        n = os.path.basename(i)  # the name already ends in .ts
        download(urlheader + "/" + i, ts_in + os.sep + n)


def ts_join(mp4):
    """Merge the segments in <mp4>/ts into <mp4>/<mp4> (Windows only, uses copy /b)."""
    target = mp4 + os.sep + mp4
    if os.path.exists(target):
        os.remove(target)

    os.chdir(mp4 + os.sep + "ts")

    # copy /b   *.ts   new.mp4  (the quotes keep a name containing spaces intact)
    with os.popen('copy /b *.ts "%s"' % mp4, "r") as p:
        print(p.read())
    shutil.move(mp4, "..")


if __name__ == '__main__':
    m3u8 = "https://p2.bdstatic.com/rtmp.liveshow.lss-user.baidubce.com/live/stream_bduid_3041161111_7873701187/merged_1669738224764_293317_1160_30460.m3u8"
    mp4 = "Shenzhou 15.mp4"
    m3u8_download_multi_thread(m3u8, mp4)
    # m3u8_download_single_thread(m3u8, mp4)
    ts_join(mp4)

 

Posted by DarkSuperHero on Wed, 30 Nov 2022 04:27:31 +0300