Excerpt from: https://www.cnblogs.com/python147/p/14511627.html
1. Introduction
When crawling videos, I found that most sites no longer put a direct mp4 or avi link on the page. Instead, the video is served through an m3u8 playlist that lists many short ts segments, which the player fetches and plays one after another. Strictly speaking, m3u8 is a playlist format rather than encryption, but it does mean there is no single download link any more.
Today I will show you how to download an m3u8 video with a Python crawler.
2. Analyzing web pages
1. Movie video source
http://www.caisetv.com/

2. Analyze the m3u8 playlist
http://www.caisetv.com/dongzuopian/chaidanzhuanjia/0-1.html

On the page where the video plays, press F12 to open the browser developer tools and look at the network requests. One of them is the request for the m3u8 playlist:
https://xigua-cdn.haima-zuida.com/20210219/19948_fcbc225a/1000k/hls/index.m3u8

The ts entries listed in this playlist are the video segments of the movie. They are served from the same directory as the playlist:
https://xigua-cdn.haima-zuida.com/20210219/19948_fcbc225a/1000k/hls/
In other words, take the m3u8 link, drop the trailing index.m3u8, and append a ts name from the playlist, such as 075a34cccdd000000.ts, to get the link of one segment:
https://xigua-cdn.haima-zuida.com/20210219/19948_fcbc225a/1000k/hls/075a34cccdd000000.ts
Download this segmented video through a browser and open it:

So downloading all the ts segments and merging them in order gives the complete movie!
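The URL construction described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: `parse_playlist` is a hypothetical helper name, the sample playlist and its segment names are made up, and the check for an `#EXT-X-KEY` tag is a simple string test for whether the playlist declares real encryption (in which case the segments would need decrypting after download).

```python
from urllib.parse import urljoin


def parse_playlist(playlist_url, playlist_text):
    """Return (segment_urls, encrypted) for the text of an m3u8 media playlist."""
    segment_urls = []
    encrypted = False
    for line in playlist_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            # An EXT-X-KEY tag with a METHOD other than NONE means the
            # segments really are encrypted.
            if line.startswith("#EXT-X-KEY") and "METHOD=NONE" not in line:
                encrypted = True
            continue
        # urljoin resolves relative names against the playlist URL and
        # leaves absolute segment URLs unchanged.
        segment_urls.append(urljoin(playlist_url, line))
    return segment_urls, encrypted


# Offline demonstration with a made-up playlist (segment names are hypothetical).
sample = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
seg000000.ts
#EXTINF:10.0,
seg000001.ts
#EXT-X-ENDLIST
"""
base = "https://xigua-cdn.haima-zuida.com/20210219/19948_fcbc225a/1000k/hls/index.m3u8"
urls, encrypted = parse_playlist(base, sample)
print(urls[0])
print("encrypted:", encrypted)
```

Using `urljoin` instead of plain string concatenation also covers playlists that list absolute segment URLs.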
3. Download ts
1. Download ts segmented video
The index.m3u8 file downloaded in the previous step contains the names of all the ts segments.

Next, read this file with Python, extract each ts name, join it onto the base link, download the segment, and save it in a folder:
import os
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0'}

# Download one ts file and save it under the given name
def download(url, name):
    r = requests.get(url, headers=headers)
    with open(name, "wb") as code:
        code.write(r.content)

with open("index.m3u8", "r") as f:
    ts_list = f.readlines()
# Remove the header lines at the top of the playlist
ts_list = ts_list[5:]
urlheader = "https://xigua-cdn.haima-zuida.com/20210219/19948_fcbc225a/1000k/hls/"
os.makedirs("cdzj2", exist_ok=True)  # the output folder must exist before writing
count = 0
for i in ts_list:
    i = i.strip()
    # Lines containing "#" are playlist tags, not segment names
    if i and "#" not in i:
        # Zero-pad the counter so the files sort in playback order when merged
        download(urlheader + i, "cdzj2/%05d.ts" % count)
        count = count + 1
        print(count)

This downloads all the ts files, but fetching them one at a time is very slow. Next, use multiple threads to speed up the download.
2. Multi-thread download ts video
import threading

for i in ts_list:
    i = i.strip()
    if i and "#" not in i:
        n = i[-7:]  # last 7 characters of the segment name, e.g. "0000.ts"
        # Start one thread per segment; download() is the function defined above
        threading.Thread(target=download, args=(urlheader + i, "cdzj2/" + n)).start()

With multiple threads, the ts files are downloaded to the local folder much faster.
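Starting one thread per segment can open hundreds of connections at once for a long movie. As an alternative sketch (not the method used in this article), the standard library's `ThreadPoolExecutor` caps the number of worker threads and makes failed downloads visible. The names `download_all` and `fake_fetch` are hypothetical; `fetch` stands for any function with the same `(url, name)` signature as the `download` function above, and the demonstration uses a stand-in so that it runs without network access.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def download_all(jobs, fetch, max_workers=8):
    """Run fetch(url, path) for every (url, path) pair on a bounded thread pool.

    Returns a list of (url, error) pairs for the jobs that raised an exception.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url, path): url for url, path in jobs}
        for future, url in futures.items():
            try:
                future.result()  # re-raises any exception from the worker
            except Exception as exc:
                failures.append((url, exc))
    return failures


# Offline demonstration: a stand-in fetch that writes the URL into the file
# instead of touching the network.
def fake_fetch(url, path):
    with open(path, "wb") as f:
        f.write(url.encode())


with tempfile.TemporaryDirectory() as tmp:
    jobs = [
        ("https://example.com/%04d.ts" % n, os.path.join(tmp, "%04d.ts" % n))
        for n in range(20)
    ]
    failed = download_all(jobs, fake_fetch, max_workers=4)
    print(len(os.listdir(tmp)), "files,", len(failed), "failures")
```

To use it with the code above, build `jobs` from the ts names and pass `download` as `fetch`; any segment reported in the returned list can then be retried.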
4. Merge ts
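Merge the files with cmd
copy /b *.ts new.mp4
Run this command in a cmd terminal, inside the folder that contains the ts files. It concatenates the ts files (they are merged in order of their names, so the names must sort in playback order) and saves the result as new.mp4. Note that this only joins the bytes: the result is still a transport stream with an mp4 extension, which most players will nevertheless open.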
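The same concatenation can also be done from Python, which avoids depending on Windows cmd. The following is a minimal sketch, assuming the segment file names sort lexically in playback order (for example zero-padded numbers); `merge_ts` is a hypothetical helper name, and the demonstration uses tiny placeholder files rather than real video data.

```python
import glob
import os
import shutil
import tempfile


def merge_ts(ts_dir, output_path):
    """Concatenate every .ts file in ts_dir, in sorted name order, into output_path.

    Returns the number of segments that were merged.
    """
    parts = sorted(glob.glob(os.path.join(ts_dir, "*.ts")))
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # streams the data in chunks
    return len(parts)


# Offline demonstration with placeholder segment files.
with tempfile.TemporaryDirectory() as tmp:
    for name, data in [("0001.ts", b"second"), ("0000.ts", b"first-")]:
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(data)
    out_file = os.path.join(tmp, "new.mp4")
    print(merge_ts(tmp, out_file), "segments merged")
    with open(out_file, "rb") as f:
        print(f.read())
```

Because the output name ends in .mp4, it is not picked up by the `*.ts` pattern even when it is written into the same folder as the segments.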

5. Summary
1. Analyze the m3u8 playlist
2. Download the ts files with Python
3. Merge the ts files with cmd and save the result as mp4
6. My code (full)
# -*- coding:utf-8 -*-
import os
import requests
import shutil
import time
import threading


def download(url, name):
    """Download one file (the playlist or a ts segment) and save it as name."""

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0', }
    r = requests.get(url, headers=headers)
    with open(name, "wb") as code:
        code.write(r.content)


def m3u8_download_multi_thread(m3u8, mp4, thread_max=10):
    ts_in = mp4 + os.sep + "ts"
    if not os.path.exists(ts_in):
        os.makedirs(ts_in)  # mkdir -p ./a/b/c

    urlheader = os.path.dirname(m3u8)
    m3u8_local = mp4 + "/index.m3u8"

    download(m3u8, m3u8_local)

    with open(m3u8_local, "r") as f:
        ts_list = f.readlines()

    # Multithreading
    line = 0
    for i in ts_list:
        line = line + 1
        # Wait while the number of live threads is at the limit
        while thread_max <= len(threading.enumerate()):
            time.sleep(1)
        i = i.strip()
        # Skip blank lines and playlist tags (lines containing "#")
        if i and "#" not in i:
            print("line[%d]=%s" % (line, i))
            n = os.path.basename(i)
            try:
                threading.Thread(target=download, args=(urlheader + "/" + i, ts_in + "/" + n)).start()
            except Exception as e:
                print("threading.cnt=%d" % len(threading.enumerate()))

    # Wait for all child threads to end
    while 1 < len(threading.enumerate()):
        time.sleep(1)
        # print("thread_count=%d" % len(threading.enumerate()))


def m3u8_download_single_thread(m3u8, mp4):
    ts_in = mp4 + os.sep + "ts"
    if not os.path.exists(ts_in):
        os.makedirs(ts_in)

    urlheader = os.path.dirname(m3u8)
    m3u8_local = mp4 + "/index.m3u8"

    download(m3u8, m3u8_local)

    with open(m3u8_local, "r") as f:
        ts_list = f.readlines()

    # single thread
    line = 0
    for i in ts_list:
        line = line + 1
        i = i.strip()
        if i and "#" not in i:
            print("line[%d]=%s" % (line, i))
            n = os.path.basename(i)  # the name already ends in .ts
            download(urlheader + "/" + i, ts_in + "/" + n)


def ts_join(mp4):
    # Remove the result of an earlier run, if there is one
    if os.path.exists(mp4 + os.sep + mp4):
        os.remove(mp4 + os.sep + mp4)

    os.chdir(mp4 + "/ts")

    # copy /b *.ts new.mp4 (Windows cmd; the name is quoted because it may contain spaces)
    with os.popen('copy /b *.ts "' + mp4 + '"', "r") as p:
        l = p.read()
        print(l)
    # Move the merged file out of the ts folder into its parent folder
    shutil.move(mp4, "..")


if __name__ == '__main__':
    m3u8 = "https://p2.bdstatic.com/rtmp.liveshow.lss-user.baidubce.com/live/stream_bduid_3041161111_7873701187/merged_1669738224764_293317_1160_30460.m3u8"
    mp4 = "Shenzhou 15.mp4"
    m3u8_download_multi_thread(m3u8, mp4)
    # m3u8_download_single_thread(m3u8, mp4)
    ts_join(mp4)