Building a word cloud from WeChat chat records takes two steps: collecting the chat records, then generating the word cloud from the chat data.
1. Chat record data collection
Collecting the chat data is the hardest part of the whole task, and the difficulty differs between QQ and WeChat. QQ chat data can simply be exported. WeChat, however, stores its chat data in an encrypted database, EnMicroMsg.db, whose password is derived from an MD5 hash. (MD5 is a hash function, not an encryption scheme; it is only used to compute the password.) To get the chat record with a given contact, we therefore copy the database file out and open it with that password. The rest of this section walks through the collection process.
- BlueStacks emulator (the WeChat data is first backed up from the phone to the computer, then restored from the computer into the BlueStacks emulator)
1. Import the data from the phone into the emulator
The BlueStacks emulator, whose icon is shown below, is the core tool in the whole process.
Use the backup option in the lower-left corner of the desktop WeChat client to back up the phone's chat history to the computer.
Open the BlueStacks emulator, download the WeChat app in it, and log in. Doing so logs WeChat out on the phone, but the desktop WeChat client must stay logged in. (The emulator responds with a slight delay, so be patient.)
Use the restore option in the lower-left corner of the desktop WeChat client to restore the backed-up chat history to the phone, which here is the emulator.
2. Get the database file
Obtain root permission on the emulator by following the steps in the figure.
In the emulator, use the app shown in the figure below to browse the emulator's files.
Follow the path to the location of the WeChat chat records. Under the path marked by the blue box there are two folders whose names are a mix of digits and letters, and the data is stored in one of them. Find EnMicroMsg.db there; this database holds the chat records.
Before copying, delete the chat records that are not needed for the analysis, then copy the database from the emulator to a local folder. The BlueStacks emulator can access a local folder on the computer: long-press the database file, choose the copy operation, and save it to the shared folder, which maps to a local folder on the computer.
The chat history has now been saved to the local folder.
3. Database file decryption
The database is encrypted, and its password is derived from an MD5 hash, so the data can be read once the password has been computed.
The password is computed from the IMEI and auth_uin.
Since the IMEI could not be obtained, use 1234567890ABCDEF instead.
auth_uin can be read from the data files in the emulator: find the value whose name = "auth_uin" in the xml file.
Concatenate IMEI + auth_uin, compute the MD5 hash of the result, and take the 32-character lowercase hex digest. Its first 7 characters are the database password.
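The password derivation described above can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the XML snippet, its `<int>` element shape, the sample uin value, and the helper names `read_auth_uin` and `derive_db_password` are all hypothetical. Only the fallback IMEI `1234567890ABCDEF`, the `auth_uin` name, and the first-7-characters rule come from the steps above.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the preferences XML; the real file's layout
# and the uin value used here are assumptions for illustration only.
SAMPLE_XML = """<map>
    <int name="auth_uin" value="123456789" />
</map>"""

FALLBACK_IMEI = "1234567890ABCDEF"  # used when the real IMEI cannot be read


def read_auth_uin(xml_text):
    """Return the value of the element whose name attribute is auth_uin."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.get("name") == "auth_uin":
            # The value may sit in a `value` attribute or in the element text.
            return element.get("value") or (element.text or "").strip()
    raise ValueError("auth_uin not found")


def derive_db_password(imei, auth_uin):
    """First 7 characters of the lowercase hex MD5 digest of imei + auth_uin."""
    digest = hashlib.md5((imei + auth_uin).encode("utf-8")).hexdigest()
    return digest[:7]


uin = read_auth_uin(SAMPLE_XML)
password = derive_db_password(FALLBACK_IMEI, uin)
print(password)
```

`hexdigest()` already returns lowercase hex, so no extra case conversion is needed before slicing.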
Open the encrypted database file with the sqlcipher tool, entering the password computed above.
Download path: SQLCipher - Zetetic
Once the database opens, File - Export - CSV exports the data in CSV format, which is all we need. At this point the chat data has finally been obtained. Next, a Python script generates the word cloud image.
2. Generate word cloud from data
After obtaining a stop word list, run the following code to generate the word cloud images.
    # -*- coding: utf-8 -*-
    import jieba
    import pandas as pd
    import wordcloud

    # Load the exported chat data and keep only the columns we need
    data = pd.read_csv('my_data.csv', encoding='gbk')
    data = data[['status', 'content']].dropna(subset=['content'])
    data['content'] = data['content'].astype(str)

    # Messages I sent (status == 2) and messages I received (status == 4)
    data_me = data[data['status'] == 2]
    data_other = data[data['status'] == 4]
    print(len(data_me))
    print(len(data_other))

    # Load the stop word list, one word per line
    with open('stop_words.txt', 'r', encoding='utf-8') as f:
        stop_words = {line.strip() for line in f}

    def join_text(frame):
        # Keep plain text only: skip empty messages and any message that
        # contains 'wxid', '<' (XML payloads), or '[' (bracketed tags)
        return ''.join(
            mes for mes in frame['content']
            if mes != '' and 'wxid' not in mes and '<' not in mes and '[' not in mes
        )

    def tokenize(text):
        # Segment with jieba, then drop stop words and single-character tokens
        return [w for w in jieba.lcut(text) if w not in stop_words and len(w) >= 2]

    txt_me = ' '.join(tokenize(join_text(data_me)))
    txt_other = ' '.join(tokenize(join_text(data_other)))
    txt_all = txt_other + ' ' + txt_me

    # Render and save one word cloud per text
    w = wordcloud.WordCloud(
        font_path='msyh.ttc',
        width=1000,
        height=700,
        background_color='white',
        max_words=300,
    )
    for text, path in [(txt_other, 'other.png'), (txt_me, 'me.png'), (txt_all, 'all.png')]:
        w.generate(text)
        w.to_file(path)
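Before rendering, it can help to check which words will dominate the image. The sketch below counts token frequencies with the standard library only. The token list is made-up sample data standing in for the jieba output, and `top_words` is a hypothetical helper that applies the same filter as the script (no stop words, at least two characters).

```python
from collections import Counter


def top_words(tokens, stop_words, n=3):
    """Count tokens of length >= 2 that are not stop words; return the n most common."""
    kept = [t for t in tokens if t not in stop_words and len(t) >= 2]
    return Counter(kept).most_common(n)


# Made-up sample tokens standing in for jieba.lcut(...) output
sample_tokens = ["今天", "的", "天气", "今天", "吃饭", "了", "天气", "今天"]
sample_stop_words = {"的", "了"}

print(top_words(sample_tokens, sample_stop_words))
```

If the counts look reasonable, they could also be turned into a dict and passed to `WordCloud.generate_from_frequencies`, which renders from frequencies instead of raw text.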
Finally, you're done!