Generating a word cloud from WeChat chat records

For WeChat chat records, this takes two steps: collecting the chat records, then generating word clouds from the chat data.

1. Chat record data collection

The hardest part of the whole task is collecting the chat data. QQ and WeChat differ here: QQ chat data can simply be exported, while WeChat stores chat data in the EnMicroMsg.db database, which is encrypted with a password derived from an MD5 hash. The chat records with a given contact can therefore be read by copying the database file and opening it with that password. The steps below describe how to collect the chat data.

Tools:

  • BlueStacks emulator (WeChat data is first backed up from the mobile phone to the computer, then restored from the computer into the BlueStacks emulator)
  • sqlcipher.exe

1. Import the data from the mobile phone into the emulator

BlueStacks Android emulator official website

The BlueStacks emulator, whose icon is shown below, is the core tool in the whole process.

In the desktop WeChat client, use the menu in the lower-left corner to back up the phone's chat history to the computer.

Start the BlueStacks emulator, install the WeChat app in it, and log in. This logs WeChat out on the phone, but the desktop WeChat must stay logged in. (The emulator responds slowly, so be patient.)

From the same lower-left menu in the desktop WeChat client, restore the chat history from the computer to the phone (here, the emulator).

2. Get the database file

Obtain root permission in the emulator by following the steps in the figure.

Inside the emulator, use the app shown in the figure below to browse the emulator's files.

Follow the path to the location of the WeChat chat records. Under the path marked by the blue box there are two folders whose names are a combination of digits and letters, and the data is stored in one of them. Find EnMicroMsg.db; this database holds the chat records.

Before copying, delete any chat records you do not want to process, then export the database from the emulator to a local folder. The BlueStacks emulator can access a local folder on the computer: long-press the database file, tap Copy, and save it to the folder the emulator shares with the computer, which places it in the local folder.

The chat history has now been saved to the local folder.

3. Database file decryption

The database is encrypted with a password derived from an MD5 hash, so the data can be read once that password is computed.

The password is computed from the IMEI and auth_uin.

Since the IMEI could not be obtained, use 1234567890ABCDEF instead.

auth_uin can be read from the data files in the emulator: find the value corresponding to name = "auth_uin" in the XML file.
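As a rough illustration of that lookup, the sketch below scans an XML document for the element whose name attribute is auth_uin. The sample XML, its map/int layout, the value 123456789, and the helper name find_auth_uin are all assumptions made for illustration; the real file in the emulator may be laid out differently.

```python
import xml.etree.ElementTree as ET


def find_auth_uin(xml_text):
    """Return the value stored under name="auth_uin", or None if absent."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.get("name") == "auth_uin":
            # The value may sit in a "value" attribute or in the element text.
            return elem.get("value") or (elem.text or "").strip()
    return None


# Hypothetical sample; not copied from a real WeChat file.
sample = '<map><int name="auth_uin" value="123456789" /><string name="other">x</string></map>'
print(find_auth_uin(sample))
```

In practice the XML text would be read from the file copied out of the emulator and passed to the same function.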

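MD5 is a one-way hash, so nothing is decrypted here: compute the MD5 hash of IMEI + auth_uin (the two strings concatenated) and take the 32-character lowercase hex digest. Its first 7 characters are the database password.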
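The same computation can also be done locally rather than through an online tool. This is a minimal sketch using Python's standard hashlib module; the function name wechat_db_password and the auth_uin value 123456789 are made up for illustration, and the IMEI is the placeholder mentioned above.

```python
import hashlib


def wechat_db_password(imei, auth_uin):
    """Return the first 7 characters of the lowercase MD5 hex digest of imei + auth_uin."""
    digest = hashlib.md5((imei + auth_uin).encode("utf-8")).hexdigest()
    return digest[:7]


# "123456789" is a hypothetical auth_uin; substitute the value from the XML file.
password = wechat_db_password("1234567890ABCDEF", "123456789")
print(password)
```

hexdigest() already returns 32 lowercase hexadecimal characters, so no extra case conversion is needed.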

An online tool can compute the hash: MD5 Online Encryption/Decryption/Crack - MD5 Online (sojson.com)

Use the sqlcipher tool to open the SQLite database with this password.

Download path: SQLCipher - Zetetic

Once the database is open, File - Export - CSV is enough to export the data in CSV format. At this point the chat data has finally been obtained. The next step is a Python script that generates the word cloud images.
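Before moving on, the exported file can be sanity-checked. The sketch below counts rows per value of the status column with the standard csv module; the column names status and content match those used by the script later in this post, while the helper name count_by_status and the sample rows are illustrative assumptions.

```python
import csv
import io
from collections import Counter


def count_by_status(csv_file):
    """Count rows per value of the 'status' column in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row["status"] for row in reader)


# Hypothetical in-memory sample standing in for the exported file.
sample = io.StringIO("status,content\n2,hello\n4,hi\n2,bye\n")
print(count_by_status(sample))
```

For the real export, the file would be opened with something like open('my_data.csv', encoding='gbk', newline='') and passed in; the gbk encoding is the one the script below assumes.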

 

2. Generate word cloud from data

Once a stop word list is available, the following code generates the word cloud images.

# -*- coding: utf-8 -*-

import re
import jieba
import wordcloud
import pandas as pd

data = pd.read_csv('my_data.csv', encoding='gbk')
data = data[['status','content']]

# messages I sent
data_me = data[data['status'] == 2]
# messages I received
data_other = data[data['status'] == 4]

print(len(data_me))
print(len(data_other))

# Load the stop words, one per line, stripping whitespace and skipping blank lines
with open('stop_words.txt', 'r', encoding='utf-8') as f:
    stop_words = {line.strip() for line in f if line.strip()}


str_me = ''
str_other = ''

# Keep only plain text: skip missing values and messages containing 'wxid', '<' or '['
for index in data_me.index:
    mes = data.loc[index, 'content']
    if isinstance(mes, str) and mes != '' and 'wxid' not in mes and '<' not in mes and '[' not in mes:
        str_me += mes

for index in data_other.index:
    mes = data.loc[index, 'content']
    if isinstance(mes, str) and mes != '' and 'wxid' not in mes and '<' not in mes and '[' not in mes:
        str_other += mes

# Segment the text with jieba, dropping stop words and single-character tokens
jieba_me = jieba.lcut(str_me)
ls_me = []
for item in jieba_me:
    if item not in stop_words and len(item) >= 2:
        ls_me.append(item)

jieba_other = jieba.lcut(str_other)
ls_other = []
for item in jieba_other:
    if item not in stop_words and len(item) >= 2:
        ls_other.append(item)

txt_other = " ".join(ls_other)
txt_me = " ".join(ls_me)
txt_all = txt_other + ' ' + txt_me

# Render and save the word cloud images
w = wordcloud.WordCloud( font_path = "msyh.ttc", width = 1000, height = 700, background_color = 'white', max_words = 300)
w.generate(txt_other)
w.to_file('other.png')
w.generate(txt_me)
w.to_file('me.png')
w.generate(txt_all)
w.to_file('all.png')
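A word cloud sizes each word by how often it occurs, so it can help to look at the raw counts before rendering. The sketch below applies the same filter as the script (drop stop words and single-character tokens) and counts what remains with collections.Counter; the helper name top_words and the sample tokens are illustrative only.

```python
from collections import Counter


def top_words(tokens, stop_words, n=10):
    """Return the n most frequent tokens that pass the script's filter."""
    kept = [t for t in tokens if t not in stop_words and len(t) >= 2]
    return Counter(kept).most_common(n)


# Made-up tokens standing in for the output of jieba.lcut.
sample_tokens = ["today", "weather", "a", "today", "lunch", "weather", "today"]
print(top_words(sample_tokens, {"lunch"}, n=2))
```

With the script's own variables, top_words(jieba_me, stop_words) would list the most frequent words in the sent messages.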

 

Finally, you're done!



Posted by classic on Fri, 20 May 2022 02:13:12 +0300