Introduction to the Problem
In a software engineering training course, the following problem was posed:
Given a mobile phone screenshot of a nucleic acid test report, use OpenCV to automatically determine whether the test was taken within the last 72 hours.
I. Preparations
Software required for this task:
- Python 3.x
- OpenCV-Python 4.x
- Tesseract-OCR 5.x
- Windows 10 64-bit or Windows 11 64-bit
1. Install Python
When installing the Python SDK, choose the default installation and check the option to add Python to the PATH environment variable.
Python can also be installed directly from the Microsoft Store.
Alternatively, install PyCharm:
PyCharm download address
2. Install the OpenCV-Python package
Open cmd or a terminal and install it with pip:
pip install opencv-python
If the download is slow, you can use the Tsinghua University mirror by appending -i https://pypi.tuna.tsinghua.edu.cn/simple/ to the command.
The full command is:
pip install opencv-python -i https://pypi.tuna.tsinghua.edu.cn/simple/
3. Install Tesseract-OCR
Official website
Official Documents
Language Pack Address
Download Address
First install pytesseract, the Python wrapper for Tesseract-OCR. Open cmd or a terminal and install it with pip:
pip install pytesseract -i https://pypi.tuna.tsinghua.edu.cn/simple/
Then follow the download address link to install the latest version of Tesseract-OCR, and add its installation directory to the PATH environment variable:
C:\Program Files\Tesseract-OCR
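If editing PATH is not an option, pytesseract can also be pointed at the executable directly through its `pytesseract.pytesseract.tesseract_cmd` setting. The sketch below looks for `tesseract` on PATH first and falls back to the default Windows location shown above; the helper name `find_tesseract` is hypothetical, and the fallback path assumes a default installation.

```python
import os
import shutil

# Assumed default install location on Windows (matches the PATH entry above)
DEFAULT_WIN_EXE = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback=DEFAULT_WIN_EXE):
    """Hypothetical helper: return the path of the tesseract executable, or None."""
    # 1. Prefer whatever is already reachable through the PATH variable
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # 2. Otherwise try the assumed default installation directory
    if os.path.isfile(fallback):
        return fallback
    return None


if __name__ == "__main__":
    exe = find_tesseract()
    print(exe or "tesseract not found; check the installation and PATH")
    # With pytesseract installed, the result can be handed over like this:
    # import pytesseract
    # pytesseract.pytesseract.tesseract_cmd = exe
```

Checking PATH first means a correctly configured system is never overridden by the hard-coded fallback.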
Once OpenCV-Python and Tesseract-OCR are installed and configured, verify the setup in code. Open the PyCharm IDE, create a new Python project and a Python file, and enter the following code:
import pytesseract as tess

print(tess.get_tesseract_version())
print(tess.get_languages())
Or run the following in a terminal:
tesseract -v
If the installation succeeded, the version number and the available languages are displayed.
Finally, follow the language pack address link to download the chi_sim (Simplified Chinese) language pack, and place the file directly in the tessdata folder of the Tesseract-OCR installation directory.
To see which languages the software currently supports, use the tesseract --list-langs command:
tesseract --list-langs
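To check for chi_sim from a script instead of by eye, the output of `tesseract --list-langs` can be parsed. This sketch assumes the output starts with a "List of available languages ..." header followed by one language code per line, which is how recent Tesseract versions print it; `parse_list_langs` and `has_language` are hypothetical helper names.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Turn the text printed by `tesseract --list-langs` into a list of codes."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def has_language(code):
    """Run tesseract (if it is on PATH) and report whether `code` is installed."""
    if shutil.which("tesseract") is None:
        return False
    result = subprocess.run(["tesseract", "--list-langs"],
                            capture_output=True, text=True)
    # Some versions print the list to stderr rather than stdout, so read both
    return code in parse_list_langs(result.stdout + result.stderr)


if __name__ == "__main__":
    print("chi_sim installed:", has_language("chi_sim"))
```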
II. Writing the Program
1. Importing Libraries
The program needs the following libraries (example):
import cv2 as cv
import pytesseract as tess
from datetime import datetime
2. Character Recognition (OCR) with Tesseract
OCR (optical character recognition) is the technology of recognizing the text contained in an image (which, to the computer, is just a black-and-white dot matrix) and converting it directly into computer text. The text in such images is usually printed.
Tesseract is an open-source OCR library hosted on GitHub. Here we use it for text recognition.
For example, suppose we need to recognize the text on a nucleic acid test report:
Run the following code as a test:
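import cv2 as cv
import pytesseract as tess

# Read the screenshot (OpenCV loads images in BGR channel order)
image = cv.imread("hesuan.png")
# Convert to RGB, the channel order PIL (used inside pytesseract) assumes
image_rgb = cv.cvtColor(image, cv.COLOR_BGR2RGB)

# Recognize the text with the English model
text = tess.image_to_string(image_rgb, lang="eng")
# Drop the form-feed character and split the result into lines
content = text.replace("\f", "").split("\n")
txt = []  # filled with the non-empty lines in the date-screening step
for c in content:
    if len(c) > 0:
        print(c)

# Draw a box around every recognized character
h, w, _ = image.shape
boxes = tess.image_to_boxes(image_rgb)
for b in boxes.splitlines():
    b = b.split(' ')
    # Box format: char left bottom right top page. Tesseract's origin is the
    # bottom-left corner and OpenCV's is the top-left, so the y values are flipped
    image = cv.rectangle(image, (int(b[1]), h - int(b[2])),
                         (int(b[3]), h - int(b[4])), (0, 255, 0), 2)

cv.imshow('text detect', image)
cv.waitKey(0)
cv.destroyAllWindows()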
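The y coordinates in the rectangle code above are flipped because each line returned by `image_to_boxes` has the form `char left bottom right top page`, measured from the bottom-left corner of the image, while OpenCV measures y from the top. That conversion can be isolated in a small pure-Python helper (the name `box_to_rect` is hypothetical), which makes it easy to check without an image:

```python
def box_to_rect(box_line, img_height):
    """Convert one image_to_boxes line into OpenCV corner points.

    Tesseract boxes use a bottom-left origin; OpenCV uses a top-left origin,
    so each y value becomes img_height - y.
    """
    parts = box_line.split()
    left, bottom, right, top = (int(v) for v in parts[1:5])
    top_left = (left, img_height - top)
    bottom_right = (right, img_height - bottom)
    return top_left, bottom_right


# A made-up box for the character "2" in a 100-pixel-high image
print(box_to_rect("2 10 20 30 40 0", 100))  # ((10, 60), (30, 80))
```

`cv.rectangle` accepts any two opposite corners, so this top-left/bottom-right pair draws the same box as the bottom-left/top-right pair used in the code above.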
Run result:
You can see that the characters in the picture are located well.
However, the recognized text is garbled:
This is because we chose the English model (eng). To recognize Chinese, change the lang parameter in the code to Simplified Chinese (chi_sim):
text = tess.image_to_string(image_rgb, lang="chi_sim")
The Chinese text and the corresponding date are now recognized successfully.
3. Extracting the date
Traversing the list of lines, and the characters within each line, lets us quickly pick out the date information.
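txt = []
for c in content:
    if len(c) > 0:
        txt.append(c)

# Look for the line that starts with the label 检测时间 ("test time").
# The index checks assume the chi_sim output puts a space between characters,
# so the four label characters sit at indices 0, 2, 4 and 6.
ret = False
for i in txt:
    if len(i) > 6 and i[0] == '检' and i[2] == '测' and i[4] == '时' and i[6] == '间':
        ret = i
        break
print(ret)

if ret is False:
    raise ValueError("no test-time line found in the OCR output")
# Skip the first 10 characters (label, colon and spaces in this layout)
time = ret[10:]
print(time)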
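Matching single characters at fixed positions is brittle: it breaks as soon as OCR drops or adds a space. A sketch of an alternative (a swapped-in regular-expression technique, not the one used above) removes all whitespace from each line, checks for the label, and then pulls out the timestamp. It assumes the label is 检测时间 and that the timestamp follows the `YYYY-MM-DD HH:MM:SS` layout parsed in the next step; `find_test_time` is a hypothetical name and the sample lines are made up.

```python
import re

# Date immediately followed by time once all whitespace has been removed,
# e.g. "2022-05-0108:30:00" -> ("2022-05-01", "08:30:00")
STAMP = re.compile(r"(\d{4}-\d{2}-\d{2})(\d{2}:\d{2}:\d{2})")


def find_test_time(lines, label="检测时间"):
    """Return 'YYYY-MM-DD HH:MM:SS' from the first line containing `label`, else None."""
    for line in lines:
        compact = "".join(line.split())  # drop spaces OCR put between characters
        if label in compact:
            m = STAMP.search(compact)
            if m:
                return f"{m.group(1)} {m.group(2)}"
    return None


# Made-up OCR output for illustration only
sample = ["核 酸 检 测 报 告", "检 测 时 间 ： 2022-05-01 08:30:00", "检 测 结 果 ： 阴 性"]
print(find_test_time(sample))  # 2022-05-01 08:30:00
```

Because the returned string is rebuilt with a single space, it matches the `'%Y-%m-%d %H:%M:%S'` format even if OCR scattered extra spaces inside the time.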
This prints the line containing the test time and the extracted date string.
4. Date operations
Python has no built-in date data type, but the datetime module lets us work with dates as date objects.
datetime is a module, and it also contains a class named datetime, which is imported with from datetime import datetime:
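Because the module and the class share the name datetime, which one a name refers to depends on how it was imported. A quick check (the alias `dt_module` is only for illustration):

```python
import datetime as dt_module          # the module
from datetime import datetime          # the class inside the module

print(type(dt_module).__name__)        # module
print(datetime is dt_module.datetime)  # True
# Subtracting two datetime objects yields a timedelta, also defined in the module
print(dt_module.timedelta(hours=72))   # 3 days, 0:00:00
```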
from datetime import datetime
The date and time recognized from the screenshot is a string. To work with it, first convert the str to a datetime object with datetime.strptime(), which needs a format string describing the date and time:
date = datetime.strptime(time, '%Y-%m-%d %H:%M:%S')
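A worked example with a made-up timestamp shows what `strptime` returns, and what happens when the OCR text does not match the format string exactly, which is common when characters are misread:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # same format string as above

# Made-up timestamp standing in for the OCR result
date = datetime.strptime("2022-05-01 08:30:00", FMT)
print(repr(date))  # datetime.datetime(2022, 5, 1, 8, 30)

# Any deviation from the format (here: slashes, no seconds) raises ValueError
try:
    datetime.strptime("2022/05/01 08:30", FMT)
except ValueError as err:
    print("could not parse:", err)
```

Catching the `ValueError` lets the program report an unreadable date instead of crashing.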
datetime.now() returns the current system date and time:
print(datetime.now())
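Subtracting the parsed date from the current time gives a timedelta object, delta; its days attribute holds the number of whole days.
from datetime import timedelta

delta = datetime.now() - date
print(delta.days)
# delta.days counts whole days only, so "delta.days <= 3" would accept reports
# up to almost 96 hours old; compare against the full 72 hours instead
if delta <= timedelta(hours=72):
    print("Nucleic acid report is within 72 hours")
else:
    print("Nucleic acid report is not within 72 hours")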
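Since timedelta.days only counts whole days, a check on days alone and a check on the full 72-hour span can disagree. The sketch below compares both for a report that is 80 hours old. It passes `now` in explicitly (an assumption made so the result is reproducible) instead of calling `datetime.now()`; `within_hours` is a hypothetical helper, and treating a future timestamp as invalid is an added assumption, not part of the original check.

```python
from datetime import datetime, timedelta


def within_hours(test_time, now, hours=72):
    """True if test_time lies at most `hours` before now.

    Assumption: a timestamp in the future (e.g. a misread digit) is rejected.
    """
    delta = now - test_time
    return timedelta(0) <= delta <= timedelta(hours=hours)


test_time = datetime(2022, 5, 1, 8, 0, 0)
now = test_time + timedelta(hours=80)  # made-up: report is 80 hours old

delta = now - test_time
print(delta.days)                      # 3 -> a "days <= 3" check would accept it
print(within_hours(test_time, now))    # False -> older than 72 hours
```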
5. Run the program
Run the program to obtain the result.
Summary
This article introduces an OCR approach based on OpenCV and Tesseract that lets software automatically check whether a nucleic acid test was taken within 72 hours. The program only works reliably when the screenshot is clear and legible. To recognize photos taken with a camera instead, more image processing with OpenCV is needed, such as binarization and perspective (projective) transformation.