Automatically checking the time of a nucleic acid test report with OpenCV

Problem description

In a programming practice course, we were given the following task:

Given a mobile-phone screenshot of a nucleic acid test report, use OpenCV and OCR to automatically determine whether the test was taken within the last 72 hours.

I. Preparations

Software needed to accomplish this task:

- Python 3.x
- OpenCV-Python 4.x
- Tesseract-OCR 5.x
- Windows 10 or Windows 11 (64-bit)

1. Install Python

When installing Python, the default options are fine, but make sure to check the option that adds Python to the PATH environment variable.
Python can also be installed directly from the Microsoft Store.
Optionally, install the PyCharm IDE, which is used below to run the test code.

2. Install the OpenCV-Python package

Open cmd (or any other terminal) and install the package with pip:

pip install opencv-python

If downloads are slow, you can use the Tsinghua University PyPI mirror by appending -i https://pypi.tuna.tsinghua.edu.cn/simple/ to the command.
The full command is:

pip install opencv-python -i https://pypi.tuna.tsinghua.edu.cn/simple/

3. Install Tesseract-OCR

First install pytesseract, the Python wrapper for Tesseract-OCR. Open cmd (or any other terminal) and install it with pip:

pip install pytesseract -i https://pypi.tuna.tsinghua.edu.cn/simple/

pytesseract only calls the tesseract command-line program, so the program itself must be installed as well. Download and install the latest version of Tesseract-OCR, then add its installation directory to the PATH environment variable:

C:\Program Files\Tesseract-OCR

Once OpenCV-Python and Tesseract-OCR are installed and configured, verify the setup with a short script. Open PyCharm, create a new Python project and a Python file, and enter the following code:

import pytesseract as tess
print(tess.get_tesseract_version())
print(tess.get_languages())

Alternatively, run the following in a terminal:

tesseract -v

If the installation succeeded, the version number (and, from the Python code, the list of available languages) is printed.
Finally, download the chi_sim (Simplified Chinese) language pack, i.e. the file chi_sim.traineddata, and place it in the tessdata folder of the Tesseract installation directory.
To see which languages Tesseract currently supports, run tesseract --list-langs:

tesseract --list-langs
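Because both the Python packages and the tesseract program have to be found, the checks above can also be bundled into one script that uses only the standard library. This is a sketch: the function names parse_list_langs and check_environment are our own, and the parser assumes that tesseract --list-langs prints a header line followed by one language code per line.

```python
import importlib.util
import shutil
import subprocess
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Turn the text printed by `tesseract --list-langs` into language codes.

    Assumed shape: a header line such as
    'List of available languages in "/path/to/tessdata/" (3):'
    followed by one language code per line.
    """
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def check_environment() -> None:
    """Print whether cv2, pytesseract, tesseract and chi_sim are available."""
    # find_spec only looks the package up; it does not import it
    for module in ("cv2", "pytesseract"):
        found = importlib.util.find_spec(module) is not None
        print(f"{module}: {'found' if found else 'MISSING'}")

    # shutil.which searches PATH, just like typing `tesseract` in cmd
    exe = shutil.which("tesseract")
    if exe is None:
        print("tesseract: MISSING (check the PATH environment variable)")
        return
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Use stdout, falling back to stderr if nothing was printed there
    langs = parse_list_langs(result.stdout or result.stderr)
    print("tesseract:", exe)
    print("languages:", langs)
    print("chi_sim installed:", "chi_sim" in langs)


check_environment()
```

If "chi_sim installed: False" is printed, the language pack is not in the tessdata folder that Tesseract is actually using.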

II. Writing the Program

1. Introducing Libraries

The program uses the following libraries:

import cv2 as cv
import pytesseract as tess
from datetime import datetime

2. Use Tesseract for Character Recognition (OCR)

OCR (optical character recognition) is the technology that converts the text in an image into machine-readable text. It works best on printed text.
Tesseract is an open-source OCR engine hosted on GitHub; here we use it for text recognition.
For example, we want to recognize the text on a nucleic acid test report:

Run the following code as a first test:

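import cv2 as cv
import pytesseract as tess

image = cv.imread("hesuan.png")                    # screenshot of the report
if image is None:
    raise FileNotFoundError("hesuan.png could not be read")
# OpenCV loads images as BGR; convert to RGB before passing them to pytesseract
image_rgb = cv.cvtColor(image, cv.COLOR_BGR2RGB)

# Recognize all text in the image as one string
text = tess.image_to_string(image_rgb, lang="eng")
# Remove the form-feed character and split the text into lines
content = text.replace("\f", "").split("\n")
txt = []                                           # filled in step 3
for c in content:
    if len(c) > 0:                                 # skip empty lines
        print(c)

# Draw a box around every recognized character
h, w, _ = image.shape
boxes = tess.image_to_boxes(image_rgb)
for b in boxes.splitlines():
    b = b.split(' ')                               # char left bottom right top page
    # Tesseract's y axis starts at the bottom, OpenCV's at the top: flip with h - y
    image = cv.rectangle(image, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), (0, 255, 0), 2)
cv.imshow('text detect', image)
cv.waitKey(0)
cv.destroyAllWindows()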
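The only subtle part of the box-drawing loop is the coordinate flip. image_to_boxes returns one line per character in the form "char left bottom right top page", with y measured upward from the bottom of the image, while OpenCV measures y downward from the top. The hypothetical helper below makes that conversion explicit and returns the top-left and bottom-right corners that cv.rectangle expects.

```python
from typing import Tuple

Point = Tuple[int, int]


def box_to_rect(line: str, image_height: int) -> Tuple[str, Point, Point]:
    """Convert one line of image_to_boxes() output into OpenCV corner points.

    Box lines look like "char left bottom right top page", with y counted
    from the BOTTOM of the image; OpenCV counts y from the TOP, so each y
    value is flipped with image_height - y.
    """
    # Split from the right so the character field is kept whole
    char, left, bottom, right, top, _page = line.rsplit(" ", 5)
    top_left = (int(left), image_height - int(top))
    bottom_right = (int(right), image_height - int(bottom))
    return char, top_left, bottom_right


char, p1, p2 = box_to_rect("A 10 20 30 40 0", 100)
print(char, p1, p2)  # A (10, 60) (30, 80)
```

Inside the loop above, `char, p1, p2 = box_to_rect(line, h)` followed by `cv.rectangle(image, p1, p2, (0, 255, 0), 2)` would draw the same boxes.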

Run result:

The character boxes show that Tesseract locates the text in the picture well.
However, the printed text is garbled:

This is because we used the English language model (lang="eng"). To recognize Chinese, change the lang parameter to chi_sim (Simplified Chinese):

text = tess.image_to_string(image_rgb, lang="chi_sim")


With chi_sim, the Chinese text and the corresponding date are recognized correctly.

3. Filtering out the date

By traversing the list of recognized lines and checking individual characters of each string, we can quickly pick out the line that contains the test time.

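txt = []
for c in content:
    if len(c) > 0:
        txt.append(c)          # keep only non-empty lines

ret = None
for i in txt:
    # chi_sim output puts a space between Chinese characters, so the label
    # 检测时间 ("test time") appears as "检 测 时 间" at indices 0, 2, 4, 6
    if len(i) > 6 and i[0] == '检' and i[2] == '测' and i[4] == '时' and i[6] == '间':
        ret = i
        break
print(ret)
if ret is None:
    raise ValueError("No test-time line found in the OCR output")
# Assumes the line starts with "检 测 时 间 : " (10 characters) before the date
time = ret[10:]
print(time)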

This prints the matched line and the extracted date-time string.
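The index-based check depends on OCR putting exactly one space between each character. A more tolerant alternative, which is our own substitution rather than the author's method, removes all whitespace from each line and searches it with a regular expression. It assumes the label 检测时间 ("test time"), an optional ASCII or full-width colon, and the YYYY-MM-DD HH:MM:SS layout used with strptime in the next step; find_test_time is a hypothetical name.

```python
import re
from typing import Iterable, Optional

# Label 检测时间 ("test time"), optional ":" or full-width "：", then date and time.
# Whitespace is removed before matching, so the date and the time end up
# adjacent and are captured as two separate groups.
TEST_TIME_RE = re.compile(r"检测时间[:：]?(\d{4}-\d{2}-\d{2})(\d{2}:\d{2}:\d{2})")


def find_test_time(lines: Iterable[str]) -> Optional[str]:
    """Return the test time as 'YYYY-MM-DD HH:MM:SS', or None if not found."""
    for line in lines:
        compact = re.sub(r"\s+", "", line)  # OCR may insert spaces anywhere
        match = TEST_TIME_RE.search(compact)
        if match:
            return f"{match.group(1)} {match.group(2)}"
    return None


print(find_test_time(["检 测 时 间 : 2022-06-30 10:12:01"]))  # 2022-06-30 10:12:01
```

The lines collected in txt can be passed directly: `time = find_test_time(txt)`.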

4. Date operations

Python has no built-in date type, but the standard datetime module lets us work with dates as objects.
Note that datetime is both the name of the module and the name of a class inside it; we import the class with:

from datetime import datetime

The date and time extracted by OCR is a string (str). To calculate with it, we first convert it to a datetime object using datetime.strptime(), which takes the string and a format string describing its layout:

date = datetime.strptime(time.strip(), '%Y-%m-%d %H:%M:%S')
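A self-contained illustration of strptime on a made-up sample string (not taken from a real report). It also shows that strptime is strict: any mismatch with the format, including stray whitespace left over from OCR, raises ValueError, so stripping the string first is a good habit.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S"  # year-month-day hour:minute:second

# Made-up OCR result with stray whitespace around it
time_str = " 2022-06-30 10:12:01 \n"
date = datetime.strptime(time_str.strip(), FORMAT)
print(date.year, date.month, date.day, date.hour)  # 2022 6 30 10

# Without strip(), the surrounding whitespace makes parsing fail
try:
    datetime.strptime(time_str, FORMAT)
except ValueError as err:
    print("cannot parse:", err)
```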

datetime.now() returns the current local date and time:

print(datetime.now())

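Subtracting one datetime from another gives a timedelta object. Its days attribute (an attribute, not a function) holds the number of whole days but discards the remaining hours, so comparing delta.days with 3 would also accept a report that is, for example, 3 days and 20 hours old. Comparing the whole timedelta with timedelta(hours=72) checks the 72-hour limit exactly:

from datetime import timedelta

delta = datetime.now() - date   # time elapsed since the test
print(delta.days)               # whole days only
if delta <= timedelta(hours=72):
    print("Nucleic acid report is within 72 hours")
else:
    print("Nucleic acid report is older than 72 hours")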
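To test the 72-hour rule without depending on the real clock, the comparison can be wrapped in a function that takes the current time as an optional parameter. This is a sketch: is_within_hours is a hypothetical name, and rejecting test times in the future (for example from a misread digit) is our own design choice, not part of the original program.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_within_hours(test_time: datetime, hours: int = 72,
                    now: Optional[datetime] = None) -> bool:
    """True if test_time lies at most `hours` hours before `now`.

    `now` defaults to datetime.now(); passing it explicitly makes the
    function deterministic and easy to test.
    """
    if now is None:
        now = datetime.now()
    elapsed = now - test_time  # a timedelta; negative if test_time is in the future
    return timedelta(0) <= elapsed <= timedelta(hours=hours)


now = datetime(2022, 7, 2, 12, 0, 0)
old = datetime(2022, 6, 28, 16, 0, 0)   # 3 days 20 hours (92 hours) earlier
print((now - old).days)                  # 3
print(is_within_hours(old, now=now))     # False
print(is_within_hours(datetime(2022, 6, 30, 12, 0, 0), now=now))  # True
```

The second line of output shows the pitfall: a check on whole days alone sees 3 and would accept a 92-hour-old report.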

5. Run the program

Run the complete program to see the result.

Summary

This post presented an OCR approach based on OpenCV and Tesseract for automatically checking whether a nucleic acid test was taken within the last 72 hours. The program only works reliably when the screenshot is clear and legible. To recognize text in photos taken with a camera rather than screenshots, more OpenCV image processing is needed, such as binarization and perspective transformation.

Tags: Python OpenCV programming language

Posted by fitzromeo on Sat, 02 Jul 2022 22:04:45 +0300