Installing tesseract on Linux to support tess4j image recognition

Introduction

My company needs to extract information from uploaded pictures, and tess4j was chosen to implement this. Recognition on the raw pictures is not very accurate, so the pictures are first preprocessed in Java code and then passed to tess4j. Deploying this to a Linux server requires tesseract to be installed there, so I am writing this post for the record. I am using tesseract 4.1.1; the installation steps are described below.

1. Install dependencies

yum install autoconf automake libtool libjpeg-devel libpng-devel libtiff-devel zlib-devel gcc gcc-c++
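Before moving on, it can help to confirm that the build tools are actually on the PATH. The following is a minimal sketch, not part of the original steps: missing_tools is a hypothetical helper name, and make and pkg-config are included on the assumption that the later build steps need them even though they are not in the yum command above.

```shell
# Print each of the given commands that is NOT found on PATH
# (prints nothing if all of them are available).
# missing_tools is a hypothetical helper name used for illustration.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# make and pkg-config are assumptions: they are used by later steps,
# although the yum command above does not list them explicitly.
missing=$(missing_tools gcc g++ make autoconf automake libtool pkg-config)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
else
    echo "All build tools found"
fi
```

If anything is reported as missing, install the corresponding package with yum before continuing.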

2. Download the compressed package

It is recommended to download the archives from the official websites and then upload them to the server. The two files needed are:
leptonica-1.79.0.tar.gz
tesseract-4.1.1.tar.gz
After downloading, put them in a folder of your choice on the server.
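Uploads occasionally get truncated, so a quick integrity check before unpacking can save time. This is a rough sketch under the assumption that both archives are in the current directory; archive_ok is a hypothetical helper name.

```shell
# Return 0 if the file exists and is a readable gzip-compressed tar archive.
# archive_ok is a hypothetical helper name used for illustration.
archive_ok() {
    [ -f "$1" ] && tar -tzf "$1" >/dev/null 2>&1
}

# Assumes both archives were uploaded to the current directory.
for f in leptonica-1.79.0.tar.gz tesseract-4.1.1.tar.gz; do
    if archive_ok "$f"; then
        echo "$f: OK"
    else
        echo "$f: missing or corrupt"
    fi
done
```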

3. Installation

Everything is installed under /usr/local/.

3.1. Install leptonica first

Run the following commands

mkdir /usr/local/leptonica
tar -xzvf leptonica-1.79.0.tar.gz
cd leptonica-1.79.0
./configure --prefix=/usr/local/leptonica && make && make install

3.2. Configure leptonica environment variables

Run the following command

vim /etc/profile

Append the following configuration to the end of the file

PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/leptonica/lib/pkgconfig
export PKG_CONFIG_PATH
CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/usr/local/leptonica/include/leptonica
export CPLUS_INCLUDE_PATH
C_INCLUDE_PATH=$C_INCLUDE_PATH:/usr/local/leptonica/include/leptonica
export C_INCLUDE_PATH
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/leptonica/lib
export LD_LIBRARY_PATH
LIBRARY_PATH=$LIBRARY_PATH:/usr/local/leptonica/lib
export LIBRARY_PATH
LIBLEPT_HEADERSDIR=/usr/local/leptonica/include/leptonica
export LIBLEPT_HEADERSDIR

Apply the configuration

source /etc/profile
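One side effect of the VAR=$VAR:/path pattern used above is that a previously unset variable ends up with a leading colon, i.e. an empty first entry, and sourcing the profile twice adds the same directory twice. If that bothers you, a small helper can avoid both. This is only a sketch; append_path is a hypothetical name, not something provided by leptonica or the shell.

```shell
# Append a directory to a colon-separated list without creating an empty
# entry or a duplicate. Prints the new list on stdout.
# append_path is a hypothetical helper name used for illustration.
append_path() {
    current=$1
    dir=$2
    case ":$current:" in
        *":$dir:"*) printf '%s\n' "$current" ;;       # already in the list
        "::")       printf '%s\n' "$dir" ;;           # list was empty
        *)          printf '%s\n' "$current:$dir" ;;  # normal append
    esac
}

# Example with one of the variables from this section.
PKG_CONFIG_PATH=$(append_path "${PKG_CONFIG_PATH:-}" /usr/local/leptonica/lib/pkgconfig)
export PKG_CONFIG_PATH
```

The same call works for the other path-style variables (LD_LIBRARY_PATH, LIBRARY_PATH, C_INCLUDE_PATH, CPLUS_INCLUDE_PATH).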

3.3. Install tesseract

Run the following commands

mkdir /usr/local/tesseract
tar -xzvf tesseract-4.1.1.tar.gz
cd tesseract-4.1.1
# autogen.sh must be run first to generate the configure script
./autogen.sh
# Compile and install to the specified folder
./configure --prefix=/usr/local/tesseract && make && make install

3.4. Download the recognition library

All recognition libraries (trained language data) are available at https://github.com/tesseract-ocr/tessdata
I only need Chinese recognition, so the Chinese library is used as the example here.
Download the Simplified Chinese language library chi_sim.traineddata and upload it to the /usr/local/tesseract/share/tessdata/ directory.
If the recognition library is already bundled in your Java project, there is no need to download it again; just point TESSDATA_PREFIX at that directory in the next step.

3.5. Configure tesseract environment variables

Run the following command

vim /etc/profile

Note: TESSDATA_PREFIX must point to the directory that contains the recognition library (the .traineddata files). I reference the recognition library that is already in my Java project, which is why the path below differs; download the library and set the path to match your own setup.
The recommended location for the recognition library is /usr/local/tesseract/share/tessdata; if you use it, change the value of TESSDATA_PREFIX below accordingly.

PATH=$PATH:/usr/local/tesseract/bin
export PATH
export TESSDATA_PREFIX=/home/api/upload/tessdata  # Directory where the recognition library (.traineddata files) is located
export PATH=$PATH:$TESSDATA_PREFIX

Apply the configuration

source /etc/profile
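A wrong TESSDATA_PREFIX is an easy mistake to make here, so it may be worth checking that the language file is really where the variable points. A minimal sketch follows; has_language is a hypothetical helper name, and the fallback directory is simply the recommended location mentioned above.

```shell
# Return 0 if <dir>/<lang>.traineddata exists.
# has_language is a hypothetical helper name used for illustration.
has_language() {
    [ -f "$1/$2.traineddata" ]
}

# Falls back to the recommended directory if TESSDATA_PREFIX is not set.
tessdata_dir=${TESSDATA_PREFIX:-/usr/local/tesseract/share/tessdata}
if has_language "$tessdata_dir" chi_sim; then
    echo "chi_sim language data found in $tessdata_dir"
else
    echo "chi_sim.traineddata not found in $tessdata_dir; check TESSDATA_PREFIX"
fi
```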

3.6. Test whether the installation is successful

Run the following command

tesseract --version

On a successful installation, the command prints the tesseract version along with the leptonica version it was built against.
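If you want to check the version in a deployment script rather than by eye, something like the following could work. It is a sketch that assumes the first line of the output has the form "tesseract <version>"; tesseract_version is a hypothetical helper name.

```shell
# Read `tesseract --version` output on stdin and print the version number
# taken from the first line (assumed to look like "tesseract 4.1.1").
# tesseract_version is a hypothetical helper name used for illustration.
tesseract_version() {
    awk 'NR == 1 && $1 == "tesseract" { print $2 }'
}

if command -v tesseract >/dev/null 2>&1; then
    tesseract --version 2>&1 | tesseract_version
else
    echo "tesseract not found on PATH; re-check the PATH setting in section 3.5"
fi
```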

3.7. Test the recognition function

Upload a test picture to the server (/home/test.png in this example)

Run the following commands
Note: chi_sim is the name of the recognition library to use, i.e. the file name chi_sim.traineddata without the extension. If -l is omitted, tesseract defaults to eng, so it must be specified here; replace it with the language you need.

tesseract /home/test.png /home/result -l chi_sim
cat /home/result.txt

The text recognized from the picture is printed to the terminal.
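For quick tests it can be handy to skip the intermediate result.txt file. tesseract accepts stdout as the output base name, which prints the recognized text directly. The wrapper below is only a sketch; ocr_to_stdout is a hypothetical helper name, and the chi_sim default is just the language used in this post.

```shell
# Recognize an image and print the text to the terminal instead of
# writing a result file. ocr_to_stdout is a hypothetical helper name.
ocr_to_stdout() {
    image=$1
    lang=${2:-chi_sim}   # default language is an assumption based on this post
    if [ ! -f "$image" ]; then
        echo "image not found: $image" >&2
        return 1
    fi
    tesseract "$image" stdout -l "$lang"
}

# Only runs when tesseract and the test picture from this section exist.
if command -v tesseract >/dev/null 2>&1 && [ -f /home/test.png ]; then
    ocr_to_stdout /home/test.png chi_sim
fi
```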

At this point, tesseract is installed.

Tags: Java Linux Operation & Maintenance server

Posted by dakkonz on Wed, 18 Jan 2023 11:38:01 +0300