Introduction
Our company needs to extract relevant information from uploaded pictures, so tess4j was chosen to implement this. Recognition accuracy is poor when the pictures are passed to tess4j directly, so they are first preprocessed in Java code and then recognized with tess4j. Deploying this to a Linux server requires tesseract to be installed there, so I am writing this post as a record. I am using tesseract 4.1.1, and the installation steps are described below.
1. Install dependencies
yum install autoconf automake libtool libjpeg-devel libpng-devel libtiff-devel zlib-devel gcc gcc-c++
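Before building, it can help to confirm that the build tools are actually on the PATH. The following is only a minimal sketch: the helper name missing_tools is hypothetical, and it checks for executables only, not for the -devel header packages.

```shell
#!/bin/sh
# Hypothetical helper: print every tool from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools needed for the build. make is not in the yum line above,
# so it is checked here as well.
missing=$(missing_tools autoconf automake libtool gcc g++ make)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing build tools:" $missing
else
    echo "All build tools found."
fi
```

If anything is reported as missing, install the corresponding package with yum before continuing.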
2. Download the compressed package
It is recommended to download the packages from the official sites and then upload them to the server. The following two packages are needed:
leptonica-1.79.0.tar.gz
tesseract-4.1.1.tar.gz
After downloading, put them in a directory of your choice on the server.
3. Installation
Everything is installed under /usr/local/.
3.1. Install leptonica first
Run the following commands
mkdir /usr/local/leptonica
tar -xzvf leptonica-1.79.0.tar.gz
cd leptonica-1.79.0
./configure --prefix=/usr/local/leptonica && make && make install
3.2. Configure leptonica environment variables
Run the following command
vim /etc/profile
Append the following configuration at the end of the file
PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/leptonica/lib/pkgconfig
export PKG_CONFIG_PATH
CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/usr/local/leptonica/include/leptonica
export CPLUS_INCLUDE_PATH
C_INCLUDE_PATH=$C_INCLUDE_PATH:/usr/local/leptonica/include/leptonica
export C_INCLUDE_PATH
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/leptonica/lib
export LD_LIBRARY_PATH
LIBRARY_PATH=$LIBRARY_PATH:/usr/local/leptonica/lib
export LIBRARY_PATH
LIBLEPT_HEADERSDIR=/usr/local/leptonica/include/leptonica
export LIBLEPT_HEADERSDIR
Apply the configuration
source /etc/profile
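After sourcing the profile, a quick check can confirm that the variables took effect in the current shell. This is a sketch under assumptions: path_contains is a hypothetical helper, and the pkg-config module name lept is assumed from the lept.pc file that leptonica is expected to place in /usr/local/leptonica/lib/pkgconfig, so verify the file name there if the lookup fails.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the colon-separated list $1 contains directory $2.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

LEPT_PREFIX=/usr/local/leptonica   # install prefix used in step 3.1

if path_contains "${PKG_CONFIG_PATH:-}" "$LEPT_PREFIX/lib/pkgconfig"; then
    echo "PKG_CONFIG_PATH includes leptonica"
else
    echo "PKG_CONFIG_PATH does not include leptonica; re-run: source /etc/profile"
fi

# If pkg-config is available, ask it for the leptonica version
# (module name assumed to be "lept").
if command -v pkg-config >/dev/null 2>&1; then
    pkg-config --modversion lept 2>/dev/null || echo "pkg-config cannot find lept"
fi
```

If pkg-config cannot find leptonica here, the tesseract configure step that follows is likely to fail for the same reason.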
3.3. Install tesseract
Run the following commands
mkdir /usr/local/tesseract
tar -xzvf tesseract-4.1.1.tar.gz
cd tesseract-4.1.1
# autogen.sh must be run first to generate the configure file
./autogen.sh
# Compile and install to the specified folder
./configure --prefix=/usr/local/tesseract && make && make install
3.4. Download the recognition library
All recognition library addresses: https://github.com/tesseract-ocr/tessdata
I only need the Chinese recognition library, so it is used as the example below.
Download the Simplified Chinese language library chi_sim.traineddata and upload it to the /usr/local/tesseract/share/tessdata/ directory.
If the recognition library is already used in your Java project, there is no need to download it again. Just point the configuration in the next step at the directory the Java project uses.
3.5. Configure tesseract environment variables
Run the following command
vim /etc/profile
Note: The value of TESSDATA_PREFIX is the directory where the training data is located. I reference the recognition library from the Java project directly, which is why the path below points there. You can download the library and configure the path yourself.
It is recommended to place the recognition library in /usr/local/tesseract/share/tessdata. If you do, remember to change the value of TESSDATA_PREFIX below to match.
PATH=$PATH:/usr/local/tesseract/bin
export PATH
# Note: this is the directory where the training data is located
export TESSDATA_PREFIX=/home/api/upload/tessdata
export PATH=$PATH:$TESSDATA_PREFIX
Apply the configuration
source /etc/profile
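To confirm that TESSDATA_PREFIX really points at a directory containing the language file, a check along these lines can be run. It is a sketch only: has_lang is a hypothetical helper, and the fallback path is the recommended location mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 holds the traineddata file for language $2.
has_lang() {
    [ -f "$1/$2.traineddata" ]
}

# Fall back to the recommended location when TESSDATA_PREFIX is not set.
TESSDATA_DIR=${TESSDATA_PREFIX:-/usr/local/tesseract/share/tessdata}

if has_lang "$TESSDATA_DIR" chi_sim; then
    echo "chi_sim found in $TESSDATA_DIR"
else
    echo "chi_sim.traineddata is missing from $TESSDATA_DIR"
fi

# When tesseract is on PATH, it can list the languages it sees itself.
if command -v tesseract >/dev/null 2>&1; then
    tesseract --list-langs || true
fi
```

Replace chi_sim with the name of whichever language file you uploaded.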
3.6. Test whether the installation is successful
Run the following command
tesseract --version
If the installation succeeded, the version information is printed.
3.7. Test the recognition function
Upload a test picture to the server (/home/test.png in the command below)
Run the following commands
Note: chi_sim is the name of the recognition library to use (the file name without the .traineddata suffix). It must be specified and can be replaced with the language you need.
tesseract /home/test.png /home/result -l chi_sim
cat /home/result.txt
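For repeated tests it may be convenient to wrap the call in a small function that checks its inputs first. The sketch below is an illustration only: ocr_image and its return codes are hypothetical. It passes stdout as the output base, which makes tesseract print the recognized text instead of writing a .txt file.

```shell
#!/bin/sh
# Hypothetical wrapper: OCR image $1 with language $2 (default chi_sim) and print the text.
# Returns 2 if the image does not exist, 3 if tesseract is not on PATH.
ocr_image() {
    image=$1
    lang=${2:-chi_sim}
    [ -f "$image" ] || { echo "no such image: $image" >&2; return 2; }
    command -v tesseract >/dev/null 2>&1 || { echo "tesseract not found on PATH" >&2; return 3; }
    # "stdout" as the output base prints the text rather than creating a file.
    tesseract "$image" stdout -l "$lang"
}

# Example call, using the path from the test above:
# ocr_image /home/test.png chi_sim
```

Checking the image path before invoking tesseract keeps a mistyped file name from being mistaken for an installation problem.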
The recognized text is printed to the terminal.
At this point, tesseract is installed.