Solr introduction and installation

Introduction

Solr

Solr is a high-performance full-text search server built on Lucene. It extends Lucene with a richer query language, is configurable and scalable, is optimized for query performance, and provides a complete administration interface.

Lucene

Lucene is an open source full-text search toolkit from the Apache Software Foundation (it began as a sub-project of the Apache Jakarta project). It is not a complete full-text search engine but a search engine library: it provides a complete query engine and indexing engine, plus part of a text analysis engine. Its purpose is to give software developers a simple, easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it.

Inverted index

Normally, we first find a document and then look at the words it contains;

An inverted index reverses this process: it starts from a word and finds the documents in which that word occurs.

Practical example

Document number    Document content
1                  Full-text search engine toolkit
2                  Architecture of a full-text search engine
3                  Query engine and indexing engine

Word segmentation result

Document number    Word segmentation result set
1                  {fulltext,search,engine,tool,package}
2                  {fulltext,search,engine,of,architecture}
3                  {query,engine,and,index,engine}

Inverted index

Number    Word            Document number list
1         fulltext        1,2
2         search          1,2
3         engine          1,2,3
4         tool            1
5         package         1
6         architecture    2
7         query           3
8         index           3
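
To make the idea concrete, here is a minimal sketch in plain Java that builds an inverted index from the word segmentation results above. The class and method names (InvertedIndexDemo, sampleDocs, build) are made up for this illustration and have nothing to do with Lucene's real implementation. The sketch does no stop-word filtering, so "of" and "and" also end up in the index, unlike in the table.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class InvertedIndexDemo {

    // The three segmented documents from the example (document number -> words).
    static Map<Integer, List<String>> sampleDocs() {
        Map<Integer, List<String>> docs = new LinkedHashMap<>();
        docs.put(1, Arrays.asList("fulltext", "search", "engine", "tool", "package"));
        docs.put(2, Arrays.asList("fulltext", "search", "engine", "of", "architecture"));
        docs.put(3, Arrays.asList("query", "engine", "and", "index", "engine"));
        return docs;
    }

    // Map each word to the sorted set of document numbers that contain it.
    static Map<String, SortedSet<Integer>> build(Map<Integer, List<String>> docs) {
        Map<String, SortedSet<Integer>> index = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<String>> entry : docs.entrySet()) {
            for (String word : entry.getValue()) {
                // A set is used, so a word repeated inside one document is recorded once.
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Print every word together with its document number list.
        build(sampleDocs()).forEach((word, ids) -> System.out.println(word + " -> " + ids));
    }
}
```

Looking up a word is then a single map access: build(sampleDocs()).get("engine") gives the documents 1, 2 and 3 without scanning any document text.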

Install Solr on CentOS

1. Installation prerequisites

Solr depends on the JDK, so you must configure the JDK environment first [Configure the JDK on Linux]

2. Download

Download link: https://lucene.apache.org/solr/downloads.html

Then upload the archive to your server and extract it into the /usr/local/ directory with the tar -xvf command

3. Start Solr

The default port after startup is 8983

Enter the extracted Solr directory, and then enter the bin directory

# Start Solr; the default port is 8983 (-force is required when running as root)
./solr start -force

# Open port 8983
firewall-cmd --zone=public --add-port=8983/tcp --permanent
firewall-cmd --reload

4. Access the Solr console from a browser

http://192.168.64.170:8983 (replace the IP with your own server's IP)

If the Solr admin page appears, the installation succeeded
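
If the page does not load, it helps to check whether the port is reachable at all before digging into Solr itself. The following is only a rough sketch: the class name SolrPortCheck and the method isOpen are invented for this example, and the host is the example IP used above. It tells you only that something accepts TCP connections on the port, not that it is actually Solr.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class SolrPortCheck {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Example IP from this article; replace it with your own server's IP.
        System.out.println(isOpen("192.168.64.170", 8983, 2000));
    }
}
```

A result of false usually means that Solr is not running or that the firewall rule for port 8983 has not taken effect yet.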

Introduction to the Lucene API

Let's use a test case to learn about Lucene's API

1. Create a Maven project and import dependencies

The dependencies and build configuration are as follows

<dependencies>
    <!--lucene core API-->
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>8.1.1</version>
    </dependency>

    <!-- unit test -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>

    <!--lucene Chinese tokenizer-->
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-smartcn</artifactId>
        <version>8.1.1</version>
    </dependency>

</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.8.0</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
    </plugins>
</build>

2. Write the code

package test;

import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

import java.io.File;

public class Test1 {

    String[] a = {
            "3, Huawei - huawei computer, Explosive models",
            "4, Huawei cell phone, flagship",
            "5, Lenovo - Thinkpad, business book",
            "6, Lenovo Mobile, Selfie Artifact"
    };

    @Test
    public void test1() throws Exception {
        // Directory where the index files are stored
        FSDirectory d = FSDirectory.open(new File("D:/temp/abc/").toPath());

        // Writer configuration, using the Chinese tokenizer
        IndexWriterConfig conf = new IndexWriterConfig(new SmartChineseAnalyzer());

        // Index writer
        IndexWriter writer = new IndexWriter(d, conf);

        // Loop through the four documents and write them to the index
        for (String s : a) {
            // id, title, selling point
            // 0   1      2
            String[] arr = s.split("\\s*,\\s*");
            // The three fields of a product are wrapped in a Document object
            Document doc = new Document(); // Document class from the Lucene package
            doc.add(new LongPoint("id", Long.parseLong(arr[0]))); // indexed for exact and range queries
            doc.add(new StoredField("id", Long.parseLong(arr[0]))); // LongPoint is index-only, so store the value separately
            doc.add(new TextField("title", arr[1], Field.Store.YES)); // tokenized and indexed; Store.YES also stores the original text
            doc.add(new TextField("sellPoint", arr[2], Field.Store.YES));

            writer.addDocument(doc); // add the document to the writer
        }

        writer.flush(); // flush buffered documents to the index directory
        writer.close(); // commit the changes and release the writer
    }

}

3. Run

The test prints no output. After it finishes, open the folder declared in the code (D:/temp/abc/); it should now contain the generated index files,

which indicates that the execution was successful

Use the Luke tool to view the index

1. Download the Lucene tool

Download link: https://www.apache.org/dyn/closer.lua/lucene/java/8.6.2/lucene-8.6.2.tgz

Then extract it on your own machine (Windows or desktop Linux)

Find the luke folder inside, then run luke.bat (Windows) or luke.sh (Linux)

After a moment, the Luke window appears. Select your own index directory; mine is D:/temp/abc

2. View documents

3. Specify the tokenizer and test the tokenizer
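
The tokenizer can also be tried out in code instead of Luke. The sketch below assumes the Lucene 8.x dependencies from the Maven project above; the class name TokenizeDemo, the method tokenize and the field name "title" passed to the analyzer are choices made for this example.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class TokenizeDemo {

    // Run the text through the analyzer and collect the resulting terms.
    static List<String> tokenize(Analyzer analyzer, String text) throws IOException {
        List<String> tokens = new ArrayList<>();
        try (TokenStream stream = analyzer.tokenStream("title", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset(); // must be called before the first incrementToken()
            while (stream.incrementToken()) {
                tokens.add(term.toString());
            }
            stream.end();
        }
        return tokens;
    }
}
```

To see how the analyzer from the test case splits a product title, call something like TokenizeDemo.tokenize(new SmartChineseAnalyzer(), "Huawei cell phone") and print the returned list.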

4. Query test

5. id query
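
The same id lookup can be sketched in code. This is only an illustration under the assumption that the index was built by the test case above, with id indexed as a LongPoint and title stored; the class name IdQueryDemo and the method titlesById are invented for the example.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class IdQueryDemo {

    // Return the stored titles of all documents whose "id" point equals the given id.
    static List<String> titlesById(Directory dir, long id) throws IOException {
        List<String> titles = new ArrayList<>();
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query query = LongPoint.newExactQuery("id", id);
            for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                Document doc = searcher.doc(hit.doc); // load the stored fields of the hit
                titles.add(doc.get("title"));
            }
        }
        return titles;
    }

    public static void main(String[] args) throws IOException {
        // Index directory used by the test case in this article.
        try (Directory dir = FSDirectory.open(Paths.get("D:/temp/abc/"))) {
            System.out.println(titlesById(dir, 4L));
        }
    }
}
```

LongPoint.newExactQuery matches the numeric value exactly, which is why the test case indexes the id as a LongPoint in addition to storing it.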

Posted by creet0n on Tue, 17 May 2022 21:33:12 +0300