Solr introduction and installation
Introduction
Solr
Solr is a high-performance, Lucene-based full-text search server. It extends Lucene with a richer query language, is configurable and scalable, is optimized for query performance, and provides a complete administration interface. It is an excellent full-text search engine.
Lucene
Lucene is a sub-project of the Apache Jakarta project. It is an open-source full-text search toolkit, but it is not a complete full-text search engine; rather, it is a full-text search engine architecture that provides a complete query engine and index engine, plus part of a text analysis engine. Lucene's goal is to give developers a simple, easy-to-use toolkit for adding full-text search to their own systems, or for building a complete full-text search engine on top of it.
Inverted index
Normally, we locate a document first and then look at the words it contains.
An inverted index reverses this process: starting from a word, it finds the documents in which that word occurs.
Practical example
document number | document content |
---|---|
1 | Full Text Search Engine Toolkit |
2 | Architecture of a full-text search engine |
3 | query engine and indexing engine |
Word segmentation result
document number | word segmentation result set |
---|---|
1 | {fulltext,search,engine,tool,package} |
2 | {fulltext,search,engine,of,architecture} |
3 | {query,engine,and,index,engine} |
Inverted index
Number | Word | Document number list |
---|---|---|
1 | fulltext | 1,2 |
2 | search | 1,2 |
3 | engine | 1,2,3 |
4 | tool | 1 |
5 | package | 1 |
6 | architecture | 2 |
7 | query | 3 |
8 | index | 3 |
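To make the idea concrete, here is a minimal sketch (not from the original article) that builds the same inverted index in plain Java: it maps each word to the sorted set of document numbers that contain it, using the tokenized documents from the tables above (the stop words "of" and "and" are dropped, just as in the table). A real engine such as Lucene additionally handles tokenization, stop words, ranking, and persistence.

```java
import java.util.*;

public class InvertedIndexDemo {
    public static void main(String[] args) {
        // Tokenized documents, keyed by document number (taken from the tables above)
        Map<Integer, List<String>> docs = new LinkedHashMap<>();
        docs.put(1, Arrays.asList("fulltext", "search", "engine", "tool", "package"));
        docs.put(2, Arrays.asList("fulltext", "search", "engine", "architecture"));
        docs.put(3, Arrays.asList("query", "engine", "index", "engine"));

        // Inverted index: word -> sorted set of document numbers containing that word
        Map<String, SortedSet<Integer>> inverted = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> e : docs.entrySet()) {
            for (String word : e.getValue()) {
                inverted.computeIfAbsent(word, w -> new TreeSet<>()).add(e.getKey());
            }
        }

        // Look up documents by word instead of scanning every document
        System.out.println("engine -> " + inverted.get("engine"));     // [1, 2, 3]
        System.out.println("fulltext -> " + inverted.get("fulltext")); // [1, 2]
    }
}
```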
Install Solr on CentOS
1. Installation prerequisites
Solr depends on the JDK, so you must configure the JDK environment on the server first (see: Linux JDK configuration).
2. Download
Download link: https://lucene.apache.org/solr/downloads.html
Then upload the archive to your server and extract it into the /usr/local/ directory with tar -xvf.
3. Start Solr
After startup, Solr listens on port 8983 by default.
Go into the extracted Solr directory and then into the bin directory:
```bash
# Start Solr; the default port is 8983
./solr start -force

# Open port 8983 in the firewall
firewall-cmd --zone=public --add-port=8983/tcp --permanent
firewall-cmd --reload
```
4. Access the Solr console in a browser
http://192.168.64.170:8983 (replace the IP with your own server's IP)
If the following page appears, Solr started successfully.
Introduction to lucene API
Let's walk through a test case to get familiar with the Lucene API.
1. Create a Maven project and import dependencies
The dependencies and build configuration are as follows:
```xml
<dependencies>
    <!-- Lucene core API -->
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>8.1.1</version>
    </dependency>
    <!-- unit test -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <!-- Lucene Chinese tokenizer -->
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-smartcn</artifactId>
        <version>8.1.1</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.8.0</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
    </plugins>
</build>
```
2. Write the code
```java
package test;

import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

import java.io.File;

public class Test1 {

    String[] a = {
            "3, Huawei - huawei computer, Explosive models",
            "4, Huawei cell phone, flagship",
            "5, Lenovo - Thinkpad, business book",
            "6, Lenovo Mobile, Selfie Artifact"
    };

    @Test
    public void test1() throws Exception {
        // Index folder
        FSDirectory d = FSDirectory.open(new File("D:/temp/abc/").toPath());
        // Configuration tool: configure the Chinese tokenizer
        IndexWriterConfig conf = new IndexWriterConfig(new SmartChineseAnalyzer());
        // Index output tool
        IndexWriter writer = new IndexWriter(d, conf);

        // Loop through the four documents and write them to the index
        for (String s : a) {
            // id, name, sell point
            // 0    1     2
            String[] arr = s.split("\\s*,\\s*");

            // Encapsulate the three fields of the product into a Document object
            Document doc = new Document(); // Document class from the lucene package
            doc.add(new LongPoint("id", Long.parseLong(arr[0])));
            doc.add(new StoredField("id", Long.parseLong(arr[0]))); // store the id value so it can be read back
            doc.add(new TextField("title", arr[1], Field.Store.YES)); // Store.YES keeps the original field value
            doc.add(new TextField("sellPoint", arr[2], Field.Store.YES));

            writer.addDocument(doc); // add to the index writer
        }

        writer.flush(); // flush the index to the specified folder
        writer.close();
    }
}
```
3. Run
The test produces no console output. Open the folder declared in the code and you will see files like the following:

This indicates that the indexing ran successfully.
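Having built the index, we can also search it from code rather than only through a GUI. The following is a rough sketch (not part of the original article): it opens the same index directory written by Test1 and runs a term query against the title field, using only classes from lucene-core.

```java
package test;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

import java.io.File;

public class Test2 {

    @Test
    public void search() throws Exception {
        // Open the index directory written by Test1
        FSDirectory d = FSDirectory.open(new File("D:/temp/abc/").toPath());
        DirectoryReader reader = DirectoryReader.open(d);
        IndexSearcher searcher = new IndexSearcher(reader);

        // A TermQuery matches the analyzed tokens stored in the index,
        // so the lowercase form of the word is used here
        Query query = new TermQuery(new Term("title", "huawei"));
        TopDocs topDocs = searcher.search(query, 10);

        // Print the stored fields of each hit
        for (ScoreDoc sd : topDocs.scoreDocs) {
            Document doc = searcher.doc(sd.doc);
            System.out.println(doc.get("id") + " : " + doc.get("title"));
        }

        reader.close();
    }
}
```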
Use the Luke tool to view the index
1. Download Lucene (Luke is bundled with the binary distribution)
Download link: https://www.apache.org/dyn/closer.lua/lucene/java/8.6.2/lucene-8.6.2.tgz
Then extract it on your local machine (Windows or desktop Linux).
Find the luke folder inside, then run luke.bat (Windows) or luke.sh (Linux)
After a short wait, the following interface appears. Select your own index directory; mine is D:/temp/abc.

2. View the documents

3. Specify the tokenizer and test the tokenizer
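Luke lets you pick an analyzer and see how sample text is tokenized. As a code-based alternative (a sketch, not from the original article), the same SmartChineseAnalyzer configured at index time can be exercised directly on one of the sample titles:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzerDemo {
    public static void main(String[] args) throws Exception {
        // The same analyzer that was passed to IndexWriterConfig above
        Analyzer analyzer = new SmartChineseAnalyzer();

        // Tokenize one of the sample titles and print every token it produces
        TokenStream ts = analyzer.tokenStream("title", "Huawei - huawei computer, Explosive models");
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(term.toString());
        }
        ts.end();
        ts.close();
        analyzer.close();
    }
}
```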

4. Query test

5. id query
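Because the id was indexed as a LongPoint, it cannot be found with a plain TermQuery; a point query is used instead, and the separate StoredField("id", ...) is what makes doc.get("id") return a value. A minimal sketch, written as another test method for the Test2 class above (it also needs the org.apache.lucene.document.LongPoint import):

```java
@Test
public void searchById() throws Exception {
    // Open the same index directory written by Test1
    FSDirectory d = FSDirectory.open(new File("D:/temp/abc/").toPath());
    DirectoryReader reader = DirectoryReader.open(d);
    IndexSearcher searcher = new IndexSearcher(reader);

    // id was indexed as a LongPoint, so query it with an exact point query
    Query idQuery = LongPoint.newExactQuery("id", 3L);
    TopDocs hits = searcher.search(idQuery, 1);
    for (ScoreDoc sd : hits.scoreDocs) {
        Document doc = searcher.doc(sd.doc);
        System.out.println(doc.get("id") + " : " + doc.get("title"));
    }

    reader.close();
}
```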
