Elasticsearch simple usage summary
1. Preparation for ELK installation
1.1 ELK download address
ElasticSearch: https://mirrors.huaweicloud.com/elasticsearch/?C=N&O=D
logstash: https://mirrors.huaweicloud.com/logstash/?C=N&O=D
Visualization interface elasticsearch-head: https://github.com/mobz/elasticsearch-head
kibana: https://mirrors.huaweicloud.com/kibana/?C=N&O=D
IK analyzer (Chinese word-segmentation plugin): https://github.com/medcl/elasticsearch-analysis-ik
The JDK must be version 1.8 or above.
1.2 Installing the JDK and Maven on macOS
Configure the JDK on macOS:

```
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home
PATH=$JAVA_HOME/bin:$PATH:.
CLASSPATH=$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar:.
export JAVA_HOME
export PATH
export CLASSPATH
```

Install JDK 1.8 and configure the environment variables as above.
If Elasticsearch fails to start with the following error:

```
org.elasticsearch.ElasticsearchException: X-Pack is not supported and Machine Learning is not available for [windows-x86]; you can use the other X-Pack features (unsupported) by setting xpack.ml.enabled: false in elasticsearch.yml
```

add the following to the yml file of es:

```yaml
cluster.initial_master_nodes: ["node-1"]  # node-1 is the value configured for node.name
xpack.ml.enabled: false
# If the connection fails because of cross-origin (CORS) problems (cross-port, cross-site, etc.),
# enable cross-origin access:
http.cors.enabled: true
http.cors.allow-origin: "*"
```
Open the profile:

vi ~/.bash_profile

Add the Maven path (this copy of Maven was downloaded with IDEA):

```
export M2_HOME=/Users/lisen/apache-maven-3.6.3
export PATH=$PATH:$M2_HOME/bin
```

Reload the profile:

source ~/.bash_profile

Test whether Maven is installed successfully:

mvn -v
There are plenty of Windows installation guides, so Windows is not covered here...
1.3 ELK installation reference
https://blog.csdn.net/mgdj25/article/details/105740191
The main things to do are:
- Set Kibana's internationalization to zh-CN in its yml file (i18n.locale: "zh-CN").
- Add to the yml file of es:

```yaml
http.cors.enabled: true
http.cors.allow-origin: "*"
```
- After downloading the IK word-segmentation plugin, modify its pom file so the version matches your ES version. Create an `ik` directory under ES's `plugins` directory, put the plugin there, cd into the `ik` directory, and build it:

```
mvn clean
mvn compile
mvn package
```

This downloads the jar dependencies, then compiles and packages the project. Copy the `config` directory and the jar packages from `releases` under the `target` directory into the `ik` directory, and delete the other files (important!).
- A special reminder about word-segmentation granularity: the analyzer may split a phrase only into single characters or the wrong units. For example, "Li Kui ha ha" may be segmented so that "Li" and "Kui" come out separately and the word "Li Kui" is never produced. In that case you have to write your own dictionary.

Create a new file "cyx.dic" in the `config` directory of the IK plugin, add your own words to it, then register it in the configuration file IKAnalyzer.cfg.xml (the file contains comments describing the other entries):

```xml
<!-- Users can expand their own user dictionary here -->
<entry key="ext_dict">cyx.dic</entry>
```
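For context, here is a sketch of the complete IKAnalyzer.cfg.xml, based on the file shipped with the plugin (your copy may differ slightly):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Users can configure their own extension dictionary here -->
    <entry key="ext_dict">cyx.dic</entry>
    <!-- Users can configure their own extension stopword dictionary here -->
    <entry key="ext_stopwords"></entry>
</properties>
```

Restart Elasticsearch after editing the file so the new dictionary is loaded.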
2. ES core concepts
What are clusters, nodes, indexes, types, documents, shards, and mappings?
Elasticsearch is document-oriented. Below is a comparison between a relational database and Elasticsearch. Everything is JSON!
| Relational DB | Elasticsearch |
| --- | --- |
| databases | indexes |
| tables | types |
| rows | documents |
| fields (columns) | fields |
Physical design:
Elasticsearch divides each index into multiple shards in the background, and each shard can be migrated between servers in the cluster.
Logical design:
An index contains types, and a type contains multiple documents. When we index a document, we locate it in this order: index → type → document id. Through this combination we can address a specific document. Note: the id does not have to be an integer; it is actually a string.
Documents

Documents are our individual records.
As mentioned earlier, Elasticsearch is document-oriented, which means the smallest unit of indexing and searching is the document. In Elasticsearch, documents have several important properties:
- Self-contained: a document contains both fields and their corresponding values, i.e. key:value pairs!
- Hierarchical: a document can contain other documents; that is how complex logical entities arise! (A document is a JSON object! fastjson converts it automatically!)
- Flexible structure: documents do not depend on a predefined schema. In relational databases, fields must be defined in advance before they can be used; in Elasticsearch, fields are very flexible. We can omit a field or add a new field dynamically.
Although we can add or omit fields at will, the type of each field still matters: an age field, for example, could be a string or an integer. Elasticsearch stores the mapping between fields and types, along with other settings. This mapping is specific to each type, which is why types are sometimes called mapping types in Elasticsearch.
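To see the mapping Elasticsearch has stored for an index, you can query the _mapping endpoint (the index name test here is illustrative):

```
GET /test/_mapping
```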
Types

A type is a logical container for documents, just as a table in a relational database is a container for rows. The definition of the fields within a type is called the mapping; for example, name may be mapped as a string type. Documents are schemaless: they do not need to carry every field defined in the mapping. What does Elasticsearch do when a new field appears? It adds the field to the mapping automatically, but since it cannot be sure of the field's type, it starts to guess: if the value is 18, Elasticsearch assumes it is an integer. Guessing can go wrong, so the safest approach is to define the mapping in advance.
Indexes

The index is the database!

An index is a container of mapping types. An index in Elasticsearch is a very large collection of documents; it stores the fields and other settings of its mapping types, and the documents are stored across the shards. Next, let's look at how shards work.
Physical design: how nodes and shards work
A cluster has at least one node, and a node is an Elasticsearch process. A node can hold multiple indexes. By default, a new index is created with five primary shards (the default dropped to one in ES 7.0+), and each primary shard has one replica shard.
In a cluster with three nodes, a primary shard and its corresponding replica shard will never be placed on the same node, which protects against data loss when a node dies. In fact, a shard is a Lucene index: a file directory containing an inverted index. The inverted index structure lets Elasticsearch tell you which documents contain a specific keyword without scanning all documents. But wait, what exactly is an inverted index?
Inverted index

Elasticsearch uses a structure called an inverted index, with Lucene's inverted index as the underlying implementation. This structure is suited to fast full-text search. An inverted index consists of a list of all the unique words that appear in any document; for each word, there is a list of the documents that contain it. For example, suppose we have two documents, each with the following content:
```
Study every day, good good up to forever   # Content of document 1
To forever, study every day, good good up  # Content of document 2
```
To create the inverted index, we first split each document into independent words (also called terms or tokens), then build a sorted list of all unique terms, and then record which documents each term appears in:
| term | doc_1 | doc_2 |
| --- | --- | --- |
| Study | √ | x |
| To | x | √ |
| every | √ | √ |
| forever | √ | √ |
| day | √ | √ |
| study | x | √ |
| good | √ | √ |
| to | √ | x |
| up | √ | √ |
Now, if we search for "to forever", we only need to look at the documents containing each term:
| term | doc_1 | doc_2 |
| --- | --- | --- |
| to | √ | x |
| forever | √ | √ |
| total | 2 | 1 |
Both documents match, but the first document matches more terms than the second. If there are no other conditions, both documents containing the keywords will be returned, with the first scored higher.
Another example: suppose we search blog posts by tag. The inverted index would then look like this:
Blog posts (raw data):

| Blog post ID | Tags |
| --- | --- |
| 1 | python |
| 2 | python |
| 3 | linux, python |
| 4 | linux |

Index list (inverted index):

| Tag | Blog post IDs |
| --- | --- |
| python | 1, 2, 3 |
| linux | 3, 4 |
If we want to search for articles tagged python, looking them up through the inverted index is much faster than scanning all the raw data: just check the tag column and collect the relevant article IDs. All irrelevant data is filtered out completely, which improves efficiency!
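A minimal sketch of this idea in the Kibana console (the blog index, its fields, and the documents are hypothetical; with dynamic mapping, tags gets a keyword sub-field usable for exact lookups):

```
PUT /blog/_doc/1
{ "title": "post one", "tags": ["python"] }

PUT /blog/_doc/3
{ "title": "post three", "tags": ["linux", "python"] }

GET /blog/_search
{
  "query": {
    "term": { "tags.keyword": "python" }
  }
}
```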
Comparison between elasticsearch index and Lucene index
In Elasticsearch the term "index" is used frequently, and this is how the term is used: an index is divided into multiple shards, and each shard is a Lucene index, so an Elasticsearch index is made up of multiple Lucene indexes. (Don't ask why; Elasticsearch uses Lucene as its underlying engine!) Unless otherwise specified, "index" below refers to an Elasticsearch index.
All of the following operations are performed in the Console under Dev Tools in Kibana. Basic operations!
IK analyzer

What is the IK analyzer?

Word segmentation means splitting a paragraph of Chinese (or other text) into keywords. When searching, we segment our own query text, segment the data in the database or index, and then match them against each other. The default Chinese segmentation treats every single character as a word; for example, "I love crazy God" would be split into "I", "love", "crazy", "God", which obviously does not meet the requirements, so we install the Chinese IK analyzer to solve this problem.

If you are working with Chinese, the IK analyzer is recommended!

IK provides two segmentation algorithms: ik_smart and ik_max_word, where ik_smart performs the coarsest (minimal) segmentation and ik_max_word the most fine-grained segmentation! We'll test both below!
What the IK analyzer does:
- Segments a sentence into words
- Recommended when working with Chinese
- Two segmentation algorithms: ik_smart (minimal segmentation) and ik_max_word (most fine-grained segmentation)
[ik_smart] test:

```
GET _analyze
{
  "analyzer": "ik_smart",
  "text": "I am the successor of socialism"
}
```

Output:

```
{
  "tokens" : [
    { "token" : "I",         "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0 },
    { "token" : "yes",       "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1 },
    { "token" : "socialist", "start_offset" : 2, "end_offset" : 6, "type" : "CN_WORD", "position" : 2 },
    { "token" : "successor", "start_offset" : 6, "end_offset" : 9, "type" : "CN_WORD", "position" : 3 }
  ]
}
```
[ik_max_word] test:

```
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "I am the successor of socialism"
}
```

Output:

```
{
  "tokens" : [
    { "token" : "I",          "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0 },
    { "token" : "yes",        "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1 },
    { "token" : "socialist",  "start_offset" : 2, "end_offset" : 6, "type" : "CN_WORD", "position" : 2 },
    { "token" : "Sociology",  "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 3 },
    { "token" : "doctrine",   "start_offset" : 4, "end_offset" : 6, "type" : "CN_WORD", "position" : 4 },
    { "token" : "successor",  "start_offset" : 6, "end_offset" : 9, "type" : "CN_WORD", "position" : 5 },
    { "token" : "Succession", "start_offset" : 6, "end_offset" : 8, "type" : "CN_WORD", "position" : 6 },
    { "token" : "people",     "start_offset" : 8, "end_offset" : 9, "type" : "CN_CHAR", "position" : 7 }
  ]
}
```
3. Using commands

3.1 REST style description

REST is a software architectural style, not a standard. It makes mechanisms such as caching easier to implement.
| method | URL | description |
| --- | --- | --- |
| PUT | localhost:9200/index_name/type_name/document_id | create a document (specify document id) |
| POST | localhost:9200/index_name/type_name | create a document (random document id) |
| POST | localhost:9200/index_name/type_name/document_id/_update | modify a document |
| DELETE | localhost:9200/index_name/type_name/document_id | delete a document |
| GET | localhost:9200/index_name/type_name/document_id | query a document by id |
| POST | localhost:9200/index_name/type_name/_search | query all data |
Basic test
1. Create an index
Syntax (in newer versions the type name is not written; use `_doc` instead):

```
PUT /index_name/type_name/document_id
{ request body }
```
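For example (the index name test1 and the fields here are made up for illustration):

```
PUT /test1/_doc/1
{
  "name": "lisen",
  "age": 3
}
```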
Done: the index is created automatically and the data is added successfully.
What if we need to specify a type for the name field?

Specify the type attribute for each field, like creating a table in SQL.

The same rule works with GET: specific information can be retrieved with a GET request.

If you do not set the field types yourself, es will assign default types automatically (dynamic mapping).
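A sketch of both steps (the index name test2 and the field set are illustrative):

```
PUT /test2
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "age":  { "type": "long" }
    }
  }
}

GET /test2
```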
3.2 _cat commands

Get the health value (see the endpoint list below)

Get all index information:

GET _cat/indices?v

There are many more _cat endpoints; Kibana's autocomplete will show them all. Try them out.
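A few standard _cat endpoints, including the health check referenced above:

```
GET _cat/health
GET _cat/nodes
GET _cat/shards
GET _cat/plugins
```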
Modify index

1. To modify, we can still use the original PUT command and overwrite the document by id.

However, any fields left out of the request are reset to empty. This is like object updates passed through a Java interface: if only some fields are sent along with the id, the values that are not sent become empty.

2. There is also an _update method that does not clear the values you leave out:
```
POST /test3/_doc/1/_update
{
  "doc": {
    "name": "212121"
  }
}

// The following two forms will clear the unmodified values
POST /test3/_doc/1
{
  "name": "212121"
}

POST /test3/_doc/1
{
  "doc": {
    "name": "212121"
  }
}
```
Note: with _update, the fields to be modified must be wrapped in a "doc" object, as shown above.
Delete index
Deleting an index or a document:

Use the DELETE command. Whether an index or a single document record is deleted depends on the request path, as in the examples below.
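For example (the index name test1 is illustrative):

```
# delete an entire index
DELETE /test1

# delete a single document by id
DELETE /test1/_doc/1
```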
ES recommends the RESTful style!
3.3 Basic document operations

Query

The simplest query is a GET by id:
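For example (assuming a document with id 1 exists in the lisen index used in the examples below):

```
GET lisen/user/1
```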
Conditional search: _search

The name field here is of type text, so the query is analyzed (segmented); if it were keyword, the search would not be segmented.
Complex search operations, the ES equivalent of select (sorting, paging, highlighting, fuzzy queries, exact queries):
```
// match can only query one field at a time
GET lisen/user/_search
{
  "query": {
    "match": {
      "name": "Leeson"
    }
  }
}
```
Result filtering: return only some fields of the documents (_source filtering):

- includes: fields to return
- excludes: fields to leave out
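A sketch of _source filtering on the same index (the field choices are illustrative):

```
GET lisen/user/_search
{
  "query": {
    "match": { "name": "Leeson" }
  },
  "_source": {
    "includes": ["name"],
    "excludes": ["age"]
  }
}
```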
Sorting and paging:
```
GET lisen/user/_search
{
  "query": {
    "match": {
      "name": "Leeson"
    }
  },
  "sort": {
    "age": {
      "order": "asc"
    }
  },
  "from": 0,
  "size": 1
}
```
Multi-condition queries

Boolean query

- must (and): all conditions must be met
- should (or): like OR in a database
- must_not (not)

A sketch combining them follows.
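A minimal bool query sketch on the same example index (the conditions are illustrative):

```
GET lisen/user/_search
{
  "query": {
    "bool": {
      "must":     [ { "match": { "name": "Leeson" } } ],
      "must_not": [ { "match": { "age": 3 } } ]
    }
  }
}
```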
Range conditions (filter):

- gt: greater than
- gte: greater than or equal to
- lt: less than
- lte: less than or equal to

A filter example follows the list.
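A range filter sketch (the bounds are illustrative):

```
GET lisen/user/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "age": { "gte": 10, "lte": 30 } } }
      ]
    }
  }
}
```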
Matching multiple values in one condition

(A correction to an earlier note: match does go through the inverted index; the query text is analyzed into terms first, and those terms are then looked up.)
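Multiple search terms can be passed to match separated by spaces; documents matching any of them are returned. A sketch (the tags field and its values are hypothetical):

```
GET lisen/user/_search
{
  "query": {
    "match": { "tags": "male tech" }
  }
}
```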
Exact search

term queries look up the exact term directly in the inverted index.

About analysis (word segmentation):

- term: direct exact lookup, the query is not analyzed
- match: parses the query with the analyzer! (documents are analyzed at index time, and the query is analyzed before matching)

text fields are analyzed by default.

keyword fields are not analyzed (a quick demo follows).
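A quick way to see the difference with _analyze (the sample text is illustrative):

```
# keyword analyzer: the whole text stays as one token
GET _analyze
{
  "analyzer": "keyword",
  "text": "hello world"
}

# standard analyzer: split into "hello" and "world"
GET _analyze
{
  "analyzer": "standard",
  "text": "hello world"
}
```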
Exact query on multiple values (see the bool/should sketch below)
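Multiple exact values can be combined with bool/should (the field t1 and its values are hypothetical, assuming a keyword-type field):

```
GET lisen/user/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "t1": "22" } },
        { "term": { "t1": "33" } }
      ]
    }
  }
}
```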
Highlighting

You can also customize the highlight style, as sketched below.
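A highlight sketch with custom tags (the styling is illustrative):

```
GET lisen/user/_search
{
  "query": {
    "match": { "name": "Leeson" }
  },
  "highlight": {
    "pre_tags": "<p class='key' style='color:red'>",
    "post_tags": "</p>",
    "fields": {
      "name": {}
    }
  }
}
```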
4. Spring Boot integration

4.1 Import dependencies

Create a Spring Boot project and tick the Spring Web starter and the NoSQL Elasticsearch starter.

If they are missing, import them manually:
```xml
<!-- es client -->
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.6.2</version>
</dependency>
<!-- springboot elasticsearch service -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
```
Check whether the es version pinned by the Spring Boot parent POM matches the ES version you run.

If not, override it in the properties section of your pom file:
```xml
<!-- Configure the matching version here -->
<properties>
    <java.version>1.8</java.version>
    <elasticsearch.version>7.6.2</elasticsearch.version>
</properties>
```
4.2 Inject the RestHighLevelClient
```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchClientConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        // Build a client against the local node; adjust host/port as needed
        return new RestHighLevelClient(
                RestClient.builder(new HttpHost("127.0.0.1", 9200, "http"))
        );
    }
}
```
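The tests below serialize a User object with fastjson. The original post does not show this class; a minimal sketch might look like this (fastjson needs getters/setters, and the field names must match the JSON being indexed):

```java
public class User {
    private String name;
    private int age;

    public User() {}

    public User(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
```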
4.3 Creating, deleting, and checking the existence of an index
```java
// client below is the RestHighLevelClient bean injected above (e.g. via @Autowired)

// Test index creation
@Test
void testCreateIndex() throws IOException {
    // 1. Build the create-index request
    CreateIndexRequest request = new CreateIndexRequest("lisen_index");
    // 2. Execute the request and get the response
    CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
    System.out.println(response);
}

// Test whether the index exists
@Test
void testExistIndex() throws IOException {
    // 1. Build the get-index request
    GetIndexRequest request = new GetIndexRequest("lisen_index");
    // 2. Execute the request and get the response
    boolean exist = client.indices().exists(request, RequestOptions.DEFAULT);
    System.out.println("Test whether the index exists-----" + exist);
}

// Delete the index
@Test
void testDeleteIndex() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("lisen_index");
    AcknowledgedResponse delete = client.indices().delete(request, RequestOptions.DEFAULT);
    System.out.println("Delete index--------" + delete.isAcknowledged());
}
```
4.4 Document operations
```java
// Test adding a document
@Test
void testAddDocument() throws IOException {
    User user = new User("lisen", 27);
    IndexRequest request = new IndexRequest("lisen_index");
    request.id("1");
    // Set a timeout
    request.timeout("1s");
    // Put the data in as a JSON string
    request.source(JSON.toJSONString(user), XContentType.JSON);
    // Send the request
    IndexResponse response = client.index(request, RequestOptions.DEFAULT);
    System.out.println("Add document-------" + response.toString());
    System.out.println("Add document-------" + response.status());
    // Result:
    // Add document-------IndexResponse[index=lisen_index,type=_doc,id=1,version=1,result=created,seqNo=0,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]
    // Add document-------CREATED
}

// Test whether a document exists
@Test
void testExistDocument() throws IOException {
    // Document-level request: note there is no indices() here
    GetRequest request = new GetRequest("lisen_index", "1");
    boolean exist = client.exists(request, RequestOptions.DEFAULT);
    System.out.println("Test whether the document exists-----" + exist);
}

// Test getting a document
@Test
void testGetDocument() throws IOException {
    GetRequest request = new GetRequest("lisen_index", "1");
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    System.out.println("Test getting the document-----" + response.getSourceAsString());
    System.out.println("Test getting the document-----" + response);
    // Result:
    // Test getting the document-----{"age":27,"name":"lisen"}
    // Test getting the document-----{"_index":"lisen_index","_type":"_doc","_id":"1","_version":1,"_seq_no":0,"_primary_term":1,"found":true,"_source":{"age":27,"name":"lisen"}}
}

// Test modifying a document
@Test
void testUpdateDocument() throws IOException {
    User user = new User("Li Xiaoyao", 55);
    // Modify the document with id 1
    UpdateRequest request = new UpdateRequest("lisen_index", "1");
    request.timeout("1s");
    request.doc(JSON.toJSONString(user), XContentType.JSON);
    UpdateResponse response = client.update(request, RequestOptions.DEFAULT);
    System.out.println("Test modifying the document-----" + response);
    System.out.println("Test modifying the document-----" + response.status());
    // Result:
    // Test modifying the document-----UpdateResponse[index=lisen_index,type=_doc,id=1,version=2,seqNo=1,primaryTerm=1,result=updated,shards=ShardInfo{total=2, successful=1, failures=[]}]
    // Test modifying the document-----OK
}

// Test deleting a document
@Test
void testDeleteDocument() throws IOException {
    DeleteRequest request = new DeleteRequest("lisen_index", "1");
    request.timeout("1s");
    DeleteResponse response = client.delete(request, RequestOptions.DEFAULT);
    System.out.println("Test deleting the document------" + response.status());
    // After deletion, getting the document returns:
    // Test getting the document-----null
    // Test getting the document-----{"_index":"lisen_index","_type":"_doc","_id":"1","found":false}
}

// Test bulk-adding documents
@Test
void testBulkAddDocument() throws IOException {
    ArrayList<User> userlist = new ArrayList<User>();
    userlist.add(new User("cyx1", 5));
    userlist.add(new User("cyx2", 6));
    userlist.add(new User("cyx3", 40));
    userlist.add(new User("cyx4", 25));
    userlist.add(new User("cyx5", 15));
    userlist.add(new User("cyx6", 35));
    // Request for the batch operation
    BulkRequest request = new BulkRequest();
    request.timeout("1s");
    // Add each index request to the bulk request
    for (int i = 0; i < userlist.size(); i++) {
        request.add(
                new IndexRequest("lisen_index")
                        .id("" + (i + 1))
                        .source(JSON.toJSONString(userlist.get(i)), XContentType.JSON)
        );
    }
    BulkResponse response = client.bulk(request, RequestOptions.DEFAULT);
    // response.hasFailures(): did anything fail?
    System.out.println("Test bulk-adding documents-----" + response.hasFailures());
    // Result: false means success, true means failure
    // Test bulk-adding documents-----false
}

// Test querying documents
@Test
void testSearchDocument() throws IOException {
    SearchRequest request = new SearchRequest("lisen_index");
    // Build the search criteria
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // Highlighting would be configured with sourceBuilder.highlighter(new HighlightBuilder());
    // the no-argument call is only a getter
    sourceBuilder.highlighter();
    // Exact match: term query on name = cyx1
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "cyx1");
    sourceBuilder.query(termQueryBuilder);
    sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    request.source(sourceBuilder);
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    System.out.println("Test querying documents-----" + JSON.toJSONString(response.getHits()));
    System.out.println("=====================");
    for (SearchHit documentFields : response.getHits().getHits()) {
        System.out.println("Test querying documents--hit fields--" + documentFields.getSourceAsMap());
    }
    // Result:
    // Test querying documents-----{"hits":[...],"maxScore":1.8413742,"totalHits":{"relation":"EQUAL_TO","value":1}}
    // =====================
    // Test querying documents--hit fields--{name=cyx1, age=5}
}
```
Reference website: https://blog.csdn.net/mgdj25/article/details/105740191
Reference video: https://www.bilibili.com/video/BV17a4y1x7zq