Elasticsearch simple usage summary

1. Preparation for ELK installation

1.1 ELK download address

ElasticSearch: https://mirrors.huaweicloud.com/elasticsearch/?C=N&O=D

logstash: https://mirrors.huaweicloud.com/logstash/?C=N&O=D

Visual interface (elasticsearch-head): https://github.com/mobz/elasticsearch-head

kibana: https://mirrors.huaweicloud.com/kibana/?C=N&O=D

IK analyzer (Chinese tokenizer): https://github.com/medcl/elasticsearch-analysis-ik

The JDK must be version 1.8 or above.

1.2 Installing the JDK and Maven on macOS

Configure the JDK on macOS:

JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home
PATH=$JAVA_HOME/bin:$PATH:.
CLASSPATH=$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar:.
export JAVA_HOME
export PATH
export CLASSPATH

Install JDK 1.8 and configure the environment variables. On an unsupported platform, starting ES may fail with:

org.elasticsearch.ElasticsearchException: X-Pack is not supported and Machine Learning is not available for [windows-x86]; you can use the other X-Pack features (unsupported) by setting xpack.ml.enabled: false in elasticsearch.yml

Add the following to elasticsearch.yml:

cluster.initial_master_nodes: ["node-1"]  # node-1 is the value configured for node.name
xpack.ml.enabled: false
# If the connection fails, it is usually a cross-origin (CORS) problem (different port, different site, etc.); enable CORS:
http.cors.enabled: true
http.cors.allow-origin: "*" 

Open the profile:

vi ~/.bash_profile

The Maven path (here, a Maven distribution downloaded with IDEA):

export M2_HOME=/Users/lisen/apache-maven-3.6.3
export PATH=$PATH:$M2_HOME/bin

Reload the profile:

source ~/.bash_profile 

Test whether Maven is installed successfully:

mvn -v

There are plenty of Windows installation guides, so Windows is not covered here...

1.3 ELK installation reference

https://blog.csdn.net/mgdj25/article/details/105740191

The essential steps:

  1. Set Kibana's internationalization to zh-CN in kibana.yml (i18n.locale: "zh-CN").

  2. Add to elasticsearch.yml:

    http.cors.enabled: true
    http.cors.allow-origin: "*" 
    
  3. After downloading the IK analyzer source, edit its pom.xml so that the ES version matches your installation, then enter the source directory and build it:

    mvn clean
    mvn compile
    mvn package
    

    After packaging, the build output is under target/releases: keep the config directory and the jar files, copy them into the plugins/ik directory of ES (create it if needed), and delete the other files (this is the key point).

  4. A special reminder about segmentation granularity: the analyzer may split a phrase into single characters or smaller words. For example, "Li Kui ha ha" may be split so that "Li Kui" comes out character by character instead of as the single word "Li Kui", so you have to write your own dictionary.

    Create a new cyx.dic file in the config directory of the IK plugin.

    Register your dictionary in IKAnalyzer.cfg.xml.

    Add an entry (the file's comments describe the other options):

    <!-- Users can expand their own user dictionary here-->  
    <entry key="ext_dict">cyx.dic</entry>
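
    For reference, the whole IKAnalyzer.cfg.xml typically looks like the sketch below (the stopword entry is illustrative; keep whatever your downloaded version already contains):

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
    <properties>
        <comment>IK Analyzer extension configuration</comment>
        <!-- Users can expand their own user dictionary here -->
        <entry key="ext_dict">cyx.dic</entry>
        <!-- Users can expand their own stopword dictionary here -->
        <entry key="ext_stopwords"></entry>
    </properties>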
    

2. ES core concepts

What are clusters, nodes, indexes, types, documents, shards, and mappings?

Elasticsearch is document-oriented. Here is a rough comparison between a relational database and Elasticsearch. Everything is JSON!

Relational DB       Elasticsearch
database            index
tables              types
rows                documents
fields (columns)    fields

Physical design:

Elasticsearch splits each index into multiple shards behind the scenes, and each shard can be migrated between servers in the cluster.

Logical design:

An index contains types, and a type contains multiple documents. When we index a document, it can be located in the order index → type → document ID; this combination identifies one specific document. Note: the ID does not have to be an integer; it is actually a string.

Document

Documents are our individual records.

As said before, Elasticsearch is document-oriented, which means the smallest unit of indexing and searching is the document. In Elasticsearch, a document has several important properties:

  • Self-contained: a document contains both the fields and their values (key:value!).
  • Hierarchical: a document can contain documents; that is how complex logical entities arise. (A document is a JSON object! fastjson converts it automatically!)
  • Flexible structure: documents do not rely on a predefined schema. In a relational database, fields must be defined in advance before they can be used; in Elasticsearch, fields are very flexible. We can sometimes omit a field or add a new field dynamically.

Although we can add or omit fields at will, the type of each field is still very important; an age field, for example, could be a string or an integer. Elasticsearch stores the mapping between fields and types, along with other settings. This mapping is specific to each type of each index, which is why types are sometimes called mapping types in Elasticsearch.

Type

A type is a logical container for documents, just as a table in a relational database is a container for rows. The definition of the fields in a type is called the mapping; for example, name may be mapped to a string type. Documents are schema-less: they do not need to have every field defined in the mapping. So what does Elasticsearch do when a document brings a new field? Elasticsearch automatically adds the new field to the mapping and, when the type is unclear, starts to guess: if the value is 18, Elasticsearch assumes it is an integer. The guess is not always right, so the safest approach is to define the mapping in advance, just as you would design a table before using it.
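
A quick way to watch this guessing happen in the Kibana console (the index and field names here are hypothetical, just for illustration):

PUT /test_guess/_doc/1
{
  "age": 18
}

GET /test_guess/_mapping
// the age field is typically guessed as type "long"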

Index

The index is the "database"!

An index is a container of mapping types; an index in Elasticsearch is a very large collection of documents. The index stores the fields and other settings of its mapping types, and the data is then stored across the shards. Let's look next at how shards work.

Physical design: how nodes and shards work

A cluster has at least one node, and a node is an Elasticsearch process. A node can hold multiple indexes. By default, a newly created index is made up of several primary shards (5 before ES 7.0, 1 from 7.0 on), and each primary shard has a replica shard.

(The original post shows a figure of a cluster with three nodes here.) A primary shard and its replica shard are never placed on the same node, which protects against data loss when a node dies. In fact, a shard is a Lucene index: a directory of files containing an inverted index. The inverted index structure lets Elasticsearch tell you which documents contain a particular keyword without scanning all documents. But wait, what on earth is an inverted index?

Inverted index

Elasticsearch uses a structure called an inverted index, with Lucene's inverted index as the underlying implementation. This structure is well suited to fast full-text search: an inverted index consists of a sorted list of all the unique words appearing in the documents and, for each word, the list of documents containing it. For example, suppose we now have two documents, each containing the following:

Study every day, good good up to forever  # Contents of document 1
To forever, study every day,good good up  # Content contained in document 2

To create an inverted index, we first split each document into independent words (also called terms or tokens), then build a sorted list of all the unique terms, and then record which documents each term appears in:

term      doc_1   doc_2
Study       √
To                  √
every       √       √
forever     √       √
day         √       √
study               √
good        √       √
to          √
up          √       √

Now, if we try to search for "to forever", we just need to look at the documents containing each term:

term      doc_1   doc_2
to          √
forever     √       √
total       2       1

Both documents match, but the first document matches more strongly than the second. If there are no other conditions, both documents containing the keywords will be returned.

Let's take another example: suppose we search blog posts by tag. The inverted index list would have this structure:

Blog posts (raw data)            Index list (inverted index)
post ID   tags                   tag      post IDs
1         python                 python   1,2,3
2         python                 linux    3,4
3         linux,python
4         linux

If you search for articles with the python tag, finding them through the inverted index is much faster than scanning all the raw data: just look at the tag column and read off the related article IDs. All irrelevant data is filtered out entirely, which improves efficiency!

Comparison between elasticsearch index and Lucene index

In Elasticsearch, the word "index" is used frequently; that is just how the term is used. An Elasticsearch index is divided into multiple shards, and each shard is a Lucene index, so an Elasticsearch index is composed of multiple Lucene indexes. (Don't ask why; Elasticsearch simply uses Lucene underneath!) Unless otherwise specified, "index" below means an Elasticsearch index.

All the operations below are done in the Console under Dev Tools in Kibana. Basic operations!

IK analyzer

What is the IK analyzer?

Word segmentation means splitting a paragraph of Chinese (or other text) into keywords. When searching, we segment our own query string as well as the data in the database or index, and then match the segments. The default Chinese segmentation treats every single character as a word; for example, "I love Crazy God" would be split into "I", "love", "Crazy", "God", which obviously does not meet the requirements, so we need to install the Chinese IK analyzer to solve this problem.

For Chinese text, the IK analyzer is recommended!

IK provides two segmentation algorithms: ik_smart and ik_max_word. ik_smart produces the coarsest (fewest) segments, while ik_max_word produces the most fine-grained split! We'll test both below.

In short, the IK analyzer:

  • Segments a sentence into words
  • Is the recommended analyzer for Chinese
  • Provides two algorithms: ik_smart (coarsest segmentation) and ik_max_word (most fine-grained segmentation)

[ik_smart] test:

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "I am the successor of socialism"
}

//output
{
  "tokens" : [
    {
      "token" : "I",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "yes",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "socialist",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "successor",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}

[ik_max_word] test:

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "I am the successor of socialism"
}
//output
{
  "tokens" : [
    {
      "token" : "I",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "yes",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "socialist",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "Sociology",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "doctrine",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "successor",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "Succession",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "people",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "CN_CHAR",
      "position" : 7
    }
  ]
}

3. Using commands

3.1 REST style

REST is a software architectural style, not a standard. It makes mechanisms such as caching easier to implement.

method   url address                                                description
PUT      localhost:9200/index_name/type_name/document_id            create document (specify document id)
POST     localhost:9200/index_name/type_name                        create document (random document id)
POST     localhost:9200/index_name/type_name/document_id/_update    modify document
DELETE   localhost:9200/index_name/type_name/document_id            delete document
GET      localhost:9200/index_name/type_name/document_id            query document by document id
POST     localhost:9200/index_name/type_name/_search                query all data

Basic test

1. Create an index

PUT /index_name/type_name/document_id (in newer versions the type name is no longer written; _doc is used instead)

{request body}

The index is created automatically if it does not exist, and the data is added successfully.
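
For example (the index name, fields, and values here are illustrative):

PUT /test1/_doc/1
{
  "name": "lisen",
  "age": 27
}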

What if we need to specify a type for a field such as name?

We can specify the type properties of each field explicitly, like creating a table in SQL.

Remember the GET rule: specific information can be obtained through a GET request.

If you do not set the field types yourself, ES assigns default types automatically (dynamic mapping).
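
A minimal sketch of creating an index with explicit field types, using the 7.x syntax (the index name test2 and its fields are assumptions for illustration):

PUT /test2
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "age": { "type": "long" },
      "birthday": { "type": "date" }
    }
  }
}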

3.2 _cat commands

Get the health status:
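
This is a standard endpoint; ?v adds column headers:

GET _cat/health?v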

List all index information:

GET _cat/indices?v

There are many more _cat endpoints (the console will suggest them automatically); try them all.

Modify a document

1. To modify, we can still use the original PUT command and overwrite the document by id.

However, any fields not filled in are reset to empty. This is like a Java interface receiving an object to modify: if only some fields are passed along with the id, the values that are not passed become empty.

2. There is also an _update method that does not clear the values you leave out:

POST /test3/_doc/1/_update
{
  "doc":{
    "name":"212121"
  }
}

//The following two methods will clear the fields that are not supplied

POST /test3/_doc/1
{
    "name":"212121"
}

POST /test3/_doc/1
{
  "doc":{
    "name":"212121"
  }
}

Note: only the _update request that wraps the fields in a "doc" object performs a partial modification; a plain POST replaces the whole document.

Delete index

Deleting indexes or documents:

Use the DELETE command; whether an index or a single document record is deleted is determined by the URL of your request.

The RESTful style is what ES recommends!

3.3 Basic document operations

Query

The simplest search is a GET by document id, for example:
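
Fetching document 1 from the lisen/user index used throughout this section:

GET /lisen/user/1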

Conditional search uses _search.

The name field here is of type text, so the query is word-segmented (analyzed); if it were keyword, the search would not be analyzed.

Complex search operations, like SELECT with extras (sorting, paging, highlighting, fuzzy query, exact query):

//Test: a match query against a single field
GET lisen/user/_search
{
  "query": {
    "match": {
      "name": "Leeson "
    }
  }
}

Result filtering means displaying only some fields of each hit (_source filtering):

  • includes: fields to keep
  • excludes: fields to drop

(A _source sketch follows the sorting/paging example below.)

Sorting and paging code:

GET lisen/user/_search
{
  "query": {
    "match": {
      "name": "Leeson "
    }
  },
  "sort":{
    "age":{
      "order":"asc"
    }
  },
  "from": 0,
  "size": 1
}
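
A sketch of the _source result filtering mentioned above (same lisen/user index; only name is returned, age is dropped):

GET lisen/user/_search
{
  "query": {
    "match": {
      "name": "Leeson"
    }
  },
  "_source": {
    "includes": ["name"],
    "excludes": ["age"]
  }
}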

Multi-condition query

Boolean (bool) query:

must (and): all conditions must be met

should (or): like OR in a database

must_not (not)

Range conditions (a combined sketch follows this list):

  • gt: greater than
  • gte: greater than or equal to
  • lt: less than
  • lte: less than or equal to
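
A combined bool + range sketch (reusing the lisen/user index; the age bounds are arbitrary):

GET lisen/user/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "Leeson" } }
      ],
      "filter": [
        { "range": { "age": { "gte": 10, "lte": 30 } } }
      ]
    }
  }
}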

Matching an array of conditions (multiple values) is also supported.

(Correction: match does use the inverted index; the query string is analyzed first and the resulting terms are then looked up in the index.)

Exact search

A term query looks the term up directly in the inverted index, without analyzing the query.

About analysis (word segmentation):

  • term: direct exact query against the inverted index
  • match: can use the analyzer to parse the query! (the query is analyzed first, then the analyzed terms are searched)

text fields are analyzed by default; keyword fields are not analyzed.

Exact query on multiple values (a sketch below):
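
A sketch, assuming a testdb index with a keyword field t1 (both names are hypothetical):

GET testdb/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "t1": "22" } },
        { "term": { "t1": "33" } }
      ]
    }
  }
}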

Highlight

You can highlight matched terms in the results, and you can also customize the highlighted style, for example:
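
A highlight sketch against the lisen/user index (the tag markup is just an example):

GET lisen/user/_search
{
  "query": {
    "match": { "name": "Leeson" }
  },
  "highlight": {
    "pre_tags": ["<p class='key' style='color:red'>"],
    "post_tags": ["</p>"],
    "fields": {
      "name": {}
    }
  }
}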

4. Spring Boot integration

4.1 Import dependencies

Create a Spring Boot project and check Spring Web plus, under NoSQL, Spring Data Elasticsearch.

If they are missing, add them manually:

<!--es client-->
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.6.2</version>
</dependency>

<!--spring-boot starter for elasticsearch-->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

Note whether the ES version managed by the Spring Boot parent POM matches the version of your ES server.

If not, override the version in the properties of your pom file:

<!--Configure the corresponding version here-->
<properties>
    <java.version>1.8</java.version>
    <elasticsearch.version>7.6.2</elasticsearch.version>
</properties>

4.2 Inject the RestHighLevelClient

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchClientConfig {
    @Bean
    public RestHighLevelClient restHighLevelClient(){
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("127.0.0.1",9200,"http"))
        );
        return client;
    }
}

4.3 Creating, deleting, and checking the existence of an index

//Test index creation
@Test
void testCreateIndex() throws IOException {
    //1. Request to create index
    CreateIndexRequest request = new CreateIndexRequest("lisen_index");
    //2. The client executes the request and obtains the response after the request
    CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
    System.out.println(response);
}

//Test whether the index exists
@Test
void testExistIndex() throws IOException {
    //1. Build the request to get the index
    GetIndexRequest request = new GetIndexRequest("lisen_index");
    //2. The client executes the request and obtains the response after the request
    boolean exist =  client.indices().exists(request, RequestOptions.DEFAULT);
    System.out.println("Test whether the index exists-----"+exist);
}

//Delete index
@Test
void testDeleteIndex() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("lisen_index");
    AcknowledgedResponse delete = client.indices().delete(request,RequestOptions.DEFAULT);
    System.out.println("Delete index--------"+delete.isAcknowledged());
}

4.4 Document operations

//Test add document
    @Test
    void testAddDocument() throws IOException {
        User user = new User("lisen",27);
        IndexRequest request = new IndexRequest("lisen_index");
        request.id("1");
        //Set timeout
        request.timeout("1s");
        //Put data into json string
        request.source(JSON.toJSONString(user), XContentType.JSON);
        //Send request
        IndexResponse response = client.index(request,RequestOptions.DEFAULT);
        System.out.println("Add document-------"+response.toString());
        System.out.println("Add document-------"+response.status());
//        result
//        Add document ------- IndexResponse[index=lisen_index,type=_doc,id=1,version=1,result=created,seqNo=0,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]
//        Add document ------- CREATED
    }

    //Test whether the document exists
    @Test
    void testExistDocument() throws IOException {
        //Test whether the document exists: use the index name and document id
        GetRequest request= new GetRequest("lisen_index","1");
        //Note: unlike index operations, document operations have no indices() step
        boolean exist = client.exists(request, RequestOptions.DEFAULT);
        System.out.println("Test whether the document exists-----"+exist);
    }

    //Test acquisition document
    @Test
    void testGetDocument() throws IOException {
        GetRequest request= new GetRequest("lisen_index","1");
        GetResponse response = client.get(request, RequestOptions.DEFAULT);
        System.out.println("Test acquisition document-----"+response.getSourceAsString());
        System.out.println("Test acquisition document-----"+response);

//        result
//        Test acquisition document ----- {"age":27,"name":"lisen"}
//        Test acquisition document ----- {"_index":"lisen_index","_type":"_doc","_id":"1","_version":1,"_seq_no":0,"_primary_term":1,"found":true,"_source":{"age":27,"name":"lisen"}}

    }

    //Test modification document
    @Test
    void testUpdateDocument() throws IOException {
        User user = new User("Li Xiaoyao", 55);
        //The modification id is 1
        UpdateRequest request= new UpdateRequest("lisen_index","1");
        request.timeout("1s");
        request.doc(JSON.toJSONString(user),XContentType.JSON);

        UpdateResponse response = client.update(request, RequestOptions.DEFAULT);
        System.out.println("Test modification document-----"+response);
        System.out.println("Test modification document-----"+response.status());

//        result
//        Test modification document ----- UpdateResponse[index=lisen_index,type=_doc,id=1,version=2,seqNo=1,primaryTerm=1,result=updated,shards=ShardInfo{total=2, successful=1, failures = []}]
//        Test modification document ----- OK

//        After the document is deleted:
//        Test acquisition document ----- null
//        Test acquisition document ----- {"_index":"lisen_index","_type":"_doc","_id":"1","found":false}
    }


    //Test delete document
    @Test
    void testDeleteDocument() throws IOException {
        DeleteRequest request= new DeleteRequest("lisen_index","1");
        request.timeout("1s");
        DeleteResponse response = client.delete(request, RequestOptions.DEFAULT);
        System.out.println("Test delete document------"+response.status());
    }

    //Test batch add documents
    @Test
    void testBulkAddDocument() throws IOException {
        ArrayList<User> userlist=new ArrayList<User>();
        userlist.add(new User("cyx1",5));
        userlist.add(new User("cyx2",6));
        userlist.add(new User("cyx3",40));
        userlist.add(new User("cyx4",25));
        userlist.add(new User("cyx5",15));
        userlist.add(new User("cyx6",35));

        //Request for batch operation
        BulkRequest request = new BulkRequest();
        request.timeout("1s");

        //Batch processing request
        for (int i = 0; i < userlist.size(); i++) {
            request.add(
                    new IndexRequest("lisen_index")
                            .id(""+(i+1))
                            .source(JSON.toJSONString(userlist.get(i)),XContentType.JSON)
            );
        }
        BulkResponse response = client.bulk(request, RequestOptions.DEFAULT);
        //response.hasFailures(): returns true if any operation failed
        System.out.println("Test batch add documents-----"+response.hasFailures());

//        Result: false means success and true means failure
//        Test batch add documents ----- false
    }


    //Test query document
    @Test
    void testSearchDocument() throws IOException {
        SearchRequest request = new SearchRequest("lisen_index");
        //Build search criteria
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        //Highlight is set
        sourceBuilder.highlighter();
        //With term name cyx1
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "cyx1");
        sourceBuilder.query(termQueryBuilder);
        sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        request.source(sourceBuilder);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        System.out.println("Test query document-----"+JSON.toJSONString(response.getHits()));
        System.out.println("=====================");
        for (SearchHit documentFields : response.getHits().getHits()) {
            System.out.println("Test document query--Traversal parameters--"+documentFields.getSourceAsMap());
        }

//        Test query document ----- (one hit: index=lisen_index, id=1, score=1.8413742, "totalHits":{"relation":"EQUAL_TO","value":1}; full fastjson output omitted)
//        =====================
//        Test query document -- traversal parameters -- {name=cyx1, age=5}
    }

Reference website: https://blog.csdn.net/mgdj25/article/details/105740191

Reference video: https://www.bilibili.com/video/BV17a4y1x7zq
