[Introduction to Elasticsearch] 2. Basic concepts of Elasticsearch and the CRUD API

1, Document

1.1 general

Elasticsearch is document-oriented: the document is the smallest unit of searchable data.

A document is comparable to a row in a relational database table. It is the basic unit of indexed information.

Each document has a unique ID, which you can either specify yourself or let Elasticsearch generate automatically.

1.2 document metadata

Metadata describes information about the document itself. An indexed document carries the following metadata:

  • _index: the name of the index the document belongs to
  • _type: the name of the type the document belongs to
  • _id: the unique ID of the document
  • _score: the relevance score of the document
  • _source: the original JSON data of the document
  • _version: the version of the document

For _type, pay attention to the differences between versions:

  • Before 6.0, one index could contain multiple types; indexes created in 6.x allow only one
  • Since 7.0, types are deprecated. An index can have only one type, named _doc

1.3 index

As a noun, an index is a collection of documents; many different indexes can be created in an Elasticsearch cluster.

As a verb, to index means to save a document into Elasticsearch, which builds the inverted index entries for it.

An index is the container for documents: a collection of documents of a similar kind.
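To illustrate what "creating an inverted index" means, here is a minimal sketch in Python. It is only a toy model: the function name build_inverted_index and the sample documents are hypothetical, and real analysis in Elasticsearch (tokenizers, filters, scoring data) is far richer than the whitespace split used here.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Toy analyzer: lowercase the text and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "Elasticsearch basic concepts",
    2: "Elasticsearch CRUD API",
}

inverted = build_inverted_index(docs)
print(inverted["elasticsearch"])  # IDs of all documents containing the term
print(inverted["crud"])
```

A search for a term then becomes a dictionary lookup instead of a scan over every document.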

Run in Kibana Dev Tools: GET movies

{
  "movies" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "@version" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "genre" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "id" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "year" : {
          "type" : "long"
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1589097356250",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "S0Ic8iLJTjKfL4wd-wGckA",
        "version" : {
          "created" : "7020099"
        },
        "provided_name" : "movies"
      }
    }
  }
}

From the response we can see:

  • mappings: defines the field types of the documents
  • settings: defines the configuration of the index
  • aliases: defines aliases for the index; the index can be accessed through an alias

An index is a logical-space concept. Each index has its own mapping, which defines the field names and field types of its documents. A shard, discussed later, is by contrast a physical-space concept: the data of an index is spread across its shards.
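The mapping in the GET movies response has the shape that dynamic mapping produces: strings become text with a keyword sub-field, and integers become long. The sketch below imitates that behaviour for a few JSON value types. It is a simplified assumption, not the actual Elasticsearch logic (date detection, objects, and arrays are ignored), and the function names and the sample document are hypothetical.

```python
def infer_field_mapping(value):
    """Return a mapping entry resembling what dynamic mapping would create for a value."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError(f"unsupported value type in this sketch: {type(value).__name__}")


def infer_mapping(document):
    """Build a 'properties' mapping from a flat document."""
    return {"properties": {name: infer_field_mapping(v) for name, v in document.items()}}


# Hypothetical document with the same field names as the movies index.
sample = {"title": "Toy Story", "genre": "Animation", "year": 1995}
mapping = infer_mapping(sample)
print(mapping["properties"]["year"])
print(mapping["properties"]["title"]["type"])
```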

1.4 simple comparison with MySQL

1.5 the REST API can be called from multiple languages
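Because the API is plain HTTP, any language with an HTTP client can call it. As a minimal sketch, the Python standard library can build the equivalent of GET movies. The address http://localhost:9200 is an assumption (a local node on the default port), and the helper name build_get_index_request is hypothetical. The request is only built here, not sent, so the snippet runs without a cluster.

```python
import urllib.request

# Assumption: a local node listening on the default HTTP port.
ES_URL = "http://localhost:9200"


def build_get_index_request(index_name: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP equivalent of `GET <index_name>`."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index_name}",
        method="GET",
        headers={"Accept": "application/json"},
    )


request = build_get_index_request("movies")
print(request.get_method(), request.full_url)

# To send it against a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```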

2, Distributed

Elasticsearch is a distributed system, so it has to provide the high availability and scalability expected of one.

2.1 availability and scalability of distributed system

  • High availability
    • Service availability: some nodes may stop serving without bringing the service down
    • Data availability: if some nodes are lost, no data is lost
  • Scalability
    • Handles a growing number of requests and a growing data volume by distributing data across all nodes

2.2 distributed features

  • Benefits of the distributed architecture of Elasticsearch

    • Storage scales horizontally
    • Higher availability: if some nodes stop serving, the cluster as a whole is not affected

3, Node

3.1 nodes

  • A node is an instance of Elasticsearch

    • It is essentially a Java process

    • Multiple Elasticsearch processes can run on one machine, but in production it is generally recommended to run only one Elasticsearch instance per machine

  • Each node has a name, set in the configuration file or at startup with -E node.name=node1

  • After a node starts, it is assigned a UID, which is saved in its data directory

3.2 master eligible nodes and master nodes

  • After startup, every node is a master-eligible node by default

    • Set node.master: false to disable this
  • A master-eligible node can take part in the master election and become the master node

  • When the first node starts, it elects itself as the master node

  • The cluster state is saved on every node, but only the master node can modify it

    • The cluster state holds the essential information about the cluster
      • Information about all nodes
      • All indexes with their mappings and settings
      • Shard routing information
    • If any node could modify this information, the data would become inconsistent

3.3 Data Node & Coordinating Node

  • Data Node

    • A node that can hold data is called a Data Node. It stores shard data and plays a vital role in scaling data
  • Coordinating Node

    • Accepts client requests, distributes them to the appropriate nodes, and finally gathers the results together
    • Every node acts as a Coordinating Node by default
  • Hot & Warm Node

    • Data nodes with different hardware configurations (warm nodes use lower-spec machines) implement the hot & warm architecture and reduce the cost of cluster deployment
  • Machine Learning Node

    • Runs machine learning jobs, which are used for anomaly detection
  • Tribe Node

    • Connects to several Elasticsearch clusters and lets them be treated as a single cluster. Since 5.3, Cross Cluster Search is used for this instead

3.4 configuration node type

  • In a development environment, one node can take on multiple roles
  • In production, configure each node with a single dedicated role for better performance

4, Shard (primary shard & replica shard)

4.1 shard overview

  • Primary shards solve the problem of horizontal data scaling. Through primary shards, data can be distributed across all nodes in the cluster

    • A shard is a running Lucene instance
    • The number of primary shards is specified when the index is created and cannot be changed afterwards, except by reindexing
  • Replica shards solve the problem of high availability of data. A replica shard is a copy of a primary shard

    • The number of replica shards can be adjusted dynamically
    • Increasing the number of replicas also improves service availability (read throughput) to a certain extent
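One way to see why the number of primary shards is fixed at index creation: a document is routed to a primary shard by hashing its routing value (the _id by default) modulo the number of primary shards. The sketch below is a simplified illustration of that idea. Elasticsearch uses a Murmur3 hash; zlib.crc32 is a stand-in assumption chosen only because it is deterministic and in the standard library, and the function name route_to_shard is hypothetical.

```python
import zlib


def route_to_shard(doc_id: str, number_of_primary_shards: int) -> int:
    """Pick a primary shard for a document ID (simplified routing formula)."""
    # Stand-in hash: the real implementation hashes the routing value with Murmur3.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


doc_ids = [str(i) for i in range(1, 21)]

# Documents whose shard would change if the index went from 3 to 5 primary shards.
moved = [d for d in doc_ids if route_to_shard(d, 3) != route_to_shard(d, 5)]
print(f"{len(moved)} of {len(doc_ids)} documents would be looked up on the wrong shard")
```

Changing the divisor changes where existing documents are expected to live, which is why the data has to be reindexed into a new index instead.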

4.2 shard settings

Shard settings for a production environment require capacity planning in advance (this is very important)

  • If the number of shards is set too small

    • Nodes added later cannot be used to scale horizontally
    • A single shard holds too much data, so redistributing data takes a long time
  • If the number of shards is set too large (since 7.0 the default number of primary shards is 1, which addresses the over-sharding problem)

    • The relevance scoring of search results and the accuracy of statistical results are affected
    • Too many shards on a single node waste resources and hurt performance

5, View the health of the cluster

Run in Dev Tools: GET _cluster/health

{
  "cluster_name" : "geektime",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 5,
  "active_shards" : 10,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

The status color indicates the state of the cluster:

  • Green: all primary shards and replica shards are allocated
  • Yellow: all primary shards are allocated, but at least one replica shard is not
  • Red: at least one primary shard is not allocated
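The three colors can be expressed as a small decision function. This is a minimal sketch of the rule described above, not the Elasticsearch implementation; the function name cluster_status and the shard dictionaries are hypothetical.

```python
def cluster_status(shards):
    """Derive a health color from shards given as dicts with 'primary' and 'assigned' flags."""
    # Any unassigned primary shard means some data is unavailable.
    if any(s["primary"] and not s["assigned"] for s in shards):
        return "red"
    # All primaries are fine, but at least one replica is missing.
    if any(not s["assigned"] for s in shards):
        return "yellow"
    return "green"


# Hypothetical index with one primary shard and one replica shard.
healthy = [
    {"primary": True, "assigned": True},
    {"primary": False, "assigned": True},
]
replica_missing = [
    {"primary": True, "assigned": True},
    {"primary": False, "assigned": False},
]
primary_missing = [
    {"primary": True, "assigned": False},
    {"primary": False, "assigned": False},
]

print(cluster_status(healthy), cluster_status(replica_missing), cluster_status(primary_missing))
```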

GET _cat/nodes

172.24.0.4 37 96 7 0.53 0.26 0.21 mdi * es72_02
172.24.0.5 33 96 7 0.53 0.26 0.21 mdi - es72_01

The cluster can also be inspected with Cerebro on local port 9000.

es72_02 is the master node (marked with * in the _cat/nodes output).
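The master marker in the _cat/nodes output can also be read programmatically. The sketch below parses the two lines shown above. The column names are an assumption based on the default _cat/nodes columns in 7.x (ip, heap.percent, ram.percent, cpu, three load averages, node.role, master, name); the function names are hypothetical.

```python
CAT_NODES_OUTPUT = """\
172.24.0.4 37 96 7 0.53 0.26 0.21 mdi * es72_02
172.24.0.5 33 96 7 0.53 0.26 0.21 mdi - es72_01
"""

# Assumed default column order of `GET _cat/nodes` in 7.x.
COLUMNS = [
    "ip", "heap.percent", "ram.percent", "cpu",
    "load_1m", "load_5m", "load_15m", "node.role", "master", "name",
]


def parse_cat_nodes(text):
    """Turn each non-empty output line into a dict keyed by column name."""
    return [dict(zip(COLUMNS, line.split())) for line in text.splitlines() if line.strip()]


def find_master(nodes):
    """Return the name of the node marked with '*' in the master column, or None."""
    for node in nodes:
        if node["master"] == "*":
            return node["name"]
    return None


nodes = parse_cat_nodes(CAT_NODES_OUTPUT)
print(find_master(nodes))
```

In the node.role column, mdi means the node is master-eligible (m), a data node (d), and an ingest node (i).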

6, CRUD API

6.1 basic CRUD

Create supports two methods: letting Elasticsearch generate the document ID automatically, or specifying the document ID yourself

# Create a document with an automatically generated _id
# Request path: <index name>/_doc/<id>; omit <id> to have it generated
POST users/_doc
{
  "user" : "Mike",
  "post_date" : "2019-04-15T14:12:12",
  "message" : "trying out Kibana"
}

# Create a document with a manually specified _id (fails if the ID already exists)
PUT users/_doc/1?op_type=create
{
  "user" : "Mike",
  "post_date" : "2019-04-15T14:12:12",
  "message" : "trying out Kibana"
}

# Get the document by ID
GET users/_doc/1

# Index (full overwrite): replaces the document if it exists, creates it otherwise
PUT users/_doc/1
{
  "user":"Mike"
}

# Partial update: merges the given fields into the existing document
POST users/_update/1
{
  "doc":{
      "post_date" : "2019-04-15T14:12:12",
      "message" : "trying out Kibana"
  }
}

# Delete the document by ID
DELETE users/_doc/1
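The difference between create, index (overwrite), and update is easy to mix up, so here is a toy in-memory model of the semantics. It is a sketch under stated assumptions, not how Elasticsearch is implemented: the class TinyIndex and the exception ConflictError are hypothetical, the merge is shallow, and details such as no-op updates or versioning on delete are ignored.

```python
class ConflictError(Exception):
    """Raised when create is called for an ID that already exists."""


class TinyIndex:
    def __init__(self):
        self._docs = {}  # _id -> {"_version": int, "_source": dict}

    def create(self, doc_id, source):
        """Like PUT <index>/_doc/<id>?op_type=create: fail if the ID exists."""
        if doc_id in self._docs:
            raise ConflictError(f"document {doc_id} already exists")
        self._docs[doc_id] = {"_version": 1, "_source": dict(source)}

    def index(self, doc_id, source):
        """Like PUT <index>/_doc/<id>: replace the whole document, bump the version."""
        previous = self._docs.get(doc_id, {"_version": 0})
        self._docs[doc_id] = {"_version": previous["_version"] + 1, "_source": dict(source)}

    def update(self, doc_id, partial):
        """Like POST <index>/_update/<id> with {"doc": ...}: merge fields into the document."""
        doc = self._docs[doc_id]  # raises KeyError if the document is missing
        doc["_source"].update(partial)
        doc["_version"] += 1

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def delete(self, doc_id):
        self._docs.pop(doc_id, None)


users = TinyIndex()
full_doc = {"user": "Mike", "post_date": "2019-04-15T14:12:12", "message": "trying out Kibana"}

users.create("1", full_doc)
users.index("1", {"user": "Mike"})  # the other two fields are gone after this
after_overwrite = dict(users.get("1")["_source"])
users.update("1", {"post_date": "2019-04-15T14:12:12", "message": "trying out Kibana"})
after_update = dict(users.get("1")["_source"])
print(after_overwrite)
print(after_update)
```

The point of the walk-through is that the overwrite drops every field not present in the new body, while the update keeps existing fields and only adds or changes the given ones.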

6.2 batch operation

Use the _bulk API

# index: create or overwrite the doc with id 1 in index test, setting field1 to value1
# delete: try to delete the doc with id 2 in index test (no source line follows)
# create: create the doc with id 3 in index test2
# update: partially update the doc with id 1 in index test
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test2", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
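The _bulk body is newline-delimited JSON: one action line, optionally followed by one source line, and the body must end with a newline. A minimal sketch of assembling such a body in Python is shown below. The helper name build_bulk_body and the tuple layout are hypothetical; the body is only built, not sent.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, payload) tuples into a _bulk request body."""
    lines = []
    for action, metadata, payload in operations:
        lines.append(json.dumps({action: metadata}))
        # delete has no payload line; index, create, and update do.
        if payload is not None:
            lines.append(json.dumps(payload))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


operations = [
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
    ("create", {"_index": "test2", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_index": "test", "_id": "1"}, {"doc": {"field2": "value2"}}),
]

body = build_bulk_body(operations)
print(body, end="")
```

Each line is an independent JSON object, so inline comments inside the body are not valid.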

Use the _mget API to fetch multiple documents in one request

# Specify the index in the URL
GET /test/_mget
{
    "docs" : [
        {
            "_id" : "1"
        },
        {
            "_id" : "2"
        }
    ]
}

# Specify the index in the JSON body
GET /_mget
{
    "docs" : [
        {
            "_index" : "test",
            "_id" : "1"
        },
        {
            "_index" : "test",
            "_id" : "2"
        }
    ]
}

Use the _msearch API for batch searches

# msearch operation: an empty header line uses the index from the URL
POST kibana_sample_data_ecommerce/_msearch
{}
{"query" : {"match_all" : {}},"size":1}
{"index" : "kibana_sample_data_flights"}
{"query" : {"match_all" : {}},"size":2}

6.3 common errors

Tags: ElasticSearch

Posted by gigamike187 on Thu, 19 May 2022 02:30:21 +0300