ElasticSearch dynamic mapping, static mapping, and the four categories of field types

We have already published three tutorials in the ElasticSearch series. Today, in the fourth, let's talk about dynamic mapping, static mapping, and the four categories of field types in es.

This article is a set of notes for the video tutorial recorded by SongGe. The notes are concise but complete, and readers can follow along with the video. Video download link: https://pan.baidu.com/s/1oKiV... Extraction code: p3sx

1.ElasticSearch mapping

A mapping defines a document and how the fields it contains are stored and indexed. In that sense it is somewhat similar to a table definition in a relational database.

1.1 mapping classification

Dynamic mapping

As the name suggests, this is a mapping that is created automatically. Based on the documents being stored, es infers the type and storage mode of each field. This is dynamic mapping.

As a simple example, create a new index and view the index information:

In the index information, you can see that mappings is empty. The mappings object is where mapping information is stored.

Now we add a document to the index as follows:

PUT blog/_doc/1
{
  "title":"1111",
  "date":"2020-11-11"
}

After the document is added successfully, the mappings are generated automatically:

You can see that the date field is of type date, while title is mapped as text with an additional keyword sub-field.

By default, when a document introduces a new field, that field is automatically added to the mappings.

Sometimes you would rather have an exception thrown when a new field appears, to alert the developer. This can be configured through the dynamic attribute in mappings.

The dynamic attribute has three values:

  • true, the default. New fields are added to the mappings automatically.
  • false. New fields are ignored: they are not added to the mappings and are not searchable.
  • strict, strict mode. An exception is thrown when a new field is found.

The specific configuration method is as follows: specify mappings when creating an index (this is actually static mapping):

PUT blog
{
  "mappings": {
    "dynamic":"strict",
    "properties": {
      "title":{
        "type": "text"
      },
      "age":{
        "type":"long"
      }
    }
  }
}

Then add data to the blog index:

PUT blog/_doc/2
{
  "title":"1111",
  "date":"2020-11-11",
  "age":99
}

In the added document, there is an additional date field, which is not predefined, so the addition operation returns an error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "strict_dynamic_mapping_exception",
        "reason" : "mapping set to strict, dynamic introduction of [date] within [_doc] is not allowed"
      }
    ],
    "type" : "strict_dynamic_mapping_exception",
    "reason" : "mapping set to strict, dynamic introduction of [date] within [_doc] is not allowed"
  },
  "status" : 400
}
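
The three dynamic settings can be sketched in plain Python. This is only an illustration of the decision logic described above, not how es implements it: the function apply_dynamic, the exception class, and the "(inferred)" placeholder type are all hypothetical names introduced for this example.

```python
class StrictDynamicMappingError(Exception):
    """Hypothetical stand-in for strict_dynamic_mapping_exception."""


def apply_dynamic(properties, doc, dynamic="true"):
    """Return the mapped properties after indexing `doc` (illustrative only).

    properties: dict of field name -> type already present in the mapping
    dynamic:    "true", "false" or "strict"
    """
    updated = dict(properties)
    for field in doc:
        if field in updated:
            continue  # already mapped, nothing to decide
        if dynamic == "true":
            # new field is added; the real type inference is not modeled here
            updated[field] = "(inferred)"
        elif dynamic == "false":
            continue  # new field is ignored by the mapping
        elif dynamic == "strict":
            raise StrictDynamicMappingError(
                f"dynamic introduction of [{field}] is not allowed"
            )
        else:
            raise ValueError(f"unknown dynamic setting: {dynamic}")
    return updated


mapping = {"title": "text", "age": "long"}
doc = {"title": "1111", "date": "2020-11-11", "age": 99}

print(apply_dynamic(mapping, doc, "true"))   # date is added to the mapping
print(apply_dynamic(mapping, doc, "false"))  # mapping is unchanged
try:
    apply_dynamic(mapping, doc, "strict")
except StrictDynamicMappingError as err:
    print("rejected:", err)
```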

Dynamic mapping also has a pitfall: date detection.

For example, create a new index and add a document whose field value looks like a date:

PUT blog/_doc/1
{
  "remark":"2020-11-11"
}

After the document is added, the remark field is inferred to be of type date.

From this point on, the remark field cannot store values that are not dates.

PUT blog/_doc/1
{
  "remark":"javaboy"
}

At this time, the error is reported as follows:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "failed to parse field [remark] of type [date] in document with id '1'. Preview of field's value: 'javaboy'"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse field [remark] of type [date] in document with id '1'. Preview of field's value: 'javaboy'",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "failed to parse date field [javaboy] with format [strict_date_optional_time||epoch_millis]",
      "caused_by" : {
        "type" : "date_time_parse_exception",
        "reason" : "Failed to parse with all enclosed parsers"
      }
    }
  },
  "status" : 400
}

To solve this problem, you can use static mapping: when defining the index, declare remark as type text. Alternatively, you can turn off date detection:

PUT blog
{
  "mappings": {
    "date_detection": false
  }
}

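With date detection turned off, strings that look like dates are treated as ordinary text.

static mapping

Omitted here. Specifying mappings when the index is created, as in the examples above, is static mapping.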

1.2 type inference

Dynamic mapping in es infers types as follows:

Data in JSON          | Automatically inferred data type
----------------------|------------------------------------------------------
null                  | No field is added
true/false            | boolean
Floating point number | float
Integer               | long
JSON object           | object
array                 | Determined by the first non-null value in the array
string                | text/keyword, date, double or long are all possible
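
The rules in this table can be approximated with a short Python function. This is a rough sketch under stated assumptions: infer_type is a hypothetical helper, the date check is a simplified regular expression standing in for real date detection, and numeric detection of strings (the double/long case) is not modeled.

```python
import re

# Simplified stand-in for date detection: YYYY-MM-DD with an optional time.
# Real detection relies on Elasticsearch date formats, not on this regex.
DATE_PATTERN = re.compile(
    r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?(Z|[+-]\d{2}:\d{2})?)?"
)


def infer_type(value, date_detection=True):
    """Guess the dynamically mapped type of a parsed JSON value (sketch)."""
    if value is None:
        return None              # null: no field is added
    if isinstance(value, bool):  # test bool first, since bool is an int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        for item in value:       # the first non-null element decides
            if item is not None:
                return infer_type(item, date_detection)
        return None
    if isinstance(value, str):
        if date_detection and DATE_PATTERN.fullmatch(value):
            return "date"
        return "text"            # text with a keyword sub-field
    raise TypeError(f"not a JSON value: {value!r}")


print(infer_type("2020-11-11"))                        # treated as a date
print(infer_type("2020-11-11", date_detection=False))  # treated as text
print(infer_type([None, 3]))                           # decided by the 3
```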

2.ElasticSearch field type

2.1 core type

2.1.1 string type

  • string: a deprecated string type. Before es5, it was used to describe strings. It has since been replaced by text and keyword.
  • text: use text for fields that need full-text search, such as blog content, news content and product descriptions. The value is split into terms by an analyzer, and those terms are stored in the inverted index. Fields of type text are not used for sorting and are rarely used for aggregation. This kind of string is also called an analyzed field.
  • keyword: suitable for structured fields such as tags, email addresses and mobile phone numbers. Fields of this type can be used for filtering, sorting, aggregation and so on. This kind of string is also called a not analyzed field.
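
The difference between an analyzed and a not analyzed field can be illustrated with a toy inverted index. This is a minimal sketch, not es internals: analyze is a crude stand-in for a real analyzer (lowercase, split on non-word characters), and build_index and the sample documents are hypothetical.

```python
import re
from collections import defaultdict


def analyze(text):
    """Crude stand-in for an analyzer: lowercase, split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_index(docs, analyzed):
    """Map each indexed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        # text-like fields index each term, keyword-like fields the whole value
        terms = analyze(value) if analyzed else [value]
        for term in terms:
            index[term].add(doc_id)
    return index


docs = {1: "Elasticsearch Field Types", 2: "Field Mapping"}
text_index = build_index(docs, analyzed=True)      # behaves like a text field
keyword_index = build_index(docs, analyzed=False)  # behaves like a keyword field

print(sorted(text_index["field"]))             # both documents contain the term
print(sorted(keyword_index["field"]))          # no document has this exact value
print(sorted(keyword_index["Field Mapping"]))  # exact value matches document 2
```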

2.1.2 numeric type

type         | Value range
-------------|----------------------------------------------------------
long         | -2^63 to 2^63-1
integer      | -2^31 to 2^31-1
short        | -2^15 to 2^15-1
byte         | -2^7 to 2^7-1
double       | 64-bit double precision IEEE754 floating point type
float        | 32-bit single precision IEEE754 floating point type
half_float   | 16-bit half precision IEEE754 floating point type
scaled_float | Floating point number stored as a long, scaled by a fixed factor

  • As long as the requirements are met, prefer the type with the smallest range. The shorter the field, the more efficient indexing and searching are.
  • For floating point numbers, prefer scaled_float.

scaled_float example:

PUT product
{
  "mappings": {
    "properties": {
      "name":{
        "type": "text"
      },
      "price":{
        "type": "scaled_float",
        "scaling_factor": 100
      }
    }
  }
}
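
With a scaling_factor of 100, a price is multiplied by 100 and rounded to the nearest long before it is stored. The arithmetic can be sketched as follows; encode_scaled and decode_scaled are hypothetical helper names used only for illustration.

```python
import math


def encode_scaled(value, scaling_factor):
    """Multiply by the factor and round to the nearest whole number."""
    return math.floor(value * scaling_factor + 0.5)


def decode_scaled(stored, scaling_factor):
    """Recover the approximate original value from the stored long."""
    return stored / scaling_factor


stored = encode_scaled(19.99, 100)
print(stored, decode_scaled(stored, 100))

# Precision beyond the scaling factor is lost: both prices store the same long.
print(encode_scaled(19.994, 100) == encode_scaled(19.99, 100))
```

This is why the scaling factor should match the precision you actually need: two decimal places for a price means a factor of 100.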

2.1.3 date type

Since JSON has no date type, dates in es can take several forms:

  • A formatted date string, such as 2020-11-11 or 2020-11-11 11:11
  • A number of seconds or milliseconds since the epoch (00:00:00 UTC on January 1, 1970)

es internally converts the time to UTC and stores it as a long integer of milliseconds since the epoch.

Define a date field:

PUT product
{
  "mappings": {
    "properties": {
      "date":{
        "type": "date"
      }
    }
  }
}

There are many time formats that can be parsed.

PUT product/_doc/1
{
  "date":"2020-11-11"
}

PUT product/_doc/2
{
  "date":"2020-11-11T11:11:11Z"
}


PUT product/_doc/3
{
  "date":"1604672099958"
}

The dates in all three documents above can be parsed. Internally, each is stored as a long integer counting milliseconds since the epoch.
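
To make the internal representation concrete, here is a small Python sketch that converts the three example values to milliseconds since the epoch. to_epoch_millis is a hypothetical helper; it only handles these three forms and assumes UTC when no time zone is given.

```python
from datetime import datetime, timezone


def to_epoch_millis(value):
    """Convert one of the three example forms to milliseconds since the epoch."""
    if value.isdigit():
        return int(value)  # already epoch milliseconds
    # datetime.fromisoformat before Python 3.11 rejects a trailing "Z"
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # no zone given: assume UTC
    return int(parsed.timestamp() * 1000)


for example in ("2020-11-11", "2020-11-11T11:11:11Z", "1604672099958"):
    print(example, "->", to_epoch_millis(example))
```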

2.1.4 boolean type

Both the JSON booleans true and false and the strings "true" and "false" are accepted.

2.1.5 binary type

The binary type accepts base64 encoded strings. By default these values are neither stored nor searchable.

2.1.6 range type

  • integer_range
  • float_range
  • long_range
  • double_range
  • date_range
  • ip_range

When defining, you can specify the range type:

PUT product
{
  "mappings": {
    "properties": {
      "date":{
        "type": "date"
      },
      "price":{
        "type":"float_range"
      }
    }
  }
}

When inserting a document, you need to specify the boundaries of the range:

PUT product/_doc/1
{
  "date":"2020-11-11",
  "price":{
    "gt":10,
    "lt":20
  }
}

When specifying a range, you can use gt, gte, lt, lte.
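
The meaning of the four bound keywords can be sketched in Python. in_range and the sample bounds are hypothetical and only illustrate how gt, gte, lt and lte differ at the boundaries.

```python
import operator

# Each bound keyword maps to the comparison it stands for.
BOUND_CHECKS = {
    "gt": operator.gt,    # greater than
    "gte": operator.ge,   # greater than or equal to
    "lt": operator.lt,    # less than
    "lte": operator.le,   # less than or equal to
}


def in_range(value, bounds):
    """Return True if `value` satisfies every bound in `bounds`."""
    return all(BOUND_CHECKS[name](value, limit) for name, limit in bounds.items())


exclusive = {"gt": 10, "lt": 20}    # 10 < value < 20
inclusive = {"gte": 10, "lte": 20}  # 10 <= value <= 20

print(in_range(15, exclusive))  # inside both ranges
print(in_range(10, exclusive))  # the boundary is excluded by gt
print(in_range(10, inclusive))  # the boundary is included by gte
```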

2.2 composite type

2.2.1 array type

There is no dedicated array type in es. By default, any field can hold one or more values. Note that all elements in an array must be of the same type.

When an array is added through dynamic mapping, its first element determines the type of the whole field.

2.2.2 object type

Because JSON is hierarchical, a document can contain inner objects, and those inner objects can in turn contain further inner objects.

PUT product/_doc/2
{
  "date":"2020-11-11T11:11:11Z",
  "ext_info":{
    "address":"China"
  }
}

2.2.3 nested type

nested is a special case of object.

Suppose the object type is used and we have the following document:

{
  "user":[
    {
      "first":"Zhang",
      "last":"san"
    },
    {
      "first":"Li",
      "last":"si"
    }
    ]
}

Because Lucene does not have the concept of internal objects, es will flatten the object hierarchy and turn an object into a simple list composed of field names and values. The final storage form of the above documents is as follows:

{
"user.first":["Zhang","Li"],
"user.last":["san","si"]
}

After flattening, the association between each first name and last name is lost. As a result, a search for Zhang si will match this document, even though no such user exists.

In this case, the problem can be solved by using the nested object type, which can maintain the independence of each object in the array. Nested type indexes each nested object in the array as an independent hidden document, so that each nested object can be indexed independently.

[
  {
    "user.first":"Zhang",
    "user.last":"san"
  },
  {
    "user.first":"Li",
    "user.last":"si"
  }
]
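
The difference between the two storage forms can be demonstrated with a few lines of Python. This is a conceptual sketch only: match_flattened and match_nested are hypothetical functions that mimic how a query for a first name and a last name behaves against each form.

```python
users = [
    {"first": "Zhang", "last": "san"},
    {"first": "Li", "last": "si"},
]


def match_flattened(objects, first, last):
    """object type: each field becomes one list of values across all objects."""
    firsts = [o["first"] for o in objects]
    lasts = [o["last"] for o in objects]
    return first in firsts and last in lasts


def match_nested(objects, first, last):
    """nested type: both conditions must hold inside the same object."""
    return any(o["first"] == first and o["last"] == last for o in objects)


print(match_flattened(users, "Zhang", "si"))  # True: a false match
print(match_nested(users, "Zhang", "si"))     # False: no such user
print(match_nested(users, "Zhang", "san"))    # True: a real user
```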

Advantage

The nested documents are stored together with their parent, so read performance is high.

Disadvantage

Updating either the parent or one of the nested documents requires updating the whole document.

2.3 geographical type

Usage scenarios:

  • Find geographic locations within a given range
  • Aggregate documents by geographic location or by distance from a center point
  • Factor distance into a document's relevance score
  • Sort documents by distance

2.3.1 geo_point

geo_point is a coordinate point, which is defined as follows:

PUT people
{
  "mappings": {
    "properties": {
      "location":{
        "type": "geo_point"
      }
    }
  }
}

Specify the field type when creating the index. A value can then be stored in any of four ways:

PUT people/_doc/1
{
  "location":{
    "lat": 34.27,
    "lon": 108.94
  }
}

PUT people/_doc/2
{
  "location":"34.27,108.94"
}

PUT people/_doc/3
{
  "location":"uzbrgzfxuzup"
}

PUT people/_doc/4
{
  "location":[108.94,34.27]
}

Note that in the array form, longitude comes first and latitude second.
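
The ordering difference between the forms is easy to get wrong, so here is a small Python sketch that normalizes three of the four forms to a (lat, lon) pair. to_lat_lon is a hypothetical helper, and the geohash form is deliberately not handled.

```python
def to_lat_lon(location):
    """Normalize a geo_point value to a (lat, lon) tuple. Geohash is not handled."""
    if isinstance(location, dict):
        return (location["lat"], location["lon"])
    if isinstance(location, str):
        lat, lon = location.split(",")  # string form is "lat,lon"
        return (float(lat), float(lon))
    if isinstance(location, (list, tuple)):
        lon, lat = location             # array form is [lon, lat]
        return (lat, lon)
    raise TypeError(f"unsupported geo_point value: {location!r}")


print(to_lat_lon({"lat": 34.27, "lon": 108.94}))
print(to_lat_lon("34.27,108.94"))
print(to_lat_lon([108.94, 34.27]))
```

All three calls describe the same point, even though the string puts latitude first and the array puts longitude first.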

A tool for converting an address to a geohash: http://www.csxgame.top/#/

2.3.2 geo_shape

GeoJSON            | ElasticSearch      | remarks
-------------------|--------------------|----------------------------------------------------------
Point              | point              | A point described by latitude and longitude
LineString         | linestring         | An arbitrary line consisting of two or more points
Polygon            | polygon            | A closed polygon
MultiPoint         | multipoint         | A set of discontinuous points
MultiLineString    | multilinestring    | Multiple unrelated lines
MultiPolygon       | multipolygon       | Multiple polygons
GeometryCollection | geometrycollection | Collection of geometric objects
N/A                | circle             | A circle
N/A                | envelope           | A rectangle defined by its upper left and lower right points

Specify geo_shape type:

PUT people
{
  "mappings": {
    "properties": {
      "location":{
        "type": "geo_shape"
      }
    }
  }
}

When adding a document, you need to specify the specific type:

PUT people/_doc/1
{
  "location":{
    "type":"point",
    "coordinates": [108.94,34.27]
  }
}

If it is a linestring, it is as follows:

PUT people/_doc/2
{
  "location":{
    "type":"linestring",
    "coordinates": [[108.94,34.27],[100,33]]
  }
}

2.4 special types

2.4.1 IP

Stores IP addresses. The type is ip:

PUT blog
{
  "mappings": {
    "properties": {
      "address":{
        "type": "ip"
      }
    }
  }
}

Add document:

PUT blog/_doc/1
{
  "address":"192.168.91.1"
}

Search documents:

GET blog/_search
{
  "query": {
    "term": {
      "address": "192.168.0.0/16"
    }
  }
}
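
The search above uses CIDR notation, so it matches any address inside the 192.168.0.0/16 network. The membership test can be checked with Python's standard ipaddress module; matches_cidr is a hypothetical helper name.

```python
import ipaddress


def matches_cidr(address, cidr):
    """Return True if `address` falls inside the network written as `cidr`."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


print(matches_cidr("192.168.91.1", "192.168.0.0/16"))  # inside the network
print(matches_cidr("10.0.0.1", "192.168.0.0/16"))      # outside the network
```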

2.4.2 token_count

Counts the number of tokens produced after a string is analyzed.

PUT blog
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "fields": {
          "length":{
            "type":"token_count",
            "analyzer":"standard"
          }
        }
      }
    }
  }
}

This is equivalent to adding a new title.length field, which counts the number of tokens produced after analysis.

Add document:

PUT blog/_doc/1
{
  "title":"zhang san"
}

You can use token_count to query:

GET blog/_search
{
  "query": {
    "term": {
      "title.length": 2
    }
  }
}
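
What title.length holds can be approximated in Python. This is only a sketch: the regular expression below is a rough substitute for the standard analyzer, and token_count is a hypothetical function name.

```python
import re


def token_count(text):
    """Count word-like tokens, a rough approximation of the standard analyzer."""
    return len(re.findall(r"\w+", text))


print(token_count("zhang san"))  # two tokens, so the query for 2 matches
```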

Finally, SongGe has also collected 50+ project requirement documents. If you are looking for a project to practice on, you might as well have a look.

Address of requirement document: https://github.com/lenve/javadoc

Tags: Java ElasticSearch ELK elastic

Posted by mhodge87 on Thu, 05 May 2022 11:06:50 +0300