The difference between term and match in Elasticsearch

term usage

Let's start with the definition of term. A term query is an exact-match query: the search text is not analyzed (split into words or lowercased) before it is looked up in the index.

To illustrate, first index two documents:

{
    "title": "love China",
    "content": "people very love China",
    "tags": ["China", "love"]
}
{
    "title": "love HuBei",
    "content": "people very love HuBei",
    "tags": ["HuBei", "love"]
}
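
One way to load these two documents in a single request is the _bulk API, which takes a newline-delimited JSON body. The Python sketch below only builds that body. The index name test, the type doc, and the ids 7 and 8 are taken from the search responses shown in this article; the helper name build_bulk_body is hypothetical, and the _type field assumes an Elasticsearch version that still uses mapping types.

```python
import json

# The two sample documents, keyed by the _id values seen in the search responses.
docs = {
    "7": {"title": "love China", "content": "people very love China", "tags": ["China", "love"]},
    "8": {"title": "love HuBei", "content": "people very love HuBei", "tags": ["HuBei", "love"]},
}

def build_bulk_body(index, doc_type, documents):
    """Hypothetical helper: one action line followed by one source line per document."""
    lines = []
    for doc_id, source in documents.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"

bulk_body = build_bulk_body("test", "doc", docs)
print(bulk_body)
```

The printed text can then be sent as the body of a POST _bulk request.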

Then run a term query:

{
  "query": {
    "term": {
      "title": "love"
    }
  }
}

Both documents are returned:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "8",
        "_score": 0.6931472,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": ["HuBei","love"]
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 0.6931472,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": ["China","love"]
        }
      }
    ]
  }
}
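
Why a term query behaves this way can be sketched with a toy inverted index in Python. This is a simplified model, not Elasticsearch's implementation: the analyze function is an assumed stand-in for the default analyzer (split on non-word characters, then lowercase), and term_query is a hypothetical helper that looks its argument up unchanged.

```python
import re
from collections import defaultdict

titles = {"7": "love China", "8": "love HuBei"}

def analyze(text):
    """Simplified stand-in for the default analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

# Inverted index: token -> ids of the documents whose title contains it.
index = defaultdict(set)
for doc_id, title in titles.items():
    for token in analyze(title):
        index[token].add(doc_id)

def term_query(value):
    """Look the value up exactly as given; the query text is NOT analyzed."""
    return sorted(index.get(value, set()))

print(term_query("love"))        # ['7', '8']
print(term_query("love China"))  # []
```

The index holds only the tokens love, china, and hubei, so a single lowercase word can match, while a multi-word string cannot.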

Every document whose title contains love is returned. But suppose I only want to match love China exactly. Let's see whether the following query finds it:

{
  "query": {
    "term": {
      "title": "love China"
    }
  }
}

No documents are returned. A term query looks up its value as a single token, and the analyzed title field contains the tokens love and china but no token love China. How can term semantics be used to match multiple words? You can use terms:

{
  "query": {
    "terms": {
      "title": ["love", "China"]
    }
  }
}

The query result is:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "8",
        "_score": 0.6931472,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": ["HuBei","love"]
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 0.6931472,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": ["China","love"]
        }
      }
    ]
  }
}

Both documents are returned. Why? Because the values inside the [ ] of a terms query are combined with OR: a document matches as long as it contains any one of them. To require that both words are present, use the must clause of a bool query, as follows:

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "title": "love"
          }
        },
        {
          "term": {
            "title": "china"
          }
        }
      ]
    }
  }
}

Note that the bool query uses china in lowercase. With the capitalized China, the same search returns nothing. Why? When the title field is indexed, its text is analyzed (split into tokens and normalized). The default analyzer is used here, and the next section shows the tokens it produces.
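
The OR behavior of terms, the AND behavior of bool must, and the effect of the capitalized China can be sketched in Python. This is a toy model under assumptions: analyze approximates the default analyzer, and terms_query and bool_must_terms are hypothetical helpers, not Elasticsearch APIs.

```python
import re

titles = {"7": "love China", "8": "love HuBei"}

def analyze(text):
    """Simplified stand-in for the default analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

# Token set per document, as the analyzed title field would hold it.
indexed = {doc_id: set(analyze(title)) for doc_id, title in titles.items()}

def terms_query(values):
    """OR: keep documents that contain at least one of the values."""
    return sorted(d for d, tokens in indexed.items() if any(v in tokens for v in values))

def bool_must_terms(values):
    """AND: keep documents that contain every value."""
    return sorted(d for d, tokens in indexed.items() if all(v in tokens for v in values))

print(terms_query(["love", "China"]))      # ['7', '8']
print(bool_must_terms(["love", "china"]))  # ['7']
print(bool_must_terms(["love", "China"]))  # []
```

In the last call the value China is compared unchanged against the lowercase tokens, so nothing satisfies both conditions.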

analyzer

GET test/_analyze
{
  "text" : "love China"
}

The result is:

{
  "tokens": [
    {
      "token": "love",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "china",
      "start_offset": 5,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

The analyzer produces two tokens, love and china. A term query matches only these tokens exactly as they are stored, without any change, so a query for China fails. A later section is dedicated to analyzers.
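
The token list above can be approximated in a few lines of Python. This is only a sketch: the real default analyzer follows Unicode text segmentation rules, while this assumed version simply takes runs of word characters and lowercases them.

```python
import re

def analyze(text):
    """Return tokens with offsets and positions, similar in shape to the _analyze output."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),  # lowercasing is why "China" is stored as "china"
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens

for token in analyze("love China"):
    print(token)
```

For the input love China this yields love at offsets 0 to 4 and china at offsets 5 to 10, in line with the response shown above.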

match usage

First, search for love China with match.

GET test/doc/_search
{
  "query": {
    "match": {
      "title": "love China"
    }
  }
}

The result is:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.3862944,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 1.3862944,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": [
            "China",
            "love"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "8",
        "_score": 0.6931472,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": [
            "HuBei",
            "love"
          ]
        }
      }
    ]
  }
}

Both documents are returned. Why? Because a match query analyzes the search text first and then matches the resulting tokens. The titles of the two documents above produce the tokens love, china, and hubei. The search text love China is analyzed into love and china, and these tokens are combined with OR, so a document matches as long as it contains either one. Document 7 scores higher because it contains both. What if you want to match both love and China? Use match_phrase.
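
The analyze-then-OR behavior of match can be sketched in Python. This is a toy model: analyze approximates the default analyzer, match_query is a hypothetical helper, and the count of matching tokens is only a crude stand-in for the real relevance score.

```python
import re

titles = {"7": "love China", "8": "love HuBei"}

def analyze(text):
    """Simplified stand-in for the default analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def match_query(text):
    """Analyze the query text, then keep documents that contain ANY of its tokens."""
    query_tokens = set(analyze(text))
    hits = []
    for doc_id, title in titles.items():
        matched = query_tokens & set(analyze(title))
        if matched:
            hits.append((doc_id, len(matched)))
    # Documents matching more tokens come first; ties are broken by id.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))

print(match_query("love China"))  # [('7', 2), ('8', 1)]
```

Because the query text goes through the same analysis as the indexed text, the capitalized China is no longer a problem here.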

match_phrase usage

match_phrase performs a phrase search: all tokens of the search text must appear in the field, in the same order and, by default, at adjacent positions.

GET test/doc/_search
{
  "query": {
    "match_phrase": {
      "title": "love china"
    }
  }
}

The result is:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.3862944,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 1.3862944,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": [
            "China",
            "love"
          ]
        }
      }
    ]
  }
}

This time the result meets our needs: only the one matching record is returned.
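
The position requirement of match_phrase can be sketched in Python as a search for the analyzed query tokens at consecutive positions. This is a simplified model with no slop; analyze approximates the default analyzer and match_phrase_query is a hypothetical helper.

```python
import re

titles = {"7": "love China", "8": "love HuBei"}

def analyze(text):
    """Simplified stand-in for the default analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def match_phrase_query(text):
    """Keep documents whose title contains the query tokens in order, side by side."""
    phrase = analyze(text)
    hits = []
    for doc_id, title in titles.items():
        tokens = analyze(title)
        # Slide a window of the phrase length over the title tokens.
        windows = (tokens[i:i + len(phrase)] for i in range(len(tokens) - len(phrase) + 1))
        if phrase and any(window == phrase for window in windows):
            hits.append(doc_id)
    return sorted(hits)

print(match_phrase_query("love China"))  # ['7']
print(match_phrase_query("China love"))  # []
```

Reversing the word order no longer matches, which is the difference from a plain match query.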

 

Original link: https://www.jianshu.com/p/d5583dff4157

Posted by digitalecartoons on Fri, 06 May 2022 06:08:38 +0300