Using Elasticsearch in .NET Core

Quick start

Getting Started with Elasticsearch

Elasticsearch is an open-source search engine built on Apache Lucene™, a full-text search engine library. Lucene is arguably the most advanced, high-performance, full-featured search engine library in existence, open source or otherwise.

But Lucene is only a library. To take full advantage of its capabilities, you need to work in Java and integrate Lucene directly into your application. Worse still, you may need a degree in information retrieval to understand how it works: Lucene is very complicated.

Elasticsearch is also written in Java and uses Lucene internally for all indexing and searching, but it aims to make full-text search easy by hiding Lucene's complexity behind a simple, consistent RESTful API.

However, Elasticsearch is more than Lucene, and it is not just a full-text search engine. It can be accurately described as:

  • A distributed real-time document store where each field can be indexed and searched
  • A distributed real-time analytical search engine
  • Capable of scaling to hundreds of server nodes and handling petabytes of structured or unstructured data

Official clients are available for Java, .NET, PHP, Python, Ruby, Node.js, and many other languages. According to the DB-Engines rankings, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.

ES Development Guide

For documentation in Chinese, see Elasticsearch: The Definitive Guide.

For documentation in English, see the Elasticsearch Reference.

Download: https://www.elastic.co/cn/downloads/

ES API Documentation

API Conventions

Document APIs

Search APIs

Indices APIs

cat APIs

Cluster APIs

JavaScript API

Logstash

Logstash Reference

Configuring Logstash

Input plugins

Output plugins

Filter plugins

Kibana DevTools Shortcuts

  • Ctrl+I auto-indents the current request
  • Ctrl+Enter submits the request
  • Down opens the autocomplete menu
  • Enter or Tab accepts the selected completion
  • Esc closes the completion menu

pretty=true: appending the pretty parameter to the query string of any request makes Elasticsearch pretty-print its JSON responses so they are easier to read.
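For example, with curl against a local node (the host and index are illustrative):

curl 'http://localhost:9200/logs/_search?pretty=true'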

Kibana commands

// Query the disk usage of the cluster
GET _cat/allocation?v

// List all indices
GET _cat/indices

// Sort indices by document count, or by name
GET _cat/indices?s=docs.count:desc
GET _cat/indices?v&s=index

// List the nodes in the cluster
GET _cat/nodes

// Check the health of the cluster
GET _cluster/health?pretty=true
GET _cat/indices/*?v&s=index

// Get the shard information for a given index
GET logs/_search_shards

...

Cluster status

curl -s -XGET 'http://<host>:9200/_cluster/health?pretty'

// Response when the cluster is healthy (green; yellow means unassigned replica shards, red means unassigned primary shards)
{
  "cluster_name" : "es-qwerty",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 1,
  "active_shards" : 2,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Retrieve documents

// Bucket documents by urlType within a date range (gt/lt are exclusive; gte/lte are inclusive)
POST logs/_search
{
  "query":{
    "range":{
      "createdAt":{
        "gt":"2020-04-25",
        "lt":"2020-04-27",
        "format": "yyyy-MM-dd"
      }
    }
  },
  "size":0,
  "aggs":{
    "url_type_stats":{
      "terms": {
        "field": "urlType.keyword",
        "size": 2
      }
    }
  }
}

// The same bucketing with an absolute lower bound and "now" as the upper bound
POST logs/_search
{
  "query":{
    "range":{
      "createdAt":{
        "gte":"2020-04-26 00:00:00",
        "lte":"now",
        "format": "yyyy-MM-dd hh:mm:ss"
      }
    }
  },
  "size":0,
  "aggs":{
    "url_type_stats":{
      "terms": {
        "field": "urlType.keyword",
        "size": 2
      }
    }
  }
}

// Count distinct client IPs and user agents with cardinality aggregations
POST logs/_search
{
  "query":{
    "range": {
      "createdAt": {
        "gte": "2020-04-26 00:00:00",
        "lte": "now",
         "format": "yyyy-MM-dd hh:mm:ss"
      }
    }
  },
  "size" : 0,
  "aggs":{
    "total_clientIp":{
      "cardinality":{
        "field": "clientIp.keyword"
      }
    },
    "total_userAgent":{
      "cardinality": {
        "field": "userAgent.keyword"
      }
    }
  }
}

// Date histogram with a nested urlType terms aggregation
POST logs/_search
{
  "size" : 0,
  "aggs":{
    "date_total_ClientIp":{
      "date_histogram":{
        "field": "createdAt",
        "interval": "quarter",
        "format": "yyyy-MM-dd",
        "extended_bounds":{
          "min": "2020-04-26 13:00:00",
          "max": "2020-04-26 14:00:00",
        }
      },
      "aggs":{
        "url_type_api": {
          "terms": {
            "field": "urlType.keyword",
            "size": 10
          }
        }
      }
    }
  }
}

// Top 30 client IPs by document count
POST logs/_search
{
  "size" : 0,
  "aggs":{
    "total_clientIp":{
      "terms":{
        "size":30,
        "field": "clientIp.keyword"
      }
    }
  }
}

Delete documents

// Delete all documents matching a query
POST logs/_delete_by_query
{
  "query": {
    "match_all": {}
  }
}

// Delete the index itself
DELETE logs

Create an index

Data migration is essentially a matter of rebuilding an index. Reindexing does not attempt to set up the target index for you, and it does not copy the settings of the source index. So configure the target index before the operation: its mapping, number of shards, number of replicas, and so on.
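For example, a minimal sketch of preparing a target index before migrating into it (the index name, shard and replica counts, and the createdAt field are illustrative):

PUT dest
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "createdAt": { "type": "date" }
    }
  }
}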

Data migration

Reindex from remote

// Reindex supports rebuilding an index from a remote Elasticsearch cluster:
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://lotherhost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "source",
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}

// The host parameter must include the scheme, host, and port (e.g. https://lotherhost:9200)
// The username and password parameters are optional

To use this, you must whitelist the remote host by setting the reindex.remote.whitelist property in elasticsearch.yml. Multiple entries can be listed (for example: lotherhost:9200, another:9200, 127.0.10.*:9200, localhost:*).
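The corresponding elasticsearch.yml entry would look like this (the hosts are the placeholders from above):

reindex.remote.whitelist: "lotherhost:9200, another:9200, 127.0.10.*:9200, localhost:*"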

For details, see the official Reindex from remote documentation.

Elasticsearch-Dump

Elasticsearch-Dump is an open-source tool for importing and exporting Elasticsearch data. Installation and the migration itself can be performed on a cloud host in the same availability zone as the cluster, and the tool is easy to use.

It requires a Node.js environment; install elasticdump with npm:

npm install elasticdump -g
elasticdump

// Copy an index from production to staging, including its analyzer and mapping:
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=analyzer
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=mapping
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=data

// Copy the data of a single shard:
elasticdump \
  --input=http://es.com:9200/api \
  --output=http://es.com:9200/api2 \
  --params='{"preference" : "_shards:0"}'

For the other parameters of the elasticdump command, see the Elasticdump Options reference.

Deep paging

  • By default, Elasticsearch rejects paging queries that reach beyond 10,000 documents (from + size exceeding the index.max_result_window setting). The official mechanism for paging deeper is search_after.
  • search_after takes the sort values of the last hit on the previous page, so the query must define a sort; include a unique tiebreaker field such as _id to keep the ordering deterministic.
//https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-search-after.html
// Classic from/size paging works only up to the 10,000-document window
GET logs/_search
{
  "from":9990,
  "size":10,
  "_source": ["url","clientIp","createdAt"],
  "query":{
    "match_all": {}
  },
  "sort":[
    {
      "createdAt":{
        "order":"desc"
      }
    },
    {
      "_id":{
        "order":"desc"
      }
    }
    ]
}

// Page deeper with search_after, passing the sort values (createdAt millis, _id) of the
// last hit on the previous page; from must be 0 or -1 when search_after is used
GET logs/_search
{
  "from":-1,
  "size":10,
  "_source": ["url","clientIp","createdAt"],
  "query":{
    "match_all": {}
  },
  "search_after": [1588042597000, "V363vnEBz1D1HVfYBb0V"],
  "sort":[
    {
      "createdAt":{
        "order":"desc"
      }
    },
    {
      "_id":{
        "order":"desc"
      }
    }
    ]
}

Install

  • Install Elasticsearch and Kibana under Docker (note that the Kibana container reaches Elasticsearch through the link alias, not 127.0.0.1, which would point at Kibana's own container)
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.8.1
docker run -d -p 9200:9200 -p 9300:9300 --name elasticsearch -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.8.1
docker pull docker.elastic.co/kibana/kibana:7.8.1
docker run -d -p 5601:5601 --name kibana --link elasticsearch:elasticsearch -e "ELASTICSEARCH_HOSTS=http://elasticsearch:9200" docker.elastic.co/kibana/kibana:7.8.1
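A quick check that the node is reachable (assuming the port mapping above):

curl http://localhost:9200
# a healthy node answers with a JSON banner: node name, cluster name, version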

Accessing Elasticsearch from .NET Core

Create a new Web API project and install two packages:

Install-Package NEST
Install-Package Swashbuckle.AspNetCore

We operate Elasticsearch through NEST (open source: https://github.com/elastic/elasticsearch-net ), and add Swagger to make it easy to call the API later.
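A minimal sketch of the Swagger wiring in Startup.cs (assuming the default Swashbuckle.AspNetCore setup; the title and version strings are illustrative):

// Startup.cs (requires: using Microsoft.OpenApi.Models;)
// In ConfigureServices:
services.AddControllers();
services.AddSwaggerGen(c =>
{
    c.SwaggerDoc("v1", new OpenApiInfo { Title = "ESDemo", Version = "v1" });
});

// In Configure:
app.UseSwagger();
app.UseSwaggerUI(c => c.SwaggerEndpoint("/swagger/v1/swagger.json", "ESDemo v1"));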

Next, CRUD operations against Elasticsearch are demonstrated.

Add an entity class, VisitLog.cs:

using System;

namespace ESDemo.Domain
{
    public class VisitLog
    {
        public string Id { get; set; }

        /// <summary>
        /// UserAgent
        /// </summary>
        public string UserAgent { get; set; }

        /// <summary>
        /// Method
        /// </summary>
        public string Method { get; set; }

        /// <summary>
        /// Url
        /// </summary>
        public string Url { get; set; }

        /// <summary>
        /// Referrer
        /// </summary>
        public string Referrer { get; set; }

        /// <summary>
        /// IpAddress
        /// </summary>
        public string IpAddress { get; set; }

        /// <summary>
        /// Milliseconds
        /// </summary>
        public int Milliseconds { get; set; }

        /// <summary>
        /// QueryString
        /// </summary>
        public string QueryString { get; set; }

        /// <summary>
        /// Request Body
        /// </summary>
        public string RequestBody { get; set; }

        /// <summary>
        /// Cookies
        /// </summary>
        public string Cookies { get; set; }

        /// <summary>
        /// Headers
        /// </summary>
        public string Headers { get; set; }

        /// <summary>
        /// StatusCode
        /// </summary>
        public int StatusCode { get; set; }

        /// <summary>
        /// Response Body
        /// </summary>
        public string ResponseBody { get; set; }

        public DateTimeOffset CreatedAt { get; set; } = DateTimeOffset.UtcNow;
    }
}

With the entity class in place, let's wrap Elasticsearch access by encapsulating a simple base class for the repositories to share. (Under default dynamic mapping, Elasticsearch indexes string fields as text with a .keyword sub-field; this is why the queries earlier in this article address fields such as urlType.keyword.)

Add an interface, IElasticsearchProvider:

using Nest;

namespace ESDemo.Elasticsearch
{
    public interface IElasticsearchProvider
    {
        IElasticClient GetClient();
    }
}

Implement the IElasticsearchProvider interface in ElasticsearchProvider:

using Nest;
using System;

namespace ESDemo.Elasticsearch
{
    public class ElasticsearchProvider : IElasticsearchProvider
    {
        public IElasticClient GetClient()
        {
            // For brevity the demo builds a new client per call. ConnectionSettings is
            // expensive to create and caches serialization metadata, so in a real
            // application create it once and reuse a single IElasticClient.
            var connectionSettings = new ConnectionSettings(new Uri("http://localhost:9200"));

            return new ElasticClient(connectionSettings);
        }
    }
}

Add the Elasticsearch repository base class, ElasticsearchRepositoryBase:

using Nest;

namespace ESDemo.Elasticsearch
{
    public abstract class ElasticsearchRepositoryBase
    {
        private readonly IElasticsearchProvider _elasticsearchProvider;

        public ElasticsearchRepositoryBase(IElasticsearchProvider elasticsearchProvider)
        {
            _elasticsearchProvider = elasticsearchProvider;
        }

        protected IElasticClient Client => _elasticsearchProvider.GetClient();

        protected abstract string IndexName { get; }
    }
}

This is an abstract class: each repository that inherits from it must override protected abstract string IndexName { get; } to specify the index it works against.

With this simple encapsulation done, create a new IVisitLogRepository repository interface with four methods:

using ESDemo.Domain;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace ESDemo.Repositories
{
    public interface IVisitLogRepository
    {
        Task InsertAsync(VisitLog visitLog);

        Task DeleteAsync(string id);

        Task UpdateAsync(VisitLog visitLog);

        Task<Tuple<int, IList<VisitLog>>> QueryAsync(int page, int limit);
    }
}

Next, implement the repository interface. Add VisitLogRepository; the code is as follows:

using ESDemo.Domain;
using ESDemo.Elasticsearch;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace ESDemo.Repositories
{
    public class VisitLogRepository : ElasticsearchRepositoryBase, IVisitLogRepository
    {
        public VisitLogRepository(IElasticsearchProvider elasticsearchProvider) : base(elasticsearchProvider)
        {
        }

        protected override string IndexName => "visitlogs";

        public async Task InsertAsync(VisitLog visitLog)
        {
            await Client.IndexAsync(visitLog, x => x.Index(IndexName));
        }

        public async Task DeleteAsync(string id)
        {
            await Client.DeleteAsync<VisitLog>(id, x => x.Index(IndexName));
        }

        public async Task UpdateAsync(VisitLog visitLog)
        {
            await Client.UpdateAsync<VisitLog>(visitLog.Id, x => x.Index(IndexName).Doc(visitLog));
        }

        public async Task<Tuple<int, IList<VisitLog>>> QueryAsync(int page, int limit)
        {
            // Note: from/size paging here is subject to the same 10,000-document
            // window discussed in the deep paging section above.
            var query = await Client.SearchAsync<VisitLog>(x => x.Index(IndexName)
                                    .From((page - 1) * limit)
                                    .Size(limit)
                                    .Sort(s => s.Descending(v => v.CreatedAt)));
            return new Tuple<int, IList<VisitLog>>(Convert.ToInt32(query.Total), query.Documents.ToList());
        }
    }
}

Now write the API. Add a VisitLogController; the code is as follows:

using ESDemo.Domain;
using ESDemo.Repositories;
using Microsoft.AspNetCore.Mvc;
using System.ComponentModel.DataAnnotations;
using System.Threading.Tasks;

namespace ESDemo.Controllers
{
    [Route("api/[controller]")]
    [ApiController]
    public class VisitLogController : ControllerBase
    {
        private readonly IVisitLogRepository _visitLogRepository;

        public VisitLogController(IVisitLogRepository visitLogRepository)
        {
            _visitLogRepository = visitLogRepository;
        }

        [HttpGet]
        public async Task<IActionResult> QueryAsync(int page = 1, int limit = 10)
        {
            var result = await _visitLogRepository.QueryAsync(page, limit);

            return Ok(new
            {
                total = result.Item1,
                items = result.Item2
            });
        }

        [HttpPost]
        public async Task<IActionResult> InsertAsync([FromBody] VisitLog visitLog)
        {
            await _visitLogRepository.InsertAsync(visitLog);

            return Ok("added successfully");
        }

        [HttpDelete]
        public async Task<IActionResult> DeleteAsync([Required] string id)
        {
            await _visitLogRepository.DeleteAsync(id);

            return Ok("successfully deleted");
        }

        [HttpPut]
        public async Task<IActionResult> UpdateAsync([FromBody] VisitLog visitLog)
        {
            await _visitLogRepository.UpdateAsync(visitLog);

            return Ok("Successfully modified");
        }
    }
}

Finally, don't forget to register the services in Startup.cs; otherwise dependency injection cannot resolve them.

...
services.AddSingleton<IElasticsearchProvider, ElasticsearchProvider>();
services.AddSingleton<IVisitLogRepository, VisitLogRepository>();
...

Everything is ready. Run the project and open the Swagger UI.

Call the endpoints in the order add, update, delete, query. Since the index is empty by default, add a few documents first; inserting a handful also lets you verify that paging works.
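For instance, seeding one document from the command line (the port is illustrative; use the one from your launch profile):

curl -X POST http://localhost:5000/api/VisitLog \
  -H "Content-Type: application/json" \
  -d '{"userAgent":"curl","method":"GET","url":"/api/values","ipAddress":"127.0.0.1","milliseconds":5,"statusCode":200}'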

If you have Kibana installed, you can now take a look at the data you just added.

GET _cat/indices

GET visitlogs/_search
{}

There the data is, quietly waiting.

This article has briefly introduced using Elasticsearch from .NET Core. Much of the query syntax for retrieving data has not been shown here; if you need it in development, refer to the official query examples: https://github.com/elastic/elasticsearch-net/tree/master/examples
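As a small taste of that syntax, here is a sketch of a full-text query with NEST (the field names come from the VisitLog class above; the search term and index name are illustrative):

// Search visit logs whose Url matches "api", newest first.
var response = await Client.SearchAsync<VisitLog>(s => s
    .Index("visitlogs")
    .Query(q => q
        .Match(m => m
            .Field(f => f.Url)
            .Query("api")))
    .Sort(so => so.Descending(v => v.CreatedAt))
    .Size(10));

// response.Documents holds the hits; response.Total is the total match count.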
