Quick start
Getting Started with Elasticsearch
Elasticsearch is an open-source search engine built on Apache Lucene™, a full-text search engine library. Lucene is arguably the most advanced, highest-performance, most fully featured search engine library in existence, whether open source or proprietary.
But Lucene is just a library. To take full advantage of its capabilities you need to use Java and integrate Lucene directly into your application. Worse still, you may need a degree in information retrieval to understand how it works; Lucene is very complicated.
Elasticsearch is also written in Java and uses Lucene internally for indexing and searching, but its purpose is to make full-text search easy by hiding Lucene's complexity behind a simple, consistent RESTful API.
However, Elasticsearch is much more than Lucene, and it is not just a full-text search engine. It can accurately be described as:
- A distributed real-time document store where each field can be indexed and searched
- A distributed real-time analytical search engine
- Capable of scaling to hundreds of server nodes and supporting petabytes of structured or unstructured data
Official clients are available in Java, .NET, PHP, Python, Ruby, Node.js and many other languages. According to the DB-Engines rankings, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.
ES Development Guide
- Chinese documentation: Elasticsearch: The Definitive Guide
- English documentation: Elasticsearch Reference
- Downloads: https://www.elastic.co/cn/downloads/
- ES API Documentation
- Logstash
Kibana DevTools Shortcuts
- Ctrl+I: auto-indent the request
- Ctrl+Enter: submit the request
- Down: open the autocomplete menu
- Enter or Tab: accept the selected completion
- Esc: close the completion menu
pretty=true: adding the pretty parameter to any request's query string makes Elasticsearch pretty-print the JSON response so it is easier to read.
Kibana commands
```
// Query the disk usage / shard allocation of the cluster
GET _cat/allocation?v

// List all indices
GET _cat/indices

// Sort indices by document count / by index name
GET _cat/indices?s=docs.count:desc
GET _cat/indices?v&s=index

// How many nodes the cluster has
GET _cat/nodes

// The state of the cluster
GET _cluster/health?pretty=true

GET _cat/indices/*?v&s=index

// Get the shard information of the specified index
GET logs/_search_shards

...
```
cluster status
```
curl -s -XGET 'http://<host>:9200/_cluster/health?pretty'

// When the cluster is healthy, the response looks like:
{
  "cluster_name" : "es-qwerty",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 1,
  "active_shards" : 2,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
```
Retrieve documents
```
// Range query on createdAt with a terms aggregation on urlType
POST logs/_search
{
  "query": {
    "range": {
      "createdAt": {
        "gt": "2020-04-25",
        "lt": "2020-04-27",
        "format": "yyyy-MM-dd"
      }
    }
  },
  "size": 0,
  "aggs": {
    "url_type_stats": {
      "terms": {
        "field": "urlType.keyword",
        "size": 2
      }
    }
  }
}

// Same aggregation, from a given time up to now
POST logs/_search
{
  "query": {
    "range": {
      "createdAt": {
        "gte": "2020-04-26 00:00:00",
        "lte": "now",
        "format": "yyyy-MM-dd hh:mm:ss"
      }
    }
  },
  "size": 0,
  "aggs": {
    "url_type_stats": {
      "terms": {
        "field": "urlType.keyword",
        "size": 2
      }
    }
  }
}

// Distinct counts (cardinality) of clientIp and userAgent within a time range
POST logs/_search
{
  "query": {
    "range": {
      "createdAt": {
        "gte": "2020-04-26 00:00:00",
        "lte": "now",
        "format": "yyyy-MM-dd hh:mm:ss"
      }
    }
  },
  "size": 0,
  "aggs": {
    "total_clientIp": {
      "cardinality": {
        "field": "clientIp.keyword"
      }
    },
    "total_userAgent": {
      "cardinality": {
        "field": "userAgent.keyword"
      }
    }
  }
}

// Date histogram on createdAt with a nested terms aggregation on urlType
POST logs/_search
{
  "size": 0,
  "aggs": {
    "date_total_ClientIp": {
      "date_histogram": {
        "field": "createdAt",
        "interval": "quarter",
        "format": "yyyy-MM-dd",
        "extended_bounds": {
          "min": "2020-04-26 13:00:00",
          "max": "2020-04-26 14:00:00"
        }
      },
      "aggs": {
        "url_type_api": {
          "terms": {
            "field": "urlType.keyword",
            "size": 10
          }
        }
      }
    }
  }
}

// Top 30 clientIp values by document count
POST logs/_search
{
  "size": 0,
  "aggs": {
    "total_clientIp": {
      "terms": {
        "size": 30,
        "field": "clientIp.keyword"
      }
    }
  }
}
```
delete document
```
// Delete all documents in the index
POST logs/_delete_by_query
{"query":{"match_all": {}}}

// Drop the index
DELETE logs
```
create index
The essence of data migration is rebuilding the index. Reindexing does not attempt to set up the target index and does not copy the settings of the source index, so you must set up the target index before the operation, including its mapping, number of shards, replicas, and so on.
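For example, preparing a target index before a reindex might look like the following sketch (the index name dest_index and the field names and types are illustrative; use settings and a mapping that match your own data):

```
PUT dest_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "createdAt": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" },
      "url":       { "type": "keyword" },
      "clientIp":  { "type": "keyword" }
    }
  }
}
```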
data migration
Reindex from Remote
```
// Reindex supports rebuilding an index from a remote Elasticsearch cluster:
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://lotherhost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "source",
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}

// The host parameter must contain scheme, host and port (e.g. https://lotherhost:9200)
// The username and password parameters are optional
```
To use it, you need to add the remote hosts to the reindex.remote.whitelist property in elasticsearch.yml. Multiple entries can be configured (for example lotherhost:9200, another:9200, 127.0.10.*:9200, localhost:*).
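For reference, the whitelist entry in the elasticsearch.yml of the cluster that runs the reindex might look like this (the hosts are taken from the example above):

```yaml
# elasticsearch.yml -- allow reindex-from-remote for these hosts
reindex.remote.whitelist: "lotherhost:9200, another:9200, 127.0.10.*:9200, localhost:*"
```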
For details, refer to the Reindex from Remote documentation.
Elasticsearch-Dump
Elasticsearch-Dump is an open-source toolkit for importing and exporting Elasticsearch data. Installation and migration can be performed on a cloud host in the same availability zone, and it is easy to use.
It requires a Node.js environment and is installed with npm install elasticdump.
```
npm install elasticdump -g

# Copy an index from production to staging, including analyzer and mapping:
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=analyzer
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=mapping
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=data

# Copy the data of a single shard:
elasticdump \
  --input=http://es.com:9200/api \
  --output=http://es.com:9200/api2 \
  --params='{"preference" : "_shards:0"}'
```
For the other parameters of the elasticdump command, refer to Elasticdump Options.
deep paging
- Paginated queries in Elasticsearch throw an exception once they go beyond 10,000 documents; the official mechanism for paging deeper is search_after.
- search_after requires the sort values (the two sort keys in the example below) from the last document of the previous page.
```
// https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-search-after.html
GET logs/_search
{
  "from": 9990,
  "size": 10,
  "_source": ["url", "clientIp", "createdAt"],
  "query": {
    "match_all": {}
  },
  "sort": [
    { "createdAt": { "order": "desc" } },
    { "_id": { "order": "desc" } }
  ]
}

// from must be 0 (or -1) when search_after is used;
// the search_after values are the sort values of the last hit on the previous page
GET logs/_search
{
  "from": -1,
  "size": 10,
  "_source": ["url", "clientIp", "createdAt"],
  "query": {
    "match_all": {}
  },
  "search_after": [1588042597000, "V363vnEBz1D1HVfYBb0V"],
  "sort": [
    { "createdAt": { "order": "desc" } },
    { "_id": { "order": "desc" } }
  ]
}
```
Install
- Install Elasticsearch and Kibana with Docker
```
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.8.1
docker run -p 9200:9200 --name elasticsearch -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.8.1
```
```
docker pull docker.elastic.co/kibana/kibana:7.8.1
docker run -p 5601:5601 --name kibana --link 14e385b1e761:elasticsearch -e "elasticsearch.hosts=http://127.0.0.1:9200" -d docker.elastic.co/kibana/kibana:7.8.1
```
Using Elasticsearch in .NET Core
Create a new Web API project and install two packages:
```
Install-Package NEST
Install-Package Swashbuckle.AspNetCore
```
NEST is used to operate Elasticsearch (the project is open source at https://github.com/elastic/elasticsearch-net). Swagger is also added to make it easier to call the API later; a sketch of the Swagger wiring follows.
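The article does not show the Swagger registration itself; below is a minimal sketch of what it might look like in Startup.cs, assuming the Swashbuckle.AspNetCore 5.x registration style (the document name and title are illustrative, adjust them to your project):

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.OpenApi.Models;

namespace ESDemo
{
    public class Startup
    {
        public void ConfigureServices(IServiceCollection services)
        {
            services.AddControllers();

            // Register the Swagger generator; document name and title are illustrative
            services.AddSwaggerGen(c =>
            {
                c.SwaggerDoc("v1", new OpenApiInfo { Title = "ESDemo", Version = "v1" });
            });
        }

        public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
        {
            // Serve the generated OpenAPI document and the Swagger UI
            app.UseSwagger();
            app.UseSwaggerUI(c => c.SwaggerEndpoint("/swagger/v1/swagger.json", "ESDemo v1"));

            app.UseRouting();
            app.UseEndpoints(endpoints => endpoints.MapControllers());
        }
    }
}
```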
Next, CRUD operations against Elasticsearch are demonstrated.
Add entity class: VisitLog.cs.
```csharp
using System;

namespace ESDemo.Domain
{
    public class VisitLog
    {
        public string Id { get; set; }

        /// <summary>
        /// UserAgent
        /// </summary>
        public string UserAgent { get; set; }

        /// <summary>
        /// Method
        /// </summary>
        public string Method { get; set; }

        /// <summary>
        /// Url
        /// </summary>
        public string Url { get; set; }

        /// <summary>
        /// Referrer
        /// </summary>
        public string Referrer { get; set; }

        /// <summary>
        /// IpAddress
        /// </summary>
        public string IpAddress { get; set; }

        /// <summary>
        /// Milliseconds
        /// </summary>
        public int Milliseconds { get; set; }

        /// <summary>
        /// QueryString
        /// </summary>
        public string QueryString { get; set; }

        /// <summary>
        /// Request Body
        /// </summary>
        public string RequestBody { get; set; }

        /// <summary>
        /// Cookies
        /// </summary>
        public string Cookies { get; set; }

        /// <summary>
        /// Headers
        /// </summary>
        public string Headers { get; set; }

        /// <summary>
        /// StatusCode
        /// </summary>
        public int StatusCode { get; set; }

        /// <summary>
        /// Response Body
        /// </summary>
        public string ResponseBody { get; set; }

        public DateTimeOffset CreatedAt { get; set; } = DateTimeOffset.UtcNow;
    }
}
```
With the entity class in place, let's wrap Elasticsearch and encapsulate a simple base class for the repositories to build on.
Add an interface, IElasticsearchProvider:
```csharp
using Nest;

namespace ESDemo.Elasticsearch
{
    public interface IElasticsearchProvider
    {
        IElasticClient GetClient();
    }
}
```
Implement the IElasticsearchProvider interface in ElasticsearchProvider.
```csharp
using Nest;
using System;

namespace ESDemo.Elasticsearch
{
    public class ElasticsearchProvider : IElasticsearchProvider
    {
        public IElasticClient GetClient()
        {
            var connectionSettings = new ConnectionSettings(new Uri("http://localhost:9200"));
            return new ElasticClient(connectionSettings);
        }
    }
}
```
Add the Elasticsearch repository base class, ElasticsearchRepositoryBase.
```csharp
using Nest;

namespace ESDemo.Elasticsearch
{
    public abstract class ElasticsearchRepositoryBase
    {
        private readonly IElasticsearchProvider _elasticsearchProvider;

        public ElasticsearchRepositoryBase(IElasticsearchProvider elasticsearchProvider)
        {
            _elasticsearchProvider = elasticsearchProvider;
        }

        protected IElasticClient Client => _elasticsearchProvider.GetClient();

        protected abstract string IndexName { get; }
    }
}
```
This is an abstract class: whenever we inherit from it, we must override protected abstract string IndexName { get; } to specify the index name.
With this simple encapsulation in place, create a new IVisitLogRepository repository interface with four methods:
```csharp
using ESDemo.Domain;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace ESDemo.Repositories
{
    public interface IVisitLogRepository
    {
        Task InsertAsync(VisitLog visitLog);
        Task DeleteAsync(string id);
        Task UpdateAsync(VisitLog visitLog);
        Task<Tuple<int, IList<VisitLog>>> QueryAsync(int page, int limit);
    }
}
```
Next, implement this repository interface in VisitLogRepository; the code is as follows:
```csharp
using ESDemo.Domain;
using ESDemo.Elasticsearch;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace ESDemo.Repositories
{
    public class VisitLogRepository : ElasticsearchRepositoryBase, IVisitLogRepository
    {
        public VisitLogRepository(IElasticsearchProvider elasticsearchProvider) : base(elasticsearchProvider)
        {
        }

        protected override string IndexName => "visitlogs";

        public async Task InsertAsync(VisitLog visitLog)
        {
            await Client.IndexAsync(visitLog, x => x.Index(IndexName));
        }

        public async Task DeleteAsync(string id)
        {
            await Client.DeleteAsync<VisitLog>(id, x => x.Index(IndexName));
        }

        public async Task UpdateAsync(VisitLog visitLog)
        {
            await Client.UpdateAsync<VisitLog>(visitLog.Id, x => x.Index(IndexName).Doc(visitLog));
        }

        public async Task<Tuple<int, IList<VisitLog>>> QueryAsync(int page, int limit)
        {
            var query = await Client.SearchAsync<VisitLog>(x => x
                .Index(IndexName)
                .From((page - 1) * limit)
                .Size(limit)
                // inner lambda parameter renamed to avoid shadowing the outer x
                .Sort(s => s.Descending(v => v.CreatedAt)));

            return new Tuple<int, IList<VisitLog>>(Convert.ToInt32(query.Total), query.Documents.ToList());
        }
    }
}
```
Now let's write the API layer. Add a VisitLogController API controller; the code is as follows:
```csharp
using ESDemo.Domain;
using ESDemo.Repositories;
using Microsoft.AspNetCore.Mvc;
using System.ComponentModel.DataAnnotations;
using System.Threading.Tasks;

namespace ESDemo.Controllers
{
    [Route("api/[controller]")]
    [ApiController]
    public class VisitLogController : ControllerBase
    {
        private readonly IVisitLogRepository _visitLogRepository;

        public VisitLogController(IVisitLogRepository visitLogRepository)
        {
            _visitLogRepository = visitLogRepository;
        }

        [HttpGet]
        public async Task<IActionResult> QueryAsync(int page = 1, int limit = 10)
        {
            var result = await _visitLogRepository.QueryAsync(page, limit);
            return Ok(new { total = result.Item1, items = result.Item2 });
        }

        [HttpPost]
        public async Task<IActionResult> InsertAsync([FromBody] VisitLog visitLog)
        {
            await _visitLogRepository.InsertAsync(visitLog);
            return Ok("added successfully");
        }

        [HttpDelete]
        public async Task<IActionResult> DeleteAsync([Required] string id)
        {
            await _visitLogRepository.DeleteAsync(id);
            return Ok("successfully deleted");
        }

        [HttpPut]
        public async Task<IActionResult> UpdateAsync([FromBody] VisitLog visitLog)
        {
            await _visitLogRepository.UpdateAsync(visitLog);
            return Ok("successfully modified");
        }
    }
}
```
Almost done. Don't forget to register the services in Startup.cs as a final step, otherwise dependency injection won't work.
```csharp
...
services.AddSingleton<IElasticsearchProvider, ElasticsearchProvider>();
services.AddSingleton<IVisitLogRepository, VisitLogRepository>();
...
```
Everything is ready. Run the project and open the Swagger UI.
Call the endpoints in order: add, update, delete, query. Since there is no data by default, add a few records first; adding more lets you verify that paging works (a sample request body is sketched below). That is not demonstrated step by step here.
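For reference, a request body for POST /api/VisitLog could look like the following (all values are made up for illustration; CreatedAt can be omitted because it defaults to the current UTC time):

```json
{
  "id": "1",
  "userAgent": "Mozilla/5.0",
  "method": "GET",
  "url": "/api/visitlog",
  "referrer": "",
  "ipAddress": "127.0.0.1",
  "milliseconds": 12,
  "queryString": "",
  "requestBody": "",
  "cookies": "",
  "headers": "",
  "statusCode": 200,
  "responseBody": ""
}
```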
If you have Kibana installed, you can now take a look at the data you just added.
```
GET _cat/indices

GET visitlogs/_search
{}
```
As you can see, the data is quietly sitting there.
This article briefly introduces using Elasticsearch in .NET Core. Much of the query syntax for retrieving data is not covered here; if you need it in development, refer to the official query examples: https://github.com/elastic/elasticsearch-net/tree/master/examples