.NET Core: how to batch insert data into Elasticsearch (2)

In the last article, we covered how to create indexes and insert data into Elasticsearch (https://www.cnblogs.com/zpy1993-09/p/13380197.html).

Today we extend that. Last time we inserted records one at a time. That is fine for a small amount of data, but with a large amount, single-record insertion cannot keep up and we have to insert in batches. Batch insertion usually means caching records and flushing the cache either after a certain time or after a certain number of entries. A time-based cache might send whatever has accumulated every 10 minutes; a count-based cache might send the records once 1000 have accumulated. Which one to choose depends on how fresh the data needs to be.

Today, let's look at batch insertion by number of entries. What should the cache be? There are many options: a message queue, a stack, a dictionary or an array.

To keep things easy to follow, I will use the most common one, a List, as the cache today.

Since the data arrives through an API endpoint and is sent to Elasticsearch in batches, the first thing to consider when caching in a List is that the List instance must not be replaced: every call to the endpoint must add to the same List. If each call created a new List object, nothing would ever accumulate. So we first have to guarantee that there is only ever one List instance, and that naturally brings the singleton pattern to mind.

Since the endpoint generally receives JSON, it is best to define a model class to map the parsed JSON onto.

Suppose the data inserted into Elasticsearch through the endpoint is student information. First, we build a simple Student class.

public class Student
{
    public string name { get; set; }   // Full name
    public int number { get; set; }    // Student number
    public int age { get; set; }       // Age
}

Then we define a singleton that holds a List of Student:

  public class ListExample // not generic: this version only holds Student
    {
        private volatile static List<Student> instance = null;
        private static readonly object _lock = new object();
        public static List<Student> GetInstance()
        {
            if (instance == null)
            {
                lock (_lock)
                {
                    if (instance == null)
                    {
                        instance = new List<Student>();
                    }
                }
            }
            return instance;
        }
    }

This ensures that there is only one List instance. But some people will object that it is too much trouble: what if there are more endpoints? Writing a separate singleton for each one would be tedious.

That's true. So we need to turn this singleton into a general form that supports any element type, and generics are the natural choice.

  public class ListExample<T>
    {
        private volatile static List<T> instance = null;
        private static readonly object _lock = new object();
        public static List<T> GetInstance()
        {
            if (instance == null)
            {
                lock (_lock)
                {
                    if (instance == null)
                    {
                        instance = new List<T>();
                    }
                }
            }
            return instance;
        }
    }
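As a side note, the same one-instance-per-type guarantee can be had with less code. The following is a minimal sketch, assuming a hypothetical class name LazyListCache<T> that is not part of the original code; it relies on Lazy<T> from the .NET base library, which by default creates its value once in a thread-safe way.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical alternative to ListExample<T>: Lazy<T> performs the
// thread-safe, one-time creation that the double-checked lock does by hand.
public static class LazyListCache<T>
{
    private static readonly Lazy<List<T>> instance =
        new Lazy<List<T>>(() => new List<T>());

    // Always returns the same List<T> for a given T.
    public static List<T> Instance => instance.Value;
}
```

With this variant, LazyListCache<Student>.Instance would play the role of ListExample<Student>.GetInstance(). Either way, only the creation of the list is thread-safe; adding to a List<T> from several threads still needs its own synchronization.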

 

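With generics, the cache now works for any type. Next we need a helper that sends the cached list to Elasticsearch. A first version written for Student looks like this: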

 // Requires the NEST package (using Nest;)
 public static class ESHelper
    {
        public static readonly string url = "http://IP/"; // IP address of Elasticsearch
        /// <summary>
        /// Batch insert
        /// </summary>
        /// <param name="obj">Records to insert</param>
        /// <param name="index">Index name</param>
        public static void ManyInsert(List<Student> obj, string index)
        {
            // Configure the connection; the index name passed to DefaultIndex must be lowercase
            var settings = new ConnectionSettings(new Uri(url)).DefaultIndex(index);
            var client = new ElasticClient(settings);
            var indexResponse = client.IndexMany<Student>(obj);
        }
    }

This helper has the same problem as the first singleton: it only works for Student. So we make it generic as well:

 // Requires the NEST package (using Nest;)
 public static class ESHelper<T> where T : class
    {
        public static readonly string url = "http://IP/"; // IP address of Elasticsearch
        /// <summary>
        /// Batch insert
        /// </summary>
        /// <param name="obj">Records to insert</param>
        /// <param name="index">Index name</param>
        public static void ManyInsert(List<T> obj, string index)
        {
            // Configure the connection; the index name passed to DefaultIndex must be lowercase
            var settings = new ConnectionSettings(new Uri(url)).DefaultIndex(index);
            var client = new ElasticClient(settings);
            var indexResponse = client.IndexMany<T>(obj);
        }
    }

 

Then we can demonstrate by sending student information to ES:

        /// <summary>
        /// Send data to ES
        /// </summary>
        /// <param name="Data">JSON object</param>
        /// <returns></returns>
        [HttpPost("GetLogList")]
        public string GetLogList([FromBody] object Data)
        {
            // JSON object format: {"name": "Zhang San", "number": 123456, "age": 20}
            JObject LogList = (JObject)JsonConvert.DeserializeObject(Data.ToString());
            Student student = new Student
            {
                name = (string)LogList["name"],
                number = (int)LogList["number"],
                age = (int)LogList["age"]
            };
            var cache = ListExample<Student>.GetInstance();
            lock (cache) // List<T> is not thread-safe and requests can arrive concurrently
            {
                cache.Add(student);
                if (cache.Count >= 1000)
                {
                    // Batch insert into ES; index names must be lowercase
                    ESHelper<Student>.ManyInsert(cache, "student-" + DateTime.Now.ToString("yyyy-MM"));
                    cache.Clear(); // Clear the cache
                }
            }
            return "ok";
        }
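The action above adds to one shared List from every request, and it sends the batch from that same list before clearing it. One possible refinement, sketched here under assumptions, is to hand the full batch to the caller and start a fresh list, so the slow network call does not work on the list that other requests are adding to. BatchBuffer<T> and its members are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a count-based buffer that is safe to call
// from concurrent requests.
public class BatchBuffer<T>
{
    private readonly object gate = new object();
    private readonly int batchSize;
    private List<T> items = new List<T>();

    public BatchBuffer(int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        this.batchSize = batchSize;
    }

    // Adds one item. Returns the full batch once the threshold is reached,
    // otherwise null. The caller sends the returned batch outside the lock.
    public List<T> Add(T item)
    {
        lock (gate)
        {
            items.Add(item);
            if (items.Count < batchSize) return null;
            var full = items;
            items = new List<T>(batchSize); // later requests fill a new list
            return full;
        }
    }
}
```

In the action, a single static BatchBuffer<Student> created with a batch size of 1000 would replace the shared List: call Add(student), and if the result is not null, pass it to ESHelper<Student>.ManyInsert.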

 

Batching data into ES is easy, isn't it? However, batching by count has a drawback. Suppose a batch is sent once 2000 records have accumulated. If the caller uploads 1000 records and then suddenly stops, those 1000 cached records are never sent to ES. It is therefore better to send data to ES in batches at regular intervals. In the next chapter, we will talk about how to do that.
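The drawback can be made concrete with a small simulation. This is only an illustrative sketch: StrandedRecordsDemo and Simulate are hypothetical names, and incrementing a counter stands in for the real bulk insert.

```csharp
using System;
using System.Collections.Generic;

public static class StrandedRecordsDemo
{
    // Feeds `total` records into a count-based cache and returns how many
    // batches were sent and how many records are still waiting in the cache.
    public static (int Flushes, int Stranded) Simulate(int total, int batchSize)
    {
        var cache = new List<int>();
        int flushes = 0;
        for (int i = 0; i < total; i++)
        {
            cache.Add(i);
            if (cache.Count == batchSize)
            {
                flushes++;      // stands in for the bulk insert
                cache.Clear();
            }
        }
        return (flushes, cache.Count);
    }
}
```

Simulate(1000, 2000) returns (0, 1000): nothing was sent and all 1000 records are still cached. Simulate(2500, 1000) returns (2, 500): two batches went out and the last 500 records wait until more data arrives.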


Tags: .NET ELK

Posted by LukeO on Sun, 22 May 2022 13:36:13 +0300