Elasticsearch is a real-time distributed search and analytics engine built on top of Lucene. In short, it extends Lucene's search capabilities with distributed functionality. ES usually provides end-to-end log and search analysis together with two other open source components, Logstash (log collection) and Kibana (d ...
Posted by kmaid on Thu, 14 Apr 2022 22:59:57 +0300
MySQL is widely used as the storage database for large-scale business systems. In the era of big data, we urgently need to analyze this massive data, but running the analysis directly on MySQL is obviously unrealistic, because it would affect the stability of the business system. If we want to analyze this data in real time, we need to copy it t ...
Posted by Voodoo Jai on Thu, 14 Apr 2022 03:20:23 +0300
Recently, the group wanted to build a data analysis system for user data, and asked them to study big data technology first. So, rather bewildered, they started googling big data. A pile of results came out, and the big data knowledge system felt rather huge. After reading through a pile of material, they decided to star ...
Posted by Kieran Huggins on Wed, 13 Apr 2022 21:18:31 +0300
This is a supplement to Experiment 3 of the Introduction to Data Science course (data science track, first semester of junior year) at Shandong University, based on the experiment documents issued by the teacher. Relevant documents are given at the end.
Social activities wil ...
Posted by iBuddy Media on Mon, 11 Apr 2022 00:52:15 +0300
Build a local environment that can run the Flink Hudi and Spark Hudi demos. The local machine has an M1 chip with arm64 architecture, so it is a special case: the Docker setup from Hudi's official website does not currently support it. I also raised this requirement on Hudi's GitHub. Although it has been responded to, there will b ...
Posted by lisaNewbie on Sun, 10 Apr 2022 16:17:43 +0300
Premise of data management:
Actively manage data as an asset and derive sustained value from it.
To achieve value, we need goals, planning, collaboration and guarantee, as well as management and leadership.
Function of data management:
To deliver, control, protect and enhance the value of data and information assets; and formula ...
Posted by gibbo1715 on Sun, 10 Apr 2022 06:30:28 +0300
Preface
Before installing Spark on YARN, the following must be installed first:
For the Zookeeper installation tutorial, see Installation of Zookeeper cluster in CentOS7
For the Hadoop installation tutorial, see CentOS7 installing Hadoop clusters
Other deployment modes include Local mode and Standalone mode, which you can research on your own
In the S ...
Posted by simonmlewis on Thu, 07 Apr 2022 18:09:12 +0300
Apache Flume is a distributed, reliable and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store.
The use of Apache Flume is not limited to log data aggregation. Because the data source is customizable, Flume can be ...
Recently, I was lucky to come across medcl's masterpiece: Limit Gateway (INFINI Gateway). INFINI Gateway has many advantages and many application scenarios. You can read more on the official website. In short, INFINI Gateway is a high-performance application gateway for Elasticsearch, which contains rich features and is very si ...
Posted by mikem562 on Wed, 06 Apr 2022 09:54:00 +0300