HBase installation, configuration, and common shell commands

HBase installation and configuration
Step 1: Unzip the HBase installation package
[root@master ~]# tar -zxvf /opt/software/hbase-1.2.1-bin.tar.gz -C /usr/local/src/
Step 2: Rename the HBase installation folder
[root@master ~]# cd /usr/local/src/
[root@master src]# mv hbase-1.2.1 hbase
Step 3: Add environment variables to all nodes
[root@mas ...

Posted by mrbuter on Fri, 15 Apr 2022 07:12:58 +0300

Comparison of Elasticsearch and ClickHouse basic queries

Elasticsearch is a real-time distributed search and analytics engine built on top of Lucene. In short, it extends Lucene's search capabilities with distributed functionality. To provide end-to-end log and search analysis, ES is usually deployed together with two other open source components, Logstash (log collection) and Kibana (d ...

Posted by kmaid on Thu, 14 Apr 2022 22:59:57 +0300

Getting MySQL data with Flume + Kafka

Abstract: MySQL is widely used as the storage database for large volumes of business data. In the era of big data, we urgently need to analyze this data, but running big data analysis directly on MySQL is clearly unrealistic, because it would affect the stability of the business system. If we want to analyze this data in real time, we need to copy them t ...

Posted by Voodoo Jai on Thu, 14 Apr 2022 03:20:23 +0300

Set up standalone Flink and quickly write and run a simple Java job demo

Recently, my team wanted to build a data analysis system for user data, and I was asked to study big data technology first. So, rather bewildered, I started googling big data, and a pile of results came out. The big data knowledge system felt rather large. After reading through a pile of material, I decided to star ...

Posted by Kieran Huggins on Wed, 13 Apr 2022 21:18:31 +0300

Community division and Sankey diagram drawing

This post supplements Experiment 3 of the Introduction to Data Science course, taken in the first semester of the junior year in the data science track at Shandong University. It builds on the experiment documents issued by the teacher. Relevant documents are given at the end. Problem background: Social activities wil ...

Posted by iBuddy Media on Mon, 11 Apr 2022 00:52:15 +0300

Build a Hudi data lake environment from 0 to 1

1. Goal: Build an environment that can run the Flink Hudi and Spark Hudi demos locally. The local machine uses an M1 chip with the arm64 architecture, which makes it a special case: the Docker setup from Hudi's official website does not currently support it. I also raised this requirement on Hudi's GitHub. Although it has received a response, there will b ...

Posted by lisaNewbie on Sun, 10 Apr 2022 16:17:43 +0300

[DAMA Guide to the Data Management Body of Knowledge] Chapter 1: Data Management

Premise of data management: actively manage data as an asset and derive sustained value from it. Achieving that value requires goals, planning, collaboration, and safeguards, as well as management and leadership. Data management is (function): to deliver, control, protect, and enhance the value of data and information assets; And formula ...

Posted by gibbo1715 on Sun, 10 Apr 2022 06:30:28 +0300

Installing a Spark cluster on CentOS 7 (YARN mode)

Preface: Spark on YARN requires the following to be installed first. For ZooKeeper, see the tutorial "Installation of Zookeeper cluster in CentOS7". For Hadoop, see the tutorial "CentOS7 installing Hadoop clusters". Other deployment modes include Local mode and standalone mode, which you can research on your own. Deployment description: In the S ...

Posted by simonmlewis on Thu, 07 Apr 2022 18:09:12 +0300

[Big data practice] Flume data collection

Flume quick start. Summary: Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store. The use of Apache Flume is not limited to log data aggregation. Because the data source is customizable, Flume can be ...

Posted by BigX on Thu, 07 Apr 2022 10:54:03 +0300

INFINI Gateway: Getting Started Guide

Recently, I was lucky enough to come across medcl's masterpiece: INFINI Gateway. INFINI Gateway has many advantages and many application scenarios, which you can read about on the official website. In short, INFINI Gateway is a high-performance application gateway for Elasticsearch, which contains rich features and is very si ...

Posted by mikem562 on Wed, 06 Apr 2022 09:54:00 +0300