vivo Internet big data team Lv Jia
The first stable version of Hadoop 3.x was released at the end of 2017, with many significant improvements.
In terms of HDFS, it supports new features such as erasure coding, more than two NameNodes, Router-Based Federation, Standby NameNode reads, FairCallQueue, and the intra-DataNode balancer. These new features bri ...
Posted by mybluehair on Mon, 16 May 2022 05:16:17 +0300
Overview: the previous article introduced the overall concepts and flow of the Flink checkpoint source code analysis and, with reference to the code, described how a checkpoint is initiated and how tasks execute it. Detailed reference: https://blog.csdn.net/weixin_40809627/article/details/108537480
This article will follow the previous article a ...
Posted by freddyw on Sun, 15 May 2022 15:26:01 +0300
Performance monitoring is very important for maintaining and managing a Linux system, especially real-time monitoring data. This data helps us judge the load on the server, adjust resource allocation in time, and better serve the business. So today, the migrant worker brother gives you a Linux performanc ...
(This is my first learning write-up; please leave a comment to point out any shortcomings.)
Environment and tools: VMware, MobaXterm, CentOS 7, JDK 8, Hadoop 2.9.2
The required tools can be downloaded from this link: https://pan.baidu.com/s/1VCXtS6fm6YvHMtBFrgNx6Q (extraction code: xpgu)
1. VMware installation
The i ...
Posted by CrusaderSean on Sun, 15 May 2022 07:54:17 +0300
Alink ramble (22): cluster evaluation of source code analysis
Alink is a new generation of machine learning algorithm platform developed by Alibaba based on Flink, a real-time computing engine. It is the first machine learning platform in the industry to support batch algorithm and streaming algorithm at the same time. This a ...
Posted by chenci on Sat, 14 May 2022 16:31:35 +0300
Installing Hadoop on CentOS 7 and connecting Eclipse to the HDFS file system: running test cases
I have written three blogs before. This is the last and most important step. Today, let's talk about connecting Eclipse to Hadoop.
Please read the first three blogs before reading this one. Hadoop, Eclipse, and the JDK are assumed to be installed already.
Install Hadoop ecl ...
Posted by deregular on Sat, 14 May 2022 05:14:39 +0300
Data skew is one of the most common problems in data development, and it is also a question that is almost always asked in interviews. So why does data become skewed? When does data skew occur, and how can it be solved?
What is data skew: The essence of data skew is uneven data distribution. Some tasks process a large amount of data, which leads to a longe ...
Posted by Paris! on Fri, 13 May 2022 21:01:37 +0300
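The uneven distribution described in the data skew excerpt above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the key names, the partition count, the CRC32-based partitioner, and the "salting" technique (appending a random suffix to keys so that a hot key spreads over several partitions) are illustrative choices and are not taken from the original post.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical number of reduce tasks


def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Deterministic hash partitioner (CRC32 is an assumption, not Hadoop's)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions=NUM_PARTITIONS):
    """Return the number of records each partition would receive."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_key(key, num_salts, rng):
    """Append a random suffix so one hot key becomes num_salts distinct keys."""
    return f"{key}#{rng.randrange(num_salts)}"


def unsalt_key(salted):
    """Recover the original key in a second aggregation stage."""
    return salted.rsplit("#", 1)[0]


rng = random.Random(0)
# 90% of the records share a single hot key: a typical skewed input.
keys = ["hot"] * 9000 + [f"user{i}" for i in range(1000)]

plain = partition_sizes(keys)
salted = partition_sizes([salt_key(k, 16, rng) for k in keys])

print("largest partition without salting:", max(plain))
print("largest partition with salting   :", max(salted))
```

Without salting, every record with the hot key lands in the same partition, so one task does most of the work. With salting, partial results have to be merged in a second step after `unsalt_key` restores the original key.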
The purpose of serialization is to reduce the load on the network.
Serialization technology:
Java built-in serialization
Relatively speaking, Java serialization is worse than Google protobuf and
Download protoc-2.5.0-win32.zip
Posted by Xster on Fri, 13 May 2022 05:58:34 +0300
For the source code, see: https://github.com/hiszm/hadoop-train
User behavior log overview
Records of each search and click by the user
Historical behavior data, from historical orders
==> Then make recommendations, so as to improve user conversion (the ultimate goal)
20979872853^Ahttp://www.yihaodian.com/1/?typ ...