Hadoop fully distributed cluster setup (detailed)

Preparation

  • 1. VMware 15.1
  • 2. JDK (mine is 32-bit here; choose according to your own virtual machine, more on this later)
  • 3. Hadoop

1 Static IP settings

Detailed process for static network configuration

2 Modify the host name

vim /etc/sysconfig/network
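Set HOSTNAME to that node's name; a minimal sketch for hadoop01 (hadoop02 and hadoop03 are analogous):

NETWORKING=yes
HOSTNAME=hadoop01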

3 Add a mapping relationship

All three virtual machines need to perform the following operations

vim /etc/hosts
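Append one line per node; the IP addresses below are placeholders, replace them with the static IPs configured in step 1:

192.168.1.101  hadoop01
192.168.1.102  hadoop02
192.168.1.103  hadoop03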

Check connectivity with ping hadoop01, ping hadoop02, and ping hadoop03

Note: After the mapping relationship is configured, the three virtual machines can ping each other

4 Turn off the firewall

Permanently disable the firewall (do this on all three machines)

chkconfig iptables off


Disable SELinux

vim /etc/selinux/config
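In that file, set the SELINUX line to disabled:

SELINUX=disabled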

Restart the machine and check that the firewall is off.
On CentOS 7 and above, the chkconfig command is replaced by systemctl (for example, systemctl disable firewalld).

5 Sync time

(1) Install the ntpdate tool (required on all three nodes; commands for all three steps are sketched below)

(2) Synchronize with a network time server

(3) Write the system time into the hardware clock
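A minimal command sketch of these three steps (the NTP server address is only an example; use any time server you prefer):

yum install -y ntpdate       # (1) install the ntpdate tool
ntpdate ntp.aliyun.com       # (2) synchronize with a network time server
hwclock -w                   # (3) write the system time into the hardware clock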

6 Set up ssh password-free login

As the root user, run ssh-keygen -t rsa and press Enter through all prompts

After the key is generated, the ~/.ssh/ directory contains two files: id_rsa (private key) and id_rsa.pub (public key). Copy the public key into authorized_keys
and give authorized_keys 600 permissions
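A sketch of these commands on hadoop01:

ssh-keygen -t rsa                                  # press Enter through all prompts
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # copy the public key into authorized_keys
chmod 600 ~/.ssh/authorized_keys                   # give authorized_keys 600 permissions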

Perform the same operation on the hadoop02 and hadoop03 nodes, then append their public keys to authorized_keys on the master node hadoop01

Remotely transfer the authorized_keys on the hadoop01 node to the ~/.ssh/ directories of hadoop02 and hadoop03

scp ~/.ssh/authorized_keys root@hadoop02:~/.ssh/

Check for password-free login (password may be required for the first time)
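For example, from hadoop01:

ssh hadoop02    # should log in without asking for a password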

7 Install jdk

Note: before installing, check whether your virtual machine is 32-bit or 64-bit so you do not install the wrong JDK.
If x86_64 appears in the output, the system is 64-bit; if it does not appear (as in my case), it is 32-bit.
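One way to check, assuming a standard Linux shell, is:

uname -m    # x86_64 means 64-bit; i686/i386 means 32-bit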
Install java on three nodes and configure java environment variables

Unzip the compressed package

Add the jdk path to the /etc/profile file

Run source /etc/profile so the change takes effect immediately, then check the JDK version, as sketched below
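A sketch of these steps, assuming a 32-bit JDK 8 archive and the /opt/module directory used elsewhere in this guide (adjust the file name and path to your own package):

tar -zxvf jdk-8u144-linux-i586.tar.gz -C /opt/module/   # unzip the compressed package

# append to /etc/profile
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin

source /etc/profile   # take effect immediately
java -version         # check the jdk version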

8 Install Hadoop and configure the files (on the master node only)

1. Upload the hadoop compressed package and extract it to the module folder

Create the following directories
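A sketch of the extraction and of the directories that the configuration files below expect (the archive name is an assumption; adjust it to your package):

tar -zxvf hadoop-2.7.3.tar.gz -C /opt/module/        # extract to the module folder

mkdir -p /opt/module/hadoop-2.7.3/tmp                # hadoop.tmp.dir
mkdir -p /opt/module/hadoop-2.7.3/dfs/name           # dfs.namenode.name.dir
mkdir -p /opt/module/hadoop-2.7.3/dfs/data           # dfs.datanode.data.dir
mkdir -p /opt/module/hadoop-2.7.3/dfs/namesecondary  # dfs.namenode.checkpoint.dir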

2. Modify the configuration file

Set JAVA_HOME to the actual JDK path in the Hadoop environment scripts (hadoop-env.sh, yarn-env.sh, and mapred-env.sh); otherwise Hadoop cannot start properly.
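In each of these scripts the line looks like the following (the JDK path is the example path assumed in the JDK section; use your actual path):

export JAVA_HOME=/opt/module/jdk1.8.0_144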


Modify the core configuration files
1. Hadoop core configuration file: core-site.xml

exist<configuration></configuration>Add something in between
<property>
        <name>fs.defaultFS</name><!--definition Hadoop HDFS middle namenode of URI and port [must be configured]-->
        <value>hdfs://hadoop01:9000</value>
</property>
<property>
        <name>hadoop.tmp.dir</name><!--Hadoop Temporary storage directory at runtime [must be configured]-->
        <value>file:/opt/module/hadoop-2.7.3/tmp</value>
</property>

2. HDFS configuration file: hdfs-site.xml

<property><!--namenode Node metadata storage directory [must be configured]-->
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/module/hadoop-2.7.3/dfs/name</value>         
</property>
<property><!--datanode The real data storage directory [must be configured]-->
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/module/hadoop-2.7.3/dfs/data</value>         
</property>
<property><!--specify DataNode storage block number of copies,no greater than DataNode The number is enough, the default is 3 [required]-->
    <name>dfs.replication</name>
    <value>1</value> 
</property>
<property><!--specify SecondaryNamenode The working directory [must be configured]-->
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:/opt/module/hadoop-2.7.3/dfs/namesecondary</value>          
</property>
<property><!--specify SecondaryNamenode of http Protocol access address [must be configured]--> 
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop02:50090</value>
</property>
<property><!--specify SecondaryNamenode of https Protocol access address: [can not be configured]-->
    <name>dfs.namenode.secondary.https-address</name>
    <value>hadoop02:50091</value>
</property>
<property><!--must be set to true´╝îOtherwise, it will not pass web access hdfs File information on [must be configured]-->
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>

3. YARN configuration file: yarn-site.xml

<property>  <!--Reducer The way to obtain data [must be configured]-->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>  <!--Reducer in the way of getting data shuffle The class corresponding to the process can be customized, [can not be configured], which is the default-->
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property> <!--ResourceManager hostname, other after configuration address There is no need to configure, unless you need to customize the port [must be configured]-->
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop01</value>
</property>
<property><!--NodeManager The memory size of the node, in units of MB[must be configured]-->
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<!-- Log aggregation function [no configuration required temporarily] --> <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value> 
</property>
<!-- The log retention time is set to 7 days [temporary configuration is not required]--> <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>

4. MapReduce configuration file: mapred-site.xml
This file does not exist by default; copy the template with the cp command and then edit the copy, rather than creating the file by hand.
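For example, in the Hadoop etc/hadoop directory:

cp mapred-site.xml.template mapred-site.xml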

<property><!--use yarn run mapreduce Program [must be configured]-->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property> <!--Configure history server [no configuration needed temporarily]--> <property><!--MapReduce JobHistory Server address-->
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop01:10020</value>
</property>
<property><!--MapReduce JobHistory Server Web interface address-->
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop01:19888</value>
</property>

5. slaves file
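The slaves file (also under etc/hadoop) lists the worker (DataNode/NodeManager) hostnames, one per line; assuming all three machines serve as workers, it would contain:

hadoop01
hadoop02
hadoop03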

3. File distribution
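A sketch of this step, copying the configured Hadoop directory from hadoop01 to the other two nodes (copy the JDK and /etc/profile the same way if they are not already in place):

scp -r /opt/module/hadoop-2.7.3 root@hadoop02:/opt/module/
scp -r /opt/module/hadoop-2.7.3 root@hadoop03:/opt/module/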


4. Set environment variables

Each node edits the /etc/profile file

export HADOOP_HOME=/opt/module/hadoop-2.7.3 
export HADOOP_MAPRED_HOME=$HADOOP_HOME    
export HADOOP_COMMON_HOME=$HADOOP_HOME    
export HADOOP_HDFS_HOME=$HADOOP_HOME    
export YARN_HOME=$HADOOP_HOME    
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native    
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin


Re-source /etc/profile and check whether Hadoop was installed successfully
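For example:

source /etc/profile
hadoop version    # prints the Hadoop version if the installation succeeded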

10 Start the Hadoop cluster

1. Format Hadoop
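The usual command, run once on hadoop01 from the Hadoop installation directory, is:

bin/hdfs namenode -format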


2. Start Hadoop

Start the HDFS NameNode and DataNodes using the following commands on the specified nodes

1. Start HDFS

Start on hadoop01: sbin/start-dfs.sh (run it on the machine where the NameNode resides)


2. Start YARN

Start on hadoop01: sbin/start-yarn.sh (run it on the machine where the ResourceManager resides)
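After both scripts finish, running jps on each node should show the expected daemons (the DataNode/NodeManager placement assumes the slaves file above lists all three hosts):

jps
# hadoop01: NameNode, ResourceManager, DataNode, NodeManager
# hadoop02: SecondaryNameNode, DataNode, NodeManager
# hadoop03: DataNode, NodeManager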


3. Access the HDFS web UI through port 50070, e.g. http://hadoop01:50070 in a browser
