Preparation
- 1. VMware 15.1
- 2. JDK (mine is 32-bit here; choose the version that matches your own virtual machine's architecture, more on this later)
- 3. Hadoop
1 Static IP settings
For the detailed static network configuration process, see the separate write-up; a minimal example follows.
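A minimal sketch of the static IP configuration, assuming the interface is eth0; the 192.168.x.x addresses below are placeholders, substitute your own network's values:

vim /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0
BOOTPROTO=static          # fixed address instead of DHCP
ONBOOT=yes                # bring the interface up at boot
IPADDR=192.168.100.101    # this node's static IP (placeholder)
NETMASK=255.255.255.0
GATEWAY=192.168.100.2     # placeholder gateway
DNS1=192.168.100.2

service network restart   # apply the change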
2 Modify the host name
vim /etc/sysconfig/network
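On CentOS 6 this file holds the hostname; an example for the master node (the other two machines get hadoop02 and hadoop03):

NETWORKING=yes
HOSTNAME=hadoop01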
3 Add a mapping relationship
All three virtual machines need to perform the following operations
vim /etc/hosts
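Append one line per node; the IP addresses are placeholders and must match the static IPs configured earlier:

192.168.100.101 hadoop01
192.168.100.102 hadoop02
192.168.100.103 hadoop03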
Check connectivity with ping hadoop01, ping hadoop02, and ping hadoop03
Note: once the mapping is configured, the three virtual machines should be able to ping each other by hostname
4 Turn off the firewall
Disable it permanently (do this on all three machines)
chkconfig iptables off
Disable SELinux
vim /etc/selinux/config
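Change the SELINUX line in that file to:

SELINUX=disabled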
Reboot the machine and verify that the firewall stays off
On CentOS 7 and later, chkconfig is replaced by systemctl
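The equivalent on CentOS 7+, assuming the default firewalld service:

systemctl stop firewalld       # stop the firewall now
systemctl disable firewalld    # keep it off across reboots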
5 Sync time
(1) Install the ntpdate tool (all three are required)
(2) Synchronize the system clock with a network time server
(3) Write the system time into the hardware clock
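A sketch of the three steps; the NTP server address is an assumption, any reachable time server will do:

yum install -y ntpdate     # (1) install the ntpdate tool on all three nodes
ntpdate cn.pool.ntp.org    # (2) sync the system clock with a network time server
hwclock -w                 # (3) write the system time into the hardware clock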
6 Set up ssh password-free login
As the root user, run ssh-keygen -t rsa and press Enter at every prompt
After the key is generated, the ~/.ssh/ directory contains two files, id_rsa (private key) and id_rsa.pub (public key); append the public key to authorized_keys
Give authorized_keys 600 permissions
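A sketch of those three steps on hadoop01:

ssh-keygen -t rsa                                  # press Enter at every prompt
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # append the public key
chmod 600 ~/.ssh/authorized_keys                   # permissions required by sshd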
Perform the same operations on the hadoop02 and hadoop03 nodes, then append their public keys to authorized_keys on the master node hadoop01
Finally, copy the authorized_keys file on hadoop01 to the ~/.ssh/ directories of hadoop02 and hadoop03
scp ~/.ssh/authorized_keys root@hadoop02:~/.ssh/
Check for password-free login (password may be required for the first time)
7 Install jdk
Note: before installing, check whether your virtual machine is 32-bit or 64-bit so that you do not install the wrong JDK package
If the architecture check shows x86_64, the system is 64-bit; if it does not (as in my case), it is 32-bit
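A quick way to check:

uname -m            # prints x86_64 on 64-bit systems, i686/i386 on 32-bit
getconf LONG_BIT    # prints 64 or 32 directly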
Install java on three nodes and configure java environment variables
Unzip the compressed package
Add the jdk path to the /etc/profile file
Run source /etc/profile so the change takes effect immediately, then check the JDK version
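A sketch of the whole step; the archive name and the /opt/module install path are assumptions that match the layout used for Hadoop below (adjust JAVA_HOME to the directory your archive actually unpacks to):

tar -zxvf jdk-8u144-linux-i586.tar.gz -C /opt/module/   # unpack the JDK (use the x64 archive on 64-bit VMs)

# Append to /etc/profile:
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin

source /etc/profile    # take effect immediately
java -version          # verify the JDK version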
8 Install Hadoop and edit the configuration files (doing this on the master node only is fine; the files are distributed later)
1. Upload the hadoop compressed package and extract it to the module folder
Create the following directories
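The directory names referenced by the configuration files below imply the following layout; a sketch assuming the install path /opt/module/hadoop-2.7.3:

mkdir -p /opt/module/hadoop-2.7.3/tmp                  # hadoop.tmp.dir
mkdir -p /opt/module/hadoop-2.7.3/dfs/name             # dfs.namenode.name.dir
mkdir -p /opt/module/hadoop-2.7.3/dfs/data             # dfs.datanode.data.dir
mkdir -p /opt/module/hadoop-2.7.3/dfs/namesecondary    # dfs.namenode.checkpoint.dir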
2. Modify the configuration file
Set JAVA_HOME explicitly in hadoop-env.sh; otherwise Hadoop may fail to start because it cannot resolve JAVA_HOME
Do the same in yarn-env.sh
Do the same in mapred-env.sh
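For example, set the absolute path in each of those scripts (the path matches the JDK location assumed earlier):

export JAVA_HOME=/opt/module/jdk1.8.0_144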
Modify the core configuration files
1. Core configuration file: core-site.xml
Add the following between <configuration> and </configuration>:
<property>
    <!-- URI and port of the HDFS NameNode [must be configured] -->
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:9000</value>
</property>
<property>
    <!-- Temporary storage directory used by Hadoop at runtime [must be configured] -->
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/module/hadoop-2.7.3/tmp</value>
</property>
2. HDFS configuration file: hdfs-site.xml
<property>
    <!-- NameNode metadata storage directory [must be configured] -->
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/module/hadoop-2.7.3/dfs/name</value>
</property>
<property>
    <!-- DataNode data storage directory [must be configured] -->
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/module/hadoop-2.7.3/dfs/data</value>
</property>
<property>
    <!-- Number of block replicas; no more than the number of DataNodes is needed, default is 3 [must be configured] -->
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <!-- SecondaryNameNode working directory [must be configured] -->
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:/opt/module/hadoop-2.7.3/dfs/namesecondary</value>
</property>
<property>
    <!-- HTTP address of the SecondaryNameNode [must be configured] -->
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop02:50090</value>
</property>
<property>
    <!-- HTTPS address of the SecondaryNameNode [optional] -->
    <name>dfs.namenode.secondary.https-address</name>
    <value>hadoop02:50091</value>
</property>
<property>
    <!-- Must be true, otherwise HDFS file information cannot be viewed over the web [must be configured] -->
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
3. YARN configuration file: yarn-site.xml
<property>
    <!-- How reducers obtain data [must be configured] -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <!-- Class implementing the shuffle service; can be customized [optional], this is the default -->
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
    <!-- ResourceManager hostname; once set, the other address properties need not be configured unless you need custom ports [must be configured] -->
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop01</value>
</property>
<property>
    <!-- Memory available to the NodeManager on this node, in MB [must be configured] -->
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<property>
    <!-- Enable log aggregation [optional for now] -->
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<property>
    <!-- Log retention time, set here to 7 days [optional for now] -->
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
4. MapReduce configuration file: mapred-site.xml
This file does not exist by default; make a copy of the template with cp (cp mapred-site.xml.template mapred-site.xml) rather than creating it by hand, then open the copy
<property>
    <!-- Run MapReduce jobs on YARN [must be configured] -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<!-- History server settings [optional for now] -->
<property>
    <!-- MapReduce JobHistory Server address -->
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop01:10020</value>
</property>
<property>
    <!-- MapReduce JobHistory Server web UI address -->
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop01:19888</value>
</property>
5.slaves file
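The slaves file lists the worker hostnames (DataNode/NodeManager machines), one per line; a sketch assuming all three nodes act as workers:

hadoop01
hadoop02
hadoop03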
3. File distribution
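A sketch of distributing the configured Hadoop directory from hadoop01 to the other two nodes with scp:

scp -r /opt/module/hadoop-2.7.3 root@hadoop02:/opt/module/
scp -r /opt/module/hadoop-2.7.3 root@hadoop03:/opt/module/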
4. Set environment variables
Each node edits the /etc/profile file
export HADOOP_HOME=/opt/module/hadoop-2.7.3
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Re-source /etc/profile and check whether Hadoop is installed successfully
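For example:

source /etc/profile
hadoop version    # should print the Hadoop 2.7.3 version banner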
10 Start a Hadoop cluster
1. Format Hadoop
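Run this once on hadoop01 before the first start (reformatting later wipes the HDFS metadata):

hdfs namenode -format    # or bin/hdfs namenode -format from $HADOOP_HOME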
2. Start Hadoop
Start the HDFS NameNode and DataNodes with the following commands on the appropriate nodes
1. Start HDFS
Start on hadoop01: sbin/start-dfs.sh (run it on the machine where the NameNode is configured)
2. Start YARN
Start on hadoop01: sbin/start-yarn.sh (run it on the machine where the ResourceManager is configured)
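After both scripts finish, running jps on each node should show the expected daemons; a sketch of the layout implied by the configuration above (assuming the slaves file lists all three nodes):

jps
# hadoop01: NameNode, DataNode, ResourceManager, NodeManager
# hadoop02: DataNode, NodeManager, SecondaryNameNode
# hadoop03: DataNode, NodeManager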