Hadoop environment setup
Downloading Hadoop and the JDK: if they are downloaded on Windows, they must then be moved to the virtual machine, usually by a simple drag and drop. If drag and drop fails, use remote-connection software to upload the files; MobaXterm is recommended here. Installation and use of MobaXterm: https://www.cnblogs.com/cainiao-chuanqi/p/11366726.html
After uploading, the simple/soft and other folders need to be created by yourself (mkdir xxx creates a directory). There is no hard rule here; place them wherever you prefer, provided you can later find the location of the directories you created (see the example below).
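For example, assuming the same directory layout as this article, mkdir -p creates the whole path in one step:
mkdir -p /home/cai/simple/soft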
The environment variables must point to the exact absolute paths of the directories where the JDK and Hadoop are stored.
The IP address used throughout this article is the IP address of this particular virtual machine. When you do the experiments, use your own virtual machine's IP address (see below).
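ifconfig prints the network interfaces and their addresses (on newer distributions where ifconfig is missing, ip addr does the same):
ifconfig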
JDK installation and configuration
First, the choice of JDK: version 1.8 is recommended, to prevent compatibility issues, because the Hadoop installation process calls many JAR packages (Hadoop itself is written in Java).
Download JDK
Download the Linux version of the JDK from the Internet and place it in a folder; here the /home/cai/simple/soft folder is selected. (Use cp or mv to move files.)
cd /home/cai/simple/soft
Unzip the JDK
The download is a .tar.gz archive rather than a zip file, so extract it with tar:
tar -zxvf /home/cai/simple/soft/jdk-8u181-linux-x64.tar.gz
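Note that tar extracts into the current working directory, hence the cd above; alternatively, the -C flag extracts into an explicit target directory:
tar -zxvf /home/cai/simple/soft/jdk-8u181-linux-x64.tar.gz -C /home/cai/simple/soft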
Enter the JDK directory
cd /home/cai/simple/soft/jdk1.8.0_181
Here my JDK directory is under /home/cai/simple/soft/. Choose the path according to your own file location, and make sure the archive was extracted correctly.
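To confirm, list the directory; jdk1.8.0_181 should appear in the output:
ls /home/cai/simple/soft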
Configure the JDK environment
vim /etc/profile
#java environment
export JAVA_HOME=/home/cai/simple/soft/jdk1.8.0_181
export CLASSPATH=.:${JAVA_HOME}/jre/lib/rt.jar:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
export PATH=$PATH:${JAVA_HOME}/bin
export HADOOP_HOME=/home/cai/simple/soft/hadoop-2.7.1
Update configuration file
After editing, execute source /etc/profile to reload the configuration so that the settings take effect.
source /etc/profile
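As a quick check that the new variables were picked up (not an original step), echo the JDK path:
echo $JAVA_HOME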
Test if the configuration is successful
Execute the javac command in any directory. If it prints "command not found", the configuration failed; otherwise, it succeeded.
javac
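Running javac without arguments prints its usage text; java -version is another common check and should report version 1.8:
java -version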
Hadoop installation and configuration
Download Hadoop
Download the file to /home/cai/simple/soft/ (the download location is up to you). Use cp or mv to move files.
cd /home/cai/simple/soft/
Unzip Hadoop
tar -zxvf /home/cai/simple/soft/hadoop-2.7.1.tar.gz
View Hadoop's etc directory
First check whether the extraction succeeded. If it did, enter the hadoop-2.7.1 folder.
View the files in the /home/cai/simple/soft/hadoop-2.7.1/etc/hadoop directory
cd /home/cai/simple/soft/hadoop-2.7.1/etc/hadoop
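Listing the directory shows the configuration files edited in the following steps:
ls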
View configuration files
Configure the hadoop-env.sh file under $HADOOP_HOME/etc/hadoop
vim hadoop-env.sh
# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/home/cai/simple/soft/jdk1.8.0_181
Configure the core-site.xml file under $HADOOP_HOME/etc/hadoop
vim core-site.xml
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <!-- HDFS file path --> <property> <name>fs.default.name</name> <value>hdfs://172.16.12.37:9000</value> </property> <property> <name>fs.defaultFS</name> <value>hdfs://172.16.12.37:9000</value> </property> <property> <name>io.file.buffer.size</name> <value>131072</value> </property> <property> <name>hadoop.tmp.dir</name> <value>/home/cai/simple/soft/hadoop-2.7.1/tmp</value> <description>Abasefor other temporary directories.</description> </property> </configuration>
Configure the hdfs-site.xml file under $HADOOP_HOME/etc/hadoop
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributeid on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and ldoop , it is necessary to conf Three files in the directory for configuration imitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <property> <name>dfs.namenode.name.dir</name> <value>/home/cai/simple/soft/hadoop-2.7.1/hdfs/name</value> </property> <property> <name>dfs.datanode.data.dir</name> <value>/home/cai/simple/soft/hadoop-2.7.1/hdfs/data</value> </property> <!-- <property> <name>dfs.namenode.name.dir</name> <value>/home/cai/simple/soft/hadoop-2.7.1/etc/hadoop /hdfs/name</value> </property> <property> <name>dfs.datanode.data.dir</name> <value>/home/cai/simple/soft/hadoop-2.7.1/etc/hadoop /hdfs/data</value> </property> --> <property> <name>dfs.replication</name> <value>1</value> </property> <property> <name>dfs.webhdfs.enabled</name> <value>true</value> </property> </configuration>
Configure the mapred-site.xml file under $HADOOP_HOME/etc/hadoop
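In Hadoop 2.7.1 this file is usually shipped only as a template named mapred-site.xml.template; if mapred-site.xml does not exist yet in your unpacked distribution, copy the template first and then edit it:
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml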
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <configuration> <property> <name>mapreduce.framework.name</name> <value>yarn</value> </property> <property> <name>mapreduce.jobhistory.address</name> <value>172.16.12.37:10020</value> </property> <property> <name>mapreduce.jobhistory.webapp.address</name> <value>172.16.12.37:19888</value> </property> </configuration> </configuration>
Configure the yarn-site.xml file under $HADOOP_HOME/etc/hadoop
<?xml version="1.0"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <configuration> <!-- Site specific YARN configuration properties --> <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> </property> <property> <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> <value>org.apache.hadoop.mapred.ShuffleHandler</value> </property> <property> <name>yarn.resourcemanager.address</name> <value>172.16.12.37:8032</value> </property> <property> <name>yarn.resourcemanager.scheduler.address</name> <value>172.16.12.37:8030</value> </property> <property> <name>yarn.resourcemanager.resource-tracker.address</name> <value>172.16.12.37:8035</value> </property> <property> <name>yarn.resourcemanager.admin.address</name> <value>172.16.12.37:8033</value> </property> <property> <name>yarn.resourcemanager.webapp.address</name> <value>172.16.12.37:8088</value> </property> </configuration>
Configure /etc/profile file
vim /etc/profile
# /etc/profile
# System wide environment and startup programs, for login setup
# Functions and aliases go in /etc/bashrc
# It's NOT a good idea to change this file unless you know what you
# are doing. It's much better to create a custom.sh shell script in
# /etc/profile.d/ to make custom changes to your environment, as this
# will prevent the need for merging in future updates.
pathmunge () {
case ":${PATH}:" in
*:"$1":*)
;;
*)
if [ "$2" = "after" ] ; then
PATH=$PATH:$1
else
PATH=$1:$PATH
fi
esac
}
if [ -x /usr/bin/id ]; then
if [ -z "$EUID" ]; then
# ksh workaround
EUID=`id -u`
UID=`id -ru`
fi
USER="`id -un`"
LOGNAME=$USER
MAIL="/var/spool/mail/$USER"
fi
# Path manipulation
if [ "$EUID" = "0" ]; then
pathmunge /usr/sbin
pathmunge /usr/local/sbin
else
pathmunge /usr/local/sbin after
pathmunge /usr/sbin after
fi
HOSTNAME=`/usr/bin/hostname 2>/dev/null`
HISTSIZE=1000
if [ "$HISTCONTROL" = "ignorespace" ] ; then
export HISTCONTROL=ignoreboth
else
export HISTCONTROL=ignoredups
fi
export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL
# By default, we want umask to get set. This sets it for login shell
# Current threshold for system reserved uid/gids is 200
# You could check uidgid reservation validity in
# /usr/share/doc/setup-*/uidgid file
if [ $UID -gt 199 ] && [ "`id -gn`" = "`id -un`" ]; then
umask 002
else
umask 022
fi
for i in /etc/profile.d/*.sh ; do
if [ -r "$i" ]; then
if [ "${-#*i}" != "$-" ]; then
. "$i"
else
. "$i" >/dev/null
fi
fi
done
unset i
unset -f pathmunge
#java environment
export JAVA_HOME=/home/cai/simple/soft/jdk1.8.0_181
export CLASSPATH=.:${JAVA_HOME}/jre/lib/rt.jar:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
export PATH=$PATH:${JAVA_HOME}/bin
export HADOOP_HOME=/home/cai/simple/soft/hadoop-2.7.1
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
Update configuration file
To make the configuration file take effect, you need to execute the command source /etc/profile
source /etc/profile
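If the PATH entries above were picked up, the hadoop command is now available in any directory; hadoop version (a quick sanity check, not an original step) should print the installed version:
hadoop version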
Format NameNode
To format the NameNode, execute hdfs namenode -format or hadoop namenode -format in any directory. If formatting succeeds, the log output normally includes a "successfully formatted" message for the storage directory.
hdfs namenode -format
or
hadoop namenode -format
Start the Hadoop cluster
To start the Hadoop process, first execute the command start-dfs.sh to start the HDFS system.
start-dfs.sh
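If start-dfs.sh keeps prompting for a password, passwordless SSH to localhost has probably not been set up yet. A minimal sketch using standard OpenSSH commands (this prerequisite is not covered in the original article):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys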
Start the YARN cluster
start-yarn.sh
View the running processes with jps
jps
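On a single-node setup like this one, jps should list roughly the following processes (each prefixed by its process ID, which will differ on your machine):
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps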
UI testing
There are two ways to open the test pages for HDFS and YARN (the Firefox browser is recommended): launch the browser from the command line, or open it directly by double-clicking its icon.
firefox
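For example, the HDFS page can be opened straight from the command line by passing the URL to Firefox (substitute your own virtual machine's IP):
firefox http://172.16.12.37:50070/ &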
Ports: 8088 (YARN) and 50070 (HDFS)
First, enter http://172.16.12.37:50070/ (the HDFS management interface) in the browser. This IP is the address of the virtual machine used in this article; use your own virtual machine's IP address, while the port is fixed.
Then enter http://172.16.12.37:8088/ (the YARN/MapReduce management interface) in the browser. Again, substitute your own virtual machine's IP address; the port is fixed.