Hadoop environment construction

Downloading Hadoop and the JDK: if they are downloaded on Windows, the files need to be moved to the virtual machine. Dragging and dropping into the VM window usually works; if drag and drop fails, use remote connection software to complete the upload. MobaXterm is recommended here; for installation and usage see: https://www.cnblogs.com/cainiao-chuanqi/p/11366726.html

After uploading, you need to create the simple/soft and related folders yourself (the command to create a directory is mkdir xxx). There is no hard rule here; organize the directories however you prefer, as long as you can later find the folders you created. A sketch follows below.
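For reference, a minimal sketch that creates the directory layout used throughout this article (the paths are this article's choices, not requirements):

 mkdir -p /home/cai/simple/soft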

When setting the environment variables, you must locate the exact absolute paths of the directories where the JDK and Hadoop are stored.

The IP address used throughout this article is the address of my virtual machine. When you do the experiments, use your own virtual machine's IP address (check it as shown below).
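To find the address, run one of the following (the interface name and output format vary by system):

 ifconfig
 ip addr show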

JDK installation and configuration

The first decision is the choice of JDK; version 1.8 is recommended to prevent compatibility issues, because the Hadoop installation process calls many jar packages (Hadoop itself is written in Java).

Download JDK

Download the JDK to a folder; here /home/cai/simple/soft is chosen. Copy the downloaded JDK (the Linux version, downloaded from the Internet) into this folder. ps: use cp or mv to move files.

 cd /home/cai/simple/soft
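If the archive was uploaded somewhere else first, for example the user's home directory (an assumed location for illustration), it can be moved in with:

 mv ~/jdk-8u181-linux-x64.tar.gz /home/cai/simple/soft/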

Unzip the JDK

The download is a .tar.gz archive, so it needs to be unpacked:

tar -zxvf /home/cai/simple/soft/jdk-8u181-linux-x64.tar.gz

Enter the JDK directory

 cd /home/cai/simple/soft/jdk1.8.0_181

Here my JDK directory is under /home/cai/simple/soft/. Choose the path according to your own file location, and make sure the archive was unpacked correctly.

Configure the JDK environment

Edit /etc/profile and append the following lines at the end (make sure JAVA_HOME points to the directory where your JDK was actually extracted):

vim /etc/profile

#java environment
  export JAVA_HOME=/home/cai/simple/jdk1.8.0_181
  export CLASSPATH=.:${JAVA_HOME}/jre/lib/rt.jar:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
  export PATH=$PATH:${JAVA_HOME}/bin


  export HADOOP_HOME=/home/cai/simple/soft/hadoop-2.7.1

Update configuration file

After editing, execute source /etc/profile to reload the configuration so that the new settings take effect.

source /etc/profile

Test if the configuration is successful

Execute the javac command in any directory. If it prints "command not found", the configuration is unsuccessful; otherwise, the configuration succeeded.

javac
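Two further quick checks, which simply confirm the variables configured above:

 java -version
 echo $JAVA_HOME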

Hadoop installation and configuration

Download Hadoop

Download the file to /home/cai/simple/soft/ (choose the download location yourself). ps: use cp or mv to move files.

cd /home/cai/simple/soft/ 

Unzip Hadoop

tar -zxvf /home/cai/simple/soft/hadoop-2.7.1.tar.gz

View Hadoop's etc directory

First check whether the decompression succeeded. If it did, enter the hadoop-2.7.1 folder.

View the files in the /home/cai/simple/soft/hadoop-2.7.1/etc/hadoop directory

cd /home/cai/simple/soft/hadoop-2.7.1/etc/hadoop
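Listing this directory should show, among other files, core-site.xml, hadoop-env.sh, hdfs-site.xml, yarn-site.xml, and mapred-site.xml.template (in Hadoop 2.7.1, mapred-site.xml ships only as a template):

 ls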

View configuration files

Configure the hadoop-env.sh file under $HADOOP_HOME/etc/hadoop

vim hadoop-env.sh 

 

# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/home/cai/simple/jdk1.8.0_181

Configure the core-site.xml file under $HADOOP_HOME/etc/hadoop

vim core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- HDFS file system path; fs.default.name is the deprecated alias of fs.defaultFS -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://172.16.12.37:9000</value>
 </property>

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://172.16.12.37:9000</value>
 </property>

 <property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
 </property>

 <property>
  <name>hadoop.tmp.dir</name>
  <value>/home/cai/simple/soft/hadoop-2.7.1/tmp</value>
  <description>A base for other temporary directories.</description>
 </property>


</configuration>
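The hadoop.tmp.dir directory referenced above is not created by unpacking the archive; it can be created ahead of time (a sketch, assuming the path configured above):

 mkdir -p /home/cai/simple/soft/hadoop-2.7.1/tmp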


Configure the hdfs-site.xml file under $HADOOP_HOME/etc/hadoop

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
   <name>dfs.namenode.name.dir</name>
   <value>/home/cai/simple/soft/hadoop-2.7.1/hdfs/name</value>
 </property>

 <property>
  <name>dfs.datanode.data.dir</name>
    <value>/home/cai/simple/soft/hadoop-2.7.1/hdfs/data</value>
  </property>


 <property>
  <name>dfs.replication</name>
  <value>1</value>
 </property>

 <property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
 </property>


</configuration>
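Similarly, the namenode and datanode directories configured above can be created up front (a sketch, assuming the paths above):

 mkdir -p /home/cai/simple/soft/hadoop-2.7.1/hdfs/name
 mkdir -p /home/cai/simple/soft/hadoop-2.7.1/hdfs/data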

Configure the mapred-site.xml file under $HADOOP_HOME/etc/hadoop
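In the Hadoop 2.7.1 distribution this file ships only as mapred-site.xml.template; if mapred-site.xml does not exist yet, copy the template first (run inside $HADOOP_HOME/etc/hadoop):

 cp mapred-site.xml.template mapred-site.xml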

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>

 <property>
  <name>mapreduce.jobhistory.address</name>
  <value>172.16.12.37:10020</value>
 </property>

 <property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>172.16.12.37:19888</value>
 </property>

</configuration>

Configure the yarn-site.xml file under $HADOOP_HOME/etc/hadoop

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->
<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
  </property>

  <property>
   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

 <property>
   <name>yarn.resourcemanager.address</name>
   <value>172.16.12.37:8032</value>
  </property>

  <property>
   <name>yarn.resourcemanager.scheduler.address</name>
   <value>172.16.12.37:8030</value>
  </property>

  <property>
   <name>yarn.resourcemanager.resource-tracker.address</name>
   <value>172.16.12.37:8035</value>
  </property>

 <property>
   <name>yarn.resourcemanager.admin.address</name>
   <value>172.16.12.37:8033</value>
  </property>

  <property>
   <name>yarn.resourcemanager.webapp.address</name>
   <value>172.16.12.37:8088</value>
  </property>

</configuration>

Configure /etc/profile file

Edit /etc/profile once more and append the Hadoop variables; after editing, the complete file looks like this:

vim /etc/profile

# /etc/profile

# System wide environment and startup programs, for login setup
# Functions and aliases go in /etc/bashrc

# It's NOT a good idea to change this file unless you know what you
# are doing. It's much better to create a custom.sh shell script in
# /etc/profile.d/ to make custom changes to your environment, as this
# will prevent the need for merging in future updates.

pathmunge () {
    case ":${PATH}:" in
        *:"$1":*)
            ;;
        *)
            if [ "$2" = "after" ] ; then
                PATH=$PATH:$1
            else
                PATH=$1:$PATH
            fi
    esac
}

if [ -x /usr/bin/id ]; then
    if [ -z "$EUID" ]; then
        # ksh workaround
        EUID=`id -u`
        UID=`id -ru`
    fi
    USER="`id -un`"
    LOGNAME=$USER
    MAIL="/var/spool/mail/$USER"
fi

# Path manipulation
if [ "$EUID" = "0" ]; then
    pathmunge /usr/sbin
    pathmunge /usr/local/sbin
else
    pathmunge /usr/local/sbin after
    pathmunge /usr/sbin after
fi
HOSTNAME=`/usr/bin/hostname 2>/dev/null`
HISTSIZE=1000
if [ "$HISTCONTROL" = "ignorespace" ] ; then
    export HISTCONTROL=ignoreboth
else
    export HISTCONTROL=ignoredups
fi

export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL

# By default, we want umask to get set. This sets it for login shell
# Current threshold for system reserved uid/gids is 200
# You could check uidgid reservation validity in
# /usr/share/doc/setup-*/uidgid file
if [ $UID -gt 199 ] && [ "`id -gn`" = "`id -un`" ]; then
    umask 002
else
    umask 022
fi

for i in /etc/profile.d/*.sh ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then
            . "$i"
        else
            . "$i" >/dev/null
        fi
    fi
done

unset i
unset -f pathmunge

#java environment
  export JAVA_HOME=/home/cai/simple/jdk1.8.0_181
  export CLASSPATH=.:${JAVA_HOME}/jre/lib/rt.jar:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
  export PATH=$PATH:${JAVA_HOME}/bin


  export HADOOP_HOME=/home/cai/simple/soft/hadoop-2.7.1
  export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin

Update configuration file

To make the configuration file take effect, you need to execute the command source /etc/profile

source /etc/profile
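A quick way to confirm the Hadoop variables took effect; the banner should report version 2.7.1:

 hadoop version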

Format NameNode

To format the NameNode, execute hdfs namenode -format (or the equivalent older form hadoop namenode -format) in any directory.

hdfs namenode -format 
or
hadoop namenode -format 
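On success, the log output should end with a line similar to the following (a sketch; the path matches the dfs.namenode.name.dir configured earlier):

 INFO common.Storage: Storage directory /home/cai/simple/soft/hadoop-2.7.1/hdfs/name has been successfully formatted.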

Start the Hadoop cluster

To start the Hadoop processes, first execute the command start-dfs.sh to start the HDFS system.

start-dfs.sh

Start the YARN cluster

start-yarn.sh
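Both scripts log in to the node over SSH; if you are prompted for a password on every connection, set up passwordless SSH first (a minimal sketch, assuming no key pair exists yet):

 ssh-keygen -t rsa
 ssh-copy-id localhost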

View the running processes with jps

jps
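If everything started, the output should show the five daemons plus jps itself; the process IDs below are illustrative only:

 2401 NameNode
 2532 DataNode
 2714 SecondaryNameNode
 2871 ResourceManager
 2980 NodeManager
 3312 Jps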

UI testing

HDFS and YARN can be tested through a browser (Firefox is recommended). The browser can be opened in two ways: from the command line, or directly by double-clicking its icon.

firefox

Ports: 50070 (HDFS) and 8088 (YARN).

First, enter http://172.16.12.37:50070/ (the HDFS management interface) in the browser. This IP is the address of my virtual machine; everyone's IP is different, so substitute your own. The port is fixed.

Then enter http://172.16.12.37:8088/ (the YARN/MapReduce management interface) in the browser; again, substitute your own virtual machine's IP address. The port is fixed.
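If a browser is inconvenient, the same endpoints can be probed from the shell (a sketch; substitute your own IP address):

 curl -s http://172.16.12.37:50070/ | head
 curl -s http://172.16.12.37:8088/ | head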
