1. Purpose of the experiment
- Understand the HDFS architecture and working principles
- Master the HDFS deployment environment and steps
- Master starting HDFS (cluster startup with start-dfs.sh)
- Use Hadoop commands (file creation/deletion/modification/query/upload/download) to operate the distributed file system
2. Experimental content
- HDFS pseudo-distributed environment construction
- Starting HDFS (cluster startup with start-dfs.sh)
- Practicing Hadoop commands (file creation/deletion/modification/query/upload/download) to operate the distributed file system
3. Experimental steps
Decompress the Hadoop installation package
Use the tar command to decompress the downloaded Hadoop installation package.
Execution process and results:
1. Enter the package directory
root@evassh-10644553:~# cd /data/workspace/myshixun/
root@evassh-10644553:/data/workspace/myshixun#
2. View the software package (the ls command shows the Hadoop installation package)
root@evassh-10644553:/data/workspace/myshixun# ls
hadoop-2.7.1.tar.gz
root@evassh-10644553:/data/workspace/myshixun#
3. Decompress the package to the /opt directory (tar extracts the archive; the -C option specifies the destination directory)
root@evassh-10644553:/data/workspace/myshixun# tar -zxf hadoop-2.7.1.tar.gz -C /opt
root@evassh-10644553:/data/workspace/myshixun#
4. Check whether the decompression is successful
root@evassh-10644553:/data/workspace/myshixun# ls /opt
hadoop-2.7.1
root@evassh-10644553:/data/workspace/myshixun#
5. Change the directory to the root user's home directory
root@evassh-10644553:/data/workspace/myshixun# cd
root@evassh-10644553:~#
Configure environment variables
Configuring environment variables makes commands such as hadoop and hdfs available globally.
1. Use the vi command to edit the environment variable file
root@evassh-10644553:~# vi /etc/profile
After running the command, vi opens the file.
2. Press the ↓ arrow key to move the cursor to the bottom of the file.
3. Make sure the current input method is English, then press the lowercase i key. The --INSERT-- indicator appears at the bottom of the screen, showing that vi has entered insert mode and the file can be edited.
4. Append the Hadoop environment variable configuration at the bottom of the file; a minimal sketch is shown below.
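The exact lines depend on the installation path. A minimal sketch, assuming Hadoop was extracted to /opt/hadoop-2.7.1 as above (if JAVA_HOME is not already set, an export JAVA_HOME line pointing at the installed JDK is also needed):

export HADOOP_HOME=/opt/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin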
5. When the input is complete, press the Esc key to leave insert mode; the --INSERT-- indicator disappears.
6. With the input method still in English, type :wq and press Enter to save the file and exit vi.
7. Make the environment variables take effect
root@evassh-10644553:~# source /etc/profile
root@evassh-10644553:~#
8. To test, type the letter h and press the Tab key twice; the shell lists all matching commands.
root@evassh-10644553:~# h
The returned results include many commands starting with hadoop and hdfs. If no commands starting with hadoop or hdfs appear after pressing Tab, the environment variable configuration is wrong.
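The completion listing itself was not captured in the transcript above; on a correctly configured system it typically looks something like this sketch (the exact set of commands varies by system):

root@evassh-10644553:~# h
hadoop               hadoop-daemons.sh    hdfs                 history
hadoop-daemon.sh     head                 hostname             halt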
Modify the core-site.xml file of HDFS
The core-site.xml file mainly specifies the default file system (HDFS) and the node on which the NameNode runs.
1. Edit core-site.xml
root@evassh-10644553:~# vi /opt/hadoop-2.7.1/etc/hadoop/core-site.xml
After running the command, vi opens the file.
2. Press the ↓ arrow key to move the cursor to the <configuration> section near the bottom of the file.
3. Make sure the current input method is English, then press the lowercase i key. The --INSERT-- indicator appears, showing that vi has entered insert mode and the file can be edited.
4. Complete the configuration; a minimal sketch is shown below.
Be sure to double-check the content, otherwise errors will be reported later.
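The screenshot content is not reproduced here. A minimal sketch, assuming a single-node setup with the NameNode listening on localhost port 9000, places one property between the <configuration> tags:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>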
5. When the input is complete, press the Esc key to leave insert mode; the --INSERT-- indicator disappears.
6. With the input method still in English, type :wq and press Enter to save the file and exit vi.
Modify the hdfs-site.xml file of HDFS
The hdfs-site.xml file mainly specifies the NameNode metadata storage directory, the DataNode data storage directory, and the secondary (backup) NameNode.
1. Edit hdfs-site.xml
root@evassh-10644553:~# vi /opt/hadoop-2.7.1/etc/hadoop/hdfs-site.xml
After running the command, vi opens the file.
2. Press the ↓ arrow key to move the cursor to the <configuration> section near the bottom of the file.
3. Make sure the current input method is English, then press the lowercase i key. The --INSERT-- indicator appears, showing that vi has entered insert mode and the file can be edited.
4. Complete the configuration; a minimal sketch is shown below. Be sure to double-check the content, otherwise errors will be reported later.
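The screenshot content is not reproduced here. A minimal sketch, assuming the hypothetical storage directories /opt/hadoop-2.7.1/hdfs/name and /opt/hadoop-2.7.1/hdfs/data and a replication factor of 1 for the pseudo-distributed setup:

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop-2.7.1/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop-2.7.1/hdfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>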
5. When the input is complete, press the Esc key to leave insert mode; the --INSERT-- indicator disappears.
6. With the input method still in English, type :wq and press Enter to save the file and exit vi.
Initialize the cluster
Initializing the cluster means formatting the file system. The main purposes are to:
① Create a new metadata directory
② Generate the fsimage file that records the metadata
③ Generate the cluster identifiers, such as the cluster ID (clusterID)
root@evassh-10644553:~# hadoop namenode -format
Seeing "successfully" in the output indicates that initialization succeeded. After a successful initialization, do not run the command again: each initialization generates a new cluster ID, which makes the cluster IDs recorded by the DataNode and the NameNode inconsistent, so the two can no longer recognize each other.
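As a quick check (a sketch, assuming the hypothetical metadata directory /opt/hadoop-2.7.1/hdfs/name configured in hdfs-site.xml above), the current subdirectory should now contain the initial fsimage and a VERSION file that records the clusterID:

root@evassh-10644553:~# ls /opt/hadoop-2.7.1/hdfs/name/current
fsimage_0000000000000000000  fsimage_0000000000000000000.md5  seen_txid  VERSION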
ssh password-free configuration
ssh is one of the ways to connect to a Linux host. When the HDFS services are started, new connections are made to the local host, so password-free login must be configured in advance; the services can then start without prompting for a password.
1. Generate a key pair, pressing Enter at each prompt to accept the defaults
root@evassh-10644553:~# ssh-keygen -t rsa -P ''
root@evassh-10644553:~#
2. Add id_rsa.pub to the authorized key
root@evassh-10644553:~# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
root@evassh-10644553:~#
3. Test
root@evassh-10644553:~# ssh localhost
At the host-authenticity prompt, enter yes. If you are then logged in without being asked for a password, the configuration succeeded.
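After the ssh localhost command above, the first connection typically prints something like this sketch (the fingerprint value varies per host):

The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:...
Are you sure you want to continue connecting (yes/no)? yes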
Start HDFS and simply view
1. Use the start-dfs.sh command to start the HDFS cluster.
root@evassh-10644553:~# start-dfs.sh
localhost: starting namenode, logging to /opt/hadoop-2.7.1/logs/hadoop-root-namenode-evassh-10683023.out
localhost: starting datanode, logging to /opt/hadoop-2.7.1/logs/hadoop-root-datanode-evassh-10683023.out
Starting secondary namenodes [localhost]
localhost: starting secondarynamenode, logging to /opt/hadoop-2.7.1/logs/hadoop-root-secondarynamenode-evassh-10683023.out
root@evassh-10644553:~#
2. Use the jps command to verify
root@evassh-10644553:~# jps
1328 SecondaryNameNode
979 NameNode
1126 DataNode
1608 Jps
The number in front of each entry is the process ID of the service; it changes on every start. As long as the NameNode, DataNode, and SecondaryNameNode processes are all shown, the cluster is running.
3. Use the ls command to view the files on HDFS
root@evassh-10644553:~# hdfs dfs -ls /
root@evassh-10644553:~#
An empty result is normal, since nothing has been written to HDFS yet.
Common commands of HDFS
- Start Hadoop;
- Create the /usr/output/ folder in HDFS;
- Create a hello.txt file locally and add the content: "HDFS blocks are larger than disk blocks, and the purpose is to minimize addressing overhead.";
- Upload hello.txt to the /usr/output/ directory of HDFS;
- Delete the /user/hadoop directory of HDFS;
- Copy the file hello.txt from HDFS to the local /usr/local directory.
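A minimal command sketch for these tasks, assuming the environment set up above (all of these are standard hdfs dfs file-system shell subcommands):

root@evassh-10644553:~# start-dfs.sh
root@evassh-10644553:~# hdfs dfs -mkdir -p /usr/output
root@evassh-10644553:~# echo "HDFS blocks are larger than disk blocks, and the purpose is to minimize addressing overhead." > hello.txt
root@evassh-10644553:~# hdfs dfs -put hello.txt /usr/output/
root@evassh-10644553:~# hdfs dfs -rm -r /user/hadoop
root@evassh-10644553:~# hdfs dfs -get /usr/output/hello.txt /usr/local/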
4. Experimental experience
- Mastered starting HDFS (cluster startup with start-dfs.sh)
- Learned to use Hadoop commands (file creation/deletion/modification/query/upload/download) to operate the distributed file system