Cloud Computing and Big Data Basic Operations, Experiment 3: HDFS

1. Purpose of the experiment

  1. Understand HDFS architecture and working principle

  2. Master HDFS deployment environment and steps

  3. Master starting HDFS (cluster startup with start-dfs.sh)

  4. Use Hadoop commands (file creation/deletion/modification/viewing/upload/download) to operate the distributed file system

2. Experimental content

  1. HDFS pseudo-distributed environment construction

  2. Starting HDFS (cluster startup with start-dfs.sh)

  3. Practice using Hadoop commands (file creation/deletion/modification/viewing/upload/download) to operate the distributed file system

3. Experimental steps

Use the tar command to decompress the downloaded Hadoop installation package.

Execution process and results:

1. Enter the package directory

root@evassh-10644553:~# cd /data/workspace/myshixun/ 
root@evassh-10644553:/data/workspace/myshixun#

2. View the software package (you can see the Hadoop installation package with the ls command)

root@evassh-10644553:/data/workspace/myshixun# ls
hadoop-2.7.1.tar.gz
root@evassh-10644553:/data/workspace/myshixun#

3. Decompress the software package into the /opt directory (tar -zxf extracts the gzip-compressed archive, and the -C option specifies the target directory)

root@evassh-10644553:/data/workspace/myshixun# tar -zxf hadoop-2.7.1.tar.gz -C /opt 
root@evassh-10644553:/data/workspace/myshixun#

4. Check whether the decompression is successful

root@evassh-10644553:/data/workspace/myshixun# ls /opt
hadoop-2.7.1
root@evassh-10644553:/data/workspace/myshixun#

5. Change the directory to the root user's home directory

root@evassh-10644553:/data/workspace/myshixun# cd 
root@evassh-10644553:~#

Configure environment variables

The purpose of configuring environment variables is to make related commands such as hadoop and hdfs available globally.

1. Use the vi command to edit the environment variable file

root@evassh-10644553:~# vi /etc/profile

After entering the command, vi opens the file.

2. Press the ↓ arrow key to move the cursor to the last line of the file

3. Make sure the input method is set to English, then press the lowercase i key. The --INSERT-- indicator appears at the bottom of the screen, showing that vi has entered insert mode and the file can now be edited

4. Append the configuration shown below to the end of the file
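A typical configuration for this step, assuming Hadoop was extracted to /opt/hadoop-2.7.1 as above (adjust the path if yours differs):

export HADOOP_HOME=/opt/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Putting both bin and sbin on the PATH is what makes commands such as hdfs and start-dfs.sh usable globally in the later steps.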

5. When the input is complete, press the Esc key to exit insert mode; the --INSERT-- indicator disappears

6. Make sure the input method is set to English, then type :wq and press Enter to save the file and exit vi

7. Make the environment variables take effect

root@evassh-10644553:~# source /etc/profile
root@evassh-10644553:~#

8. Test: type the letter h and then press the Tab key twice; a list of matching commands is returned

root@evassh-10644553:~# h
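The completion list varies from system to system, but with the environment configured correctly it should include entries like the following representative sample:

hadoop              hadoop-daemon.sh    hdfs                head
hadoop.cmd          hadoop-daemons.sh   hdfs.cmd            history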

From the returned results, you can see many commands starting with hadoop and hdfs. If no such commands appear after pressing Tab, the environment variables are configured incorrectly.

Modify the core-site.xml file of HDFS

The core-site.xml file mainly specifies the default file system (HDFS) and the node where the NameNode runs.

1. Edit core-site.xml

root@evassh-10644553:~# vi /opt/hadoop-2.7.1/etc/hadoop/core-site.xml

After entering the command, vi opens the file.

2. Press the ↓ arrow key to move the cursor to the bottom of the file

3. Make sure the input method is set to English, then press the lowercase i key. The --INSERT-- indicator appears at the bottom of the screen, showing that vi has entered insert mode and the file can now be edited

4. Add the configuration shown below inside the <configuration> element
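A typical pseudo-distributed core-site.xml for this setup looks like the following; hdfs://localhost:9000 is the conventional address for a single-node cluster, and your lab may specify a different host or port:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>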

Be sure to double-check the content; a mistake here will cause errors later

5. When the input is complete, press the Esc key to exit insert mode; the --INSERT-- indicator disappears

6. Make sure the input method is set to English, then type :wq and press Enter to save the file and exit vi

Modify the hdfs-site.xml file of HDFS

The hdfs-site.xml file mainly specifies the metadata (NameNode) storage directory, the data (DataNode) storage directory, and the Secondary NameNode.

1. Edit hdfs-site.xml

root@evassh-10644553:~# vi /opt/hadoop-2.7.1/etc/hadoop/hdfs-site.xml

After entering the command, vi opens the file.

2. Press the ↓ arrow key to move the cursor to the position to be edited (the <configuration> element)

3. Make sure the input method is set to English, then press the lowercase i key. The --INSERT-- indicator appears at the bottom of the screen, showing that vi has entered insert mode and the file can now be edited

4. Add the configuration shown below inside the <configuration> element. Be sure to double-check the content; a mistake here will cause errors later
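A typical single-node hdfs-site.xml looks like the following; the property names are standard, but the directory paths are illustrative, so use whatever locations your lab specifies:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop-2.7.1/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop-2.7.1/hdfs/data</value>
  </property>
</configuration>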

5. When the input is complete, press the Esc key to exit insert mode; the --INSERT-- indicator disappears

6. Make sure the input method is set to English, then type :wq and press Enter to save the file and exit vi

Initialize the cluster

Initializing the cluster means formatting it to generate a new file system. The main purposes are to:

① Create a new metadata directory

② Generate fsimage, the file that records the metadata

③ Generate the cluster's identifiers, such as the cluster ID (clusterID)

root@evassh-10644553:~# hadoop namenode -format

Seeing "successfully" in the returned result indicates that the initialization is successful. After successful initialization, do not operate again. Each initialization will generate a new cluster ID, which will make the cluster IDs recorded in DataNode and NameNode inconsistent, and the two cannot be identified.

SSH password-free configuration

SSH is one of the ways to connect to a Linux host. When HDFS services are started, new SSH connections to the host are created, so password-free login must be configured; this allows the services to start directly without prompting for a password.

1. Generate a key pair (press Enter to accept the default file location; since -P '' supplies an empty passphrase, no passphrase prompt appears)

root@evassh-10644553:~# ssh-keygen -t rsa -P ''
root@evassh-10644553:~#

2. Append id_rsa.pub to authorized_keys

root@evassh-10644553:~# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
root@evassh-10644553:~#
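If the test below still prompts for a password, file permissions are a common cause: sshd refuses keys when ~/.ssh or authorized_keys is group- or world-writable, so tighten them if needed:

root@evassh-10644553:~# chmod 700 ~/.ssh
root@evassh-10644553:~# chmod 600 ~/.ssh/authorized_keys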

3. Test

root@evassh-10644553:~# ssh localhost

When asked to confirm the host key, type yes. If you are then logged in without being prompted for a password, the configuration succeeded; type exit to close the test connection.

Start HDFS and perform a simple check

1. Use the start-dfs.sh command to start the HDFS cluster.

root@evassh-10644553:~# start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /opt/hadoop-2.7.1/logs/hadoop-root-namenode-evassh-10683023.out
localhost: starting datanode, logging to /opt/hadoop-2.7.1/logs/hadoop-root-datanode-evassh-10683023.out
Starting secondary namenodes [localhost]
localhost: starting secondarynamenode, logging to /opt/hadoop-2.7.1/logs/hadoop-root-secondarynamenode-evassh-10683023.out
root@evassh-10644553:~#

2. Use the jps command to verify

root@evassh-10644553:~# jps
1328 SecondaryNameNode
979 NameNode
1126 DataNode
1608 Jps

The number in front of each name is the process ID of the service, and it will be different on every start. As long as the three processes NameNode, DataNode, and SecondaryNameNode are running, HDFS has started successfully.
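You can also verify through the NameNode web interface, which in Hadoop 2.x listens on port 50070 by default (for this setup, http://localhost:50070).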

3. Use the ls command to view the files on HDFS

root@evassh-10644553:~# hdfs dfs -ls /
root@evassh-10644553:~#

An empty result is normal at this point, since nothing has been uploaded yet.

Common commands of HDFS

Start Hadoop

Create the /usr/output/ folder in HDFS;

Create a hello.txt file locally and add the content: "HDFS blocks are larger than disk blocks, and the purpose is to minimize addressing overhead.";

Upload hello.txt to the /usr/output/ directory of HDFS;

Delete the /user/hadoop directory of HDFS;

Copy the file hello.txt from HDFS to the local /usr/local directory.
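One possible sequence of commands for these tasks (a sketch: the HDFS path of hello.txt in the last step is assumed from the upload step above):

root@evassh-10644553:~# start-dfs.sh
root@evassh-10644553:~# hdfs dfs -mkdir -p /usr/output
root@evassh-10644553:~# echo "HDFS blocks are larger than disk blocks, and the purpose is to minimize addressing overhead." > hello.txt
root@evassh-10644553:~# hdfs dfs -put hello.txt /usr/output
root@evassh-10644553:~# hdfs dfs -rm -r /user/hadoop
root@evassh-10644553:~# hdfs dfs -get /usr/output/hello.txt /usr/local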

4. Experimental experience

Mastered starting HDFS (cluster startup with start-dfs.sh)

Learned to use Hadoop commands (file creation/deletion/modification/viewing/upload/download) to operate the distributed file system
