1. Experiment Description
- Install a Spark cluster in Spark Standalone mode
- Experiment time:
- 45 minutes
- The main steps:
- Unzip and install Spark
- Add Spark configuration file
- Start the Spark cluster
- Run a test case
2. Experimental environment
- Number of virtual machines: 3 (one master and two slaves, the host names are: master, slave01, slave02)
- System version: CentOS 7.5
- Hadoop version: Apache Hadoop 2.7.3
- Spark version: Apache Spark 2.1.1
3. Relevant skills
- Spark Standalone installation and deployment
4. Knowledge points
- Use of common Linux commands
- Configuring Spark by modifying the .bash_profile file
- Verifying the Spark Standalone installation
- Submitting an application to run on the cluster
- Using the Spark web UI
5. Expected result
The final result of running the Calculate Pi example is shown in section 6.7.3.2 (output of the form: Pi is roughly 3.1418271418271417).
6. Experimental steps
Prerequisite: a Hadoop cluster has already been successfully installed and deployed on these nodes.
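Before starting, you can optionally confirm that the Hadoop cluster is up; a minimal check (assuming the standard Hadoop 2.x daemon names and that jps is on the PATH) is:
[zkpk@master ~]$ jps    # the output should include the HDFS/YARN daemons, e.g. NameNode and ResourceManager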
6.1 Decompress the spark archive on the master node
6.1.1 Open the Linux command-line terminal (right-click on the desktop and select "Open Terminal")
6.1.2 In the command line terminal, switch to the directory /home/zkpk/tgz/spark where the spark archive is located
[zkpk@master ~]$ cd /home/zkpk/tgz/spark
6.1.3 Unzip the spark archive to the user's home directory
[zkpk@master spark]$ tar -xzvf spark-2.1.1-bin-hadoop2.7.tgz -C /home/zkpk
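Optionally, confirm that the archive was extracted into the home directory (the directory name below is the default produced by the tarball):
[zkpk@master spark]$ ls -d /home/zkpk/spark-2.1.1-bin-hadoop2.7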
6.2 View the contents of the decompressed spark directory
6.2.1 Return to the user's home directory
[zkpk@master spark]$ cd
6.2.2 Enter the decompressed spark directory
[zkpk@master ~]$ cd spark-2.1.1-bin-hadoop2.7/
6.2.3 View the contents of this directory
[zkpk@master spark-2.1.1-bin-hadoop2.7]$ ll
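The rest of the experiment uses the conf (configuration files), sbin (cluster start/stop scripts), bin (spark-submit) and examples (test job jar) subdirectories; a quick way to confirm they are present:
[zkpk@master spark-2.1.1-bin-hadoop2.7]$ ls -d bin conf sbin examples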
6.3 Configuring environment variables
6.3.1 Return to the user's home directory
[zkpk@master spark-2.1.1-bin-hadoop2.7]$ cd
6.3.2 Edit the .bash_profile file with vim
[zkpk@master ~]$ vim .bash_profile
6.3.3 Add spark related information, then save and exit
export SPARK_HOME=/home/zkpk/spark-2.1.1-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
6.3.4 Run the source command to reload .bash_profile so that the added variables take effect
[zkpk@master ~]$ source ~/.bash_profile
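To verify that the variables took effect, you can print SPARK_HOME and check that spark-submit is now found on the PATH:
[zkpk@master ~]$ echo $SPARK_HOME
[zkpk@master ~]$ which spark-submit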
6.3.5 Configure slave01 and slave02 in the same way
6.3.5.1 Copy the .bash_profile file to the /home/zkpk directory of slave01 and slave02 respectively
[zkpk@master ~]$ cd
[zkpk@master ~]$ scp .bash_profile slave01:/home/zkpk
[zkpk@master ~]$ scp .bash_profile slave02:/home/zkpk
6.3.5.2 Run source on each slave to make the changes effective
[zkpk@master ~]$ ssh slave01    # log in to slave01 remotely
[zkpk@slave01 ~]$ source .bash_profile
[zkpk@slave01 ~]$ exit          # log out
[zkpk@master ~]$ ssh slave02    # log in to slave02 remotely
[zkpk@slave02 ~]$ source .bash_profile
[zkpk@slave02 ~]$ exit          # log out
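Optionally, verify the variable on both slaves directly from the master (note that a non-interactive ssh command does not read .bash_profile, so it is sourced explicitly here):
[zkpk@master ~]$ ssh slave01 'source ~/.bash_profile; echo $SPARK_HOME'
[zkpk@master ~]$ ssh slave02 'source ~/.bash_profile; echo $SPARK_HOME'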
6.4 Modify the slaves file
6.4.1 Enter the spark configuration file directory
[zkpk@master ~]$ cd spark-2.1.1-bin-hadoop2.7/conf/
6.4.2 Rename the slaves.template file in the conf directory to slaves
[zkpk@master conf]$ mv slaves.template slaves
6.4.3 Replace the original content of slaves with the following content, and save and exit
slave01
slave02
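You can confirm the result with cat; the file should now contain exactly the two worker host names:
[zkpk@master conf]$ cat slaves
slave01
slave02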
6.5 Modify the spark-env.sh file in the conf directory
6.5.1 Rename the file spark-env.sh.template to spark-env.sh
[zkpk@master conf]$ mv spark-env.sh.template spark-env.sh
6.5.2 Add the following content to the spark-env.sh file, and save and exit
export SPARK_MASTER_HOST=master          # node that runs the master process
export SPARK_MASTER_PORT=7077            # master communication port
export SPARK_WORKER_CORES=1              # number of cores used by each worker
export SPARK_WORKER_MEMORY=1024M         # memory used by each worker
export SPARK_MASTER_WEBUI_PORT=8080      # port of the master web UI
export SPARK_CONF_DIR=/home/zkpk/spark-2.1.1-bin-hadoop2.7/conf    # spark configuration file directory
export JAVA_HOME=/usr/java/jdk1.8.0_131/ # JDK installation path
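Since spark-env.sh hard-codes the JDK path, it is worth confirming that the path actually exists on this machine (adjust JAVA_HOME above if your JDK is installed elsewhere):
[zkpk@master conf]$ ls -d /usr/java/jdk1.8.0_131/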
6.6 Remotely copy the configured spark home directory to the other two slave nodes
6.6.1 Switch to the user's home directory
[zkpk@master conf]$ cd
6.6.2 Remote copy spark home directory to slave01
[zkpk@master ~]$ scp -r spark-2.1.1-bin-hadoop2.7/ zkpk@slave01:/home/zkpk
6.6.3 Remote copy spark home directory to slave02
[zkpk@master ~]$ scp -r spark-2.1.1-bin-hadoop2.7/ zkpk@slave02:/home/zkpk
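Optionally, confirm that the directory arrived on both slaves:
[zkpk@master ~]$ ssh slave01 ls -d spark-2.1.1-bin-hadoop2.7
[zkpk@master ~]$ ssh slave02 ls -d spark-2.1.1-bin-hadoop2.7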
6.7 Start the spark cluster
6.7.1 Enter the sbin directory of spark on the master node
[zkpk@master ~]$ cd spark-2.1.1-bin-hadoop2.7/sbin/
6.7.2 Run start-all.sh to start the spark cluster
[zkpk@master sbin]$ ./start-all.sh
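start-all.sh launches a Master JVM on this node and a Worker JVM on each host listed in conf/slaves. You can confirm the processes with jps (the Hadoop daemons will also appear in the output if Hadoop is running):
[zkpk@master sbin]$ jps        # the output should now include a Master process
[zkpk@master sbin]$ ssh slave01
[zkpk@slave01 ~]$ jps          # the output should include a Worker process
[zkpk@slave01 ~]$ exit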
6.7.3 Verify that spark standalone mode is deployed correctly
6.7.3.1 Open a browser and visit the Spark web UI at http://master:8080 ; an interface similar to the figure below should appear, showing the master and the two registered workers
6.7.3.2 Submit a job to the Spark cluster from the command line
[zkpk@master sbin]$ cd ~/spark-2.1.1-bin-hadoop2.7/
[zkpk@master spark-2.1.1-bin-hadoop2.7]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://master:7077 --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 examples/jars/spark-examples_2.11-2.1.1.jar 10
The job prints an approximation of pi, for example: Pi is roughly 3.1418271418271417 (SparkPi estimates pi by random sampling, so the value is approximate and varies slightly between runs).
6.7.3.3 At this point, the installation and deployment of Spark Standalone mode is complete
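When the experiment is finished, the cluster can optionally be stopped with the matching script in the sbin directory:
[zkpk@master spark-2.1.1-bin-hadoop2.7]$ ./sbin/stop-all.sh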
7. Summary
The installation and deployment of Spark Standalone mode involves every node in the cluster. The basic steps are: decompress the archive on the master node; configure the slaves and spark-env.sh files; configure the .bash_profile file; remotely copy the Spark home directory and .bash_profile to the other slave nodes; start the Spark cluster on the master node; and finally run the SparkPi example in Standalone mode to verify the deployment.