hive study notes eight: Sqoop

Welcome to my GitHub

All of Xinchen's original works (including supporting source code) are categorized and summarized here: https://github.com/zq2599/blog_demos

"hive study notes" series navigation

  1. Basic data types
  2. Complex data types
  3. Internal and external tables
  4. Partitioned tables
  5. Buckets
  6. HiveQL basics
  7. Built-in functions
  8. Sqoop
  9. Basic UDF
  10. User Defined Aggregate Function (UDAF)
  11. UDTF

About Sqoop

Sqoop is an Apache open source project for efficiently transferring bulk data between Hadoop and relational databases. This article walks through the following:

  1. Deploy Sqoop
  2. Use Sqoop to export hive table data to MySQL
  3. Import MySQL data to hive table with Sqoop

Deploy

  1. In the home directory of the hadoop account, download Sqoop 1.4.7:
wget https://mirror.bit.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
  2. Unzip:
tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
  3. Unpacking yields the folder sqoop-1.4.7.bin__hadoop-2.6.0; copy the MySQL JDBC driver mysql-connector-java-5.1.47.jar into the sqoop-1.4.7.bin__hadoop-2.6.0/lib directory
  4. Enter the directory sqoop-1.4.7.bin__hadoop-2.6.0/conf and rename sqoop-env-template.sh to sqoop-env.sh:
mv sqoop-env-template.sh sqoop-env.sh
  5. Open sqoop-env.sh in an editor and add the following three settings; HADOOP_COMMON_HOME and HADOOP_MAPRED_HOME are the full hadoop path, and HIVE_HOME is the full hive path:
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.7.7
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.7.7
export HIVE_HOME=/home/hadoop/apache-hive-1.2.2-bin
  6. Installation and configuration are now complete. Enter sqoop-1.4.7.bin__hadoop-2.6.0/bin and run ./sqoop version to check the version; as shown below, it is 1.4.7 (warnings are printed for environment variables that are not configured; ignore them for now):
[hadoop@node0 bin]$ ./sqoop version
Warning: /home/hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/11/02 12:02:58 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
  • Sqoop is now installed; the next sections exercise its export and import features. As a quick smoke test, you can first confirm that Sqoop can reach MySQL, as sketched below.
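A minimal connectivity check, assuming the MySQL instance described in the next section (host 192.168.50.43, port 3306, user root, password 123456); list-databases is a standard Sqoop tool and a convenient way to verify the JDBC driver and credentials before running real jobs:

# run from sqoop-1.4.7.bin__hadoop-2.6.0/bin
# prints every database visible to this MySQL account;
# if "sqoop" appears in the list, connectivity is fine
./sqoop list-databases \
--connect jdbc:mysql://192.168.50.43:3306 \
--username root \
--password 123456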

MySQL preparation

For the hands-on steps that follow, you need a MySQL instance. The configuration used in this article is listed below for reference:

  1. MySQL version: 5.7.29
  2. MySQL server IP: 192.168.50.43
  3. MySQL service port: 3306
  4. Account: root
  5. Password: 123456
  6. Database name: sqoop

As for deploying MySQL itself, I used docker to save trouble; see "Synology DS218+ deploy mysql".
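One caveat before exporting: Sqoop's export does not create the target table, so the sqoop database and an address table must already exist on the MySQL side. A minimal preparation sketch, assuming the same three-column schema as the Hive address table used below (the VARCHAR lengths are an assumption):

# run anywhere the mysql client is installed; creates the target database and table
# schema mirrors the Hive table: addressid int, province string, city string
mysql -h192.168.50.43 -P3306 -uroot -p123456 -e "
CREATE DATABASE IF NOT EXISTS sqoop;
CREATE TABLE IF NOT EXISTS sqoop.address (
  addressid INT,
  province  VARCHAR(64),
  city      VARCHAR(64)
);"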

Export Hive data to MySQL (export)

  • Execute the following command to export Hive's data to MySQL; --export-dir points at the HDFS location of the Hive address table, and --fields-terminated-by must match that table's field delimiter:
./sqoop export \
--connect jdbc:mysql://192.168.50.43:3306/sqoop \
--table address \
--username root \
--password 123456 \
--export-dir '/user/hive/warehouse/address' \
--fields-terminated-by ','
  • Query the address table in MySQL to confirm that the data arrived, as sketched below.
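A quick check from the shell, assuming the mysql client and the connection details above (the exact rows depend on what the Hive address table held):

# show the exported rows in MySQL
mysql -h192.168.50.43 -P3306 -uroot -p123456 -e "SELECT * FROM sqoop.address;"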

Import MySQL data into Hive (import)

  1. In the Hive CLI, execute the following statement to create a new table named address2 with the same structure as address:
create table address2 (addressid int, province string, city string) 
row format delimited 
fields terminated by ',';
  2. Execute the following command to import the data of MySQL's address table into Hive's address2 table; -m 2 starts two map tasks (an alternative that loads into Hive directly is sketched after the command):
./sqoop import \
--connect jdbc:mysql://192.168.50.43:3306/sqoop \
--table address \
--username root \
--password 123456 \
--target-dir '/user/hive/warehouse/address2' \
-m 2
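As an aside, Sqoop can also create and load a Hive table in one step via its --hive-import option, which removes the need to pre-create the table and target directory. A hedged sketch; the table name address3 is hypothetical, chosen to avoid clobbering address2:

# alternative: let Sqoop create the Hive table and load it directly
./sqoop import \
--connect jdbc:mysql://192.168.50.43:3306/sqoop \
--table address \
--username root \
--password 123456 \
--hive-import \
--hive-table address3 \
-m 2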
  3. After it finishes, the console prints output similar to the following (a quick check of the files Sqoop wrote is sketched after the output):
		Virtual memory (bytes) snapshot=4169867264
		Total committed heap usage (bytes)=121765888
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=94
20/11/02 16:09:22 INFO mapreduce.ImportJobBase: Transferred 94 bytes in 16.8683 seconds (5.5726 bytes/sec)
20/11/02 16:09:22 INFO mapreduce.ImportJobBase: Retrieved 5 records.
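Because -m 2 ran two map tasks, the import should have written two part files under the target directory; a quick way to confirm (the part-m-* names are the usual MapReduce convention and may vary):

# list the files Sqoop wrote; expect part-m-00000 and part-m-00001
hdfs dfs -ls /user/hive/warehouse/address2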
  4. Query Hive's address2 table; the data has been imported successfully:
hive> select * from address2;
OK
1	guangdong	guangzhou
2	guangdong	shenzhen
3	shanxi	xian
4	shanxi	hanzhong
6	jiangshu	nanjing
Time taken: 0.049 seconds, Fetched: 5 row(s)
  • That completes the deployment and basic operations of Sqoop. I hope this article serves as a useful reference for your own data import and export work.
