Welcome to my GitHub
All of Xinchen's original works (including supporting source code) are categorized and summarized here: https://github.com/zq2599/blog_demos
About hive
Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides SQL-like query capabilities.
Environment information
The environment used in this article is as follows:
- CentOS Linux release 7.5.1804
- JDK: 1.8.0_191
- hadoop: 2.7.7
- hive: 1.2.2
Hadoop deployment and startup
- For deploying and starting the Hadoop environment, please refer to "Deploying a hadoop 2.7.7 cluster on Linux";
- Note: make sure the HADOOP_HOME environment variable is configured;
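A quick way to confirm this (the path below matches the Hadoop deployment used later in this article; yours may differ):

```bash
# Should print the Hadoop installation directory, e.g. /home/hadoop/hadoop-2.7.7
echo $HADOOP_HOME
```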
Install and configure MySQL (version 5.7.27)
- MySQL is used to store Hive's metadata. To keep things simple, I deploy it in Docker with a single command:
```bash
docker run --name mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=888888 -idt mysql:5.7.27
```
- Enter mysql container:
```bash
docker exec -it mysql /bin/bash
```
- After entering the container, connect to mysql with the password 888888:
```bash
mysql -h127.0.0.1 -uroot -p
```
- Create a new mysql account named hive:
```sql
CREATE USER 'hive' IDENTIFIED BY '888888';
```
- Grant all privileges to the hive account (the hive account can also grant privileges to other accounts):
```sql
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
```
- Flush the privilege tables:
```sql
flush privileges;
```
- On the host's terminal, execute the following command to restart the MySQL service:
```bash
docker exec mysql service mysql restart
```
- Enter the mysql container again and log in to MySQL with the hive account:
```bash
mysql -uhive -p
```
- Create a database named hive:
```sql
CREATE DATABASE hive;
```
Install hive
- Download Hive from a mirror of the official site: http://mirror.bit.edu.cn/apache/hive/ and select the appropriate version;
Note: the following steps are performed with the hadoop account, not root
- In the home directory of the hadoop account, unpack the apache-hive-1.2.2-bin.tar.gz file, which produces a directory named apache-hive-1.2.2-bin, as shown below;
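For example, assuming the tarball is fetched from the mirror above into the hadoop user's home directory (the exact path under the mirror is an assumption; adjust it to whatever the mirror actually shows):

```bash
cd /home/hadoop
# Download the 1.2.2 binary package (URL layout assumed from the mirror above)
wget http://mirror.bit.edu.cn/apache/hive/hive-1.2.2/apache-hive-1.2.2-bin.tar.gz
# Unpack; this produces the directory apache-hive-1.2.2-bin
tar -zxvf apache-hive-1.2.2-bin.tar.gz
```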
- Edit the .bash_profile file of the hadoop account and add an environment variable pointing to the full path of the just-unpacked apache-hive-1.2.2-bin folder:
```bash
export HIVE_HOME=/home/hadoop/apache-hive-1.2.2-bin
```
- After the modification, reopen the ssh connection or execute source ~/.bash_profile to make the environment variable take effect immediately;
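For example:

```bash
source ~/.bash_profile
# Verify: should print /home/hadoop/apache-hive-1.2.2-bin
echo $HIVE_HOME
```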
- Enter the directory apache-hive-1.2.2-bin/conf/ and create a configuration file from the template:
```bash
cp hive-default.xml.template hive-default.xml
```
- In the same directory, create a file named hive-site.xml with the following content:
```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
 <property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>888888</value>
 </property>
</configuration>
```
- Put the MySQL JDBC driver jar into the directory /home/hadoop/apache-hive-1.2.2-bin/lib/. I use mysql-connector-java-5.1.47.jar, which you can download here: https://download.csdn.net/download/boling_cavalry/11834367
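For example, assuming the jar was downloaded to the hadoop user's home directory (adjust the source path to wherever you saved it):

```bash
cp ~/mysql-connector-java-5.1.47.jar /home/hadoop/apache-hive-1.2.2-bin/lib/
```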
Configuration is now complete; the next step is initialization and startup;
Initialize and start hive
- Enter the directory apache-hive-1.2.2-bin/bin and execute the following command to initialize the metastore schema:
```bash
./schematool -initSchema -dbType mysql
```
After the operation succeeds, the console prints:
```
[hadoop@node0 bin]$ ./schematool -initSchema -dbType mysql
Metastore connection URL:        jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       hive
Starting metastore schema initialization to 1.2.0
Initialization script hive-schema-1.2.0.mysql.sql
Initialization script completed
schemaTool completed
```
- Take a look at MySQL: multiple tables have been created under the hive database:
```
mysql> show tables;
+---------------------------+
| Tables_in_hive            |
+---------------------------+
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| COMPACTION_QUEUE          |
| COMPLETED_TXN_COMPONENTS  |
| DATABASE_PARAMS           |
| DBS                       |
| DB_PRIVS                  |
| DELEGATION_TOKENS         |
| FUNCS                     |
| FUNC_RU                   |
| GLOBAL_PRIVS              |
| HIVE_LOCKS                |
| IDXS                      |
| INDEX_PARAMS              |
| MASTER_KEYS               |
| NEXT_COMPACTION_QUEUE_ID  |
| NEXT_LOCK_ID              |
| NEXT_TXN_ID               |
| NOTIFICATION_LOG          |
| NOTIFICATION_SEQUENCE     |
| NUCLEUS_TABLES            |
| PARTITIONS                |
| PARTITION_EVENTS          |
| PARTITION_KEYS            |
| PARTITION_KEY_VALS        |
| PARTITION_PARAMS          |
| PART_COL_PRIVS            |
| PART_COL_STATS            |
| PART_PRIVS                |
| ROLES                     |
| ROLE_MAP                  |
| SDS                       |
| SD_PARAMS                 |
| SEQUENCE_TABLE            |
| SERDES                    |
| SERDE_PARAMS              |
| SKEWED_COL_NAMES          |
| SKEWED_COL_VALUE_LOC_MAP  |
| SKEWED_STRING_LIST        |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES             |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TAB_COL_STATS             |
| TBLS                      |
| TBL_COL_PRIVS             |
| TBL_PRIVS                 |
| TXNS                      |
| TXN_COMPONENTS            |
| TYPES                     |
| TYPE_FIELDS               |
| VERSION                   |
+---------------------------+
53 rows in set (0.00 sec)
```
- Execute ./hive in the directory /home/hadoop/apache-hive-1.2.2-bin/bin/ to start Hive:
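```bash
cd /home/hadoop/apache-hive-1.2.2-bin/bin
./hive
```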
Initialization and startup are complete; next, verify hive;
Verification
- After executing ./hive above, we are already in the interactive console. Enter the following command to create a database named test001:
```sql
CREATE database test001;
```
- Select the database:
```sql
use test001;
```
- Create a table named test_table:
```sql
create table test_table(
  id INT,
  word STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
```
- Open a new ssh connection and create a file named /home/hadoop/hive_test.txt, with columns separated by a tab, as follows:
```
1	aaa
2	bbb
3	ccc
4	ddd
5	eee
6	fff
```
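One way to create the file with real tab separators (a sketch; any editor works, as long as the delimiter is a tab so that it matches the table's FIELDS TERMINATED BY '\t'):

```bash
printf '1\taaa\n2\tbbb\n3\tccc\n4\tddd\n5\teee\n6\tfff\n' > /home/hadoop/hive_test.txt
```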
- Go back to the hive interactive console and enter the following command to import the contents of the above text file into test001.test_table:
```sql
LOAD DATA LOCAL INPATH '/home/hadoop/hive_test.txt' INTO TABLE test001.test_table;
```
The console output is as follows:
```
hive> LOAD DATA LOCAL INPATH '/home/hadoop/hive_test.txt' INTO TABLE test001.test_table;
Loading data to table test001.test_table
Table test001.test_table stats: [numFiles=1, totalSize=36]
OK
Time taken: 0.264 seconds
```
- Execute a select and you can see that all the data has been loaded:
```
hive> select * from test_table;
OK
1	aaa
2	bbb
3	ccc
4	ddd
5	eee
6	fff
Time taken: 0.453 seconds, Fetched: 6 row(s)
```
- Execute a group by query:
```sql
select word,count(word) from test_table GROUP BY word;
```
This query launches a MapReduce job. The console output is as follows:
```
hive> select word,count(word) from test_table GROUP BY word;
Query ID = hadoop_20191007190528_3bd50401-267b-4d75-8b08-17ead5f0d790
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1570427946161_0002, Tracking URL = http://node0:8088/proxy/application_1570427946161_0002/
Kill Command = /home/hadoop/hadoop-2.7.7/bin/hadoop job -kill job_1570427946161_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-10-07 19:05:34,812 Stage-1 map = 0%, reduce = 0%
2019-10-07 19:05:39,991 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.29 sec
2019-10-07 19:05:46,201 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.23 sec
MapReduce Total cumulative CPU time: 3 seconds 230 msec
Ended Job = job_1570427946161_0002
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.23 sec   HDFS Read: 7000 HDFS Write: 36 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 230 msec
OK
aaa	1
bbb	1
ccc	1
ddd	1
eee	1
fff	1
Time taken: 18.614 seconds, Fetched: 6 row(s)
```
That completes the installation and first experience of hive. I hope this article can serve as a reference for readers who are also studying hive.