Install and experience hive-1.2

Welcome to my GitHub

Here you will find a categorized summary of all of Xinchen's original works (including supporting source code):

About hive

Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides SQL-like query functionality.

Environment information

The environment used in this article is as follows:

  1. CentOS Linux release 7.5.1804
  2. JDK: 1.8.0_191
  3. hadoop: 2.7.7
  4. hive: 1.2.2

Hadoop deployment and startup

  1. For deploying and starting the hadoop environment, please refer to "Deploying a hadoop 2.7.7 cluster on Linux";
  2. Note: make sure the HADOOP_HOME environment variable is configured;
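Hive's launcher scripts resolve the hadoop command through this variable. A minimal ~/.bash_profile entry would look like the following; the path matches the hadoop-2.7.7 install used later in this article, but adjust it to your own:

```shell
# Assumed path: adjust to your own install. Hive's scripts
# locate the hadoop command via HADOOP_HOME.
export HADOOP_HOME=/home/hadoop/hadoop-2.7.7
export PATH="$HADOOP_HOME/bin:$PATH"
```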

Install and configure MySQL (version 5.7.27)

  1. MySQL is used to store Hive's metadata. To simplify things, I deploy it in a docker environment with a single command:
docker run --name mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=888888 -idt mysql:5.7.27
  2. Enter the mysql container:
docker exec -it mysql /bin/bash
  3. After entering the container, connect to mysql as root with the password 888888:
mysql -h127.0.0.1 -uroot -p
  4. Create a new mysql account named hive:
  5. Grant the hive account access privileges (the hive account can also grant privileges to other accounts):
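The statements for these two steps are not shown in the original. A typical pair for MySQL 5.7 would be the following; the hive account's password, assumed here to be 888888 like root's, is a guess:

```sql
-- Hypothetical: the article omits the exact statements.
-- Create the hive account (password assumed) and let it grant privileges onward.
CREATE USER 'hive'@'%' IDENTIFIED BY '888888';
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
```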
  6. Refresh the privilege data:
flush privileges;
  7. On the host's terminal, execute the following command to restart the mysql service:
docker exec mysql service mysql restart
  8. Enter the mysql container again and log in to mysql with the hive account:
mysql -uhive -p
  9. Create a database named hive:
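The statement itself is not shown in the original; going only by the database name given above, it would be:

```sql
-- Run while logged in to mysql as the hive account.
CREATE DATABASE hive;
```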

Install hive

  1. Download the package from Hive's official website, selecting the appropriate version, as shown in the figure below:

Note: the account used for the following operations is not root but hadoop

  2. In the home directory of the hadoop account, extract the apache-hive-1.2.2-bin.tar.gz file, which produces a directory named apache-hive-1.2.2-bin;
  3. Edit the hadoop account's ~/.bash_profile file and add an environment variable set to the full path of the just-extracted apache-hive-1.2.2-bin folder:
export HIVE_HOME=/home/hadoop/apache-hive-1.2.2-bin
  4. After the modification, open a new ssh connection or execute source ~/.bash_profile to make the environment variable take effect immediately;
  5. Enter the directory apache-hive-1.2.2-bin/conf/ and create a configuration file from the template:
cp hive-default.xml.template hive-default.xml
  6. In the same directory, create a file named hive-site.xml, which begins as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
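The article shows only the XML prolog; the property list of hive-site.xml is not included. A minimal sketch of the metastore-related properties, assuming the MySQL container above is reachable at 127.0.0.1:3306 and the hive account's password is 888888 (an assumption), would be:

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://127.0.0.1:3306/hive?useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>888888</value>
  </property>
</configuration>
```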
  7. Put MySQL's JDBC driver jar into the directory /home/hadoop/apache-hive-1.2.2-bin/lib/. I use mysql-connector-java-5.1.47.jar, which can be downloaded here:

The setup work is now complete; next comes startup and initialization.

Initialize and start hive

  1. Enter the directory apache-hive-1.2.2-bin/bin and execute the following command to initialize the metastore:
./schematool -initSchema -dbType mysql

After the operation succeeds, the console output is:

[hadoop@node0 bin]$ ./schematool -initSchema -dbType mysql
Metastore connection URL:  jdbc:mysql://
Metastore Connection Driver :  com.mysql.jdbc.Driver
Metastore connection User:  hive
Starting metastore schema initialization to 1.2.0
Initialization script hive-schema-1.2.0.mysql.sql
Initialization script completed
schemaTool completed
  2. Take a look at mysql: many tables have been created in the hive database:
mysql> show tables;
| Tables_in_hive            |
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| DATABASE_PARAMS           |
| DBS                       |
| DB_PRIVS                  |
| FUNCS                     |
| FUNC_RU                   |
| GLOBAL_PRIVS              |
| HIVE_LOCKS                |
| IDXS                      |
| INDEX_PARAMS              |
| MASTER_KEYS               |
| NEXT_LOCK_ID              |
| NEXT_TXN_ID               |
| NUCLEUS_TABLES            |
| PARTITIONS                |
| PARTITION_KEYS            |
| PART_COL_PRIVS            |
| PART_COL_STATS            |
| PART_PRIVS                |
| ROLES                     |
| ROLE_MAP                  |
| SDS                       |
| SD_PARAMS                 |
| SEQUENCE_TABLE            |
| SERDES                    |
| SERDE_PARAMS              |
| SKEWED_COL_NAMES          |
| SKEWED_VALUES             |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TAB_COL_STATS             |
| TBLS                      |
| TBL_COL_PRIVS             |
| TBL_PRIVS                 |
| TXNS                      |
| TXN_COMPONENTS            |
| TYPES                     |
| TYPE_FIELDS               |
| VERSION                   |
53 rows in set (0.00 sec)
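One of these tables, VERSION, records which schema schematool installed; as a quick sanity check (a sketch, run in the mysql client against the hive database):

```sql
-- VERSION holds a single row describing the installed metastore schema;
-- SCHEMA_VERSION should match the 1.2.0 reported by schematool above.
SELECT SCHEMA_VERSION, VERSION_COMMENT FROM VERSION;
```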
  3. Execute ./hive in the directory /home/hadoop/apache-hive-1.2.2-bin/bin/ to start Hive;

Initialization and startup are complete; next, verify hive.

Verify hive

  1. After executing ./hive in the previous step, you are in interactive mode. Enter the following command to create a database named test001:
CREATE database test001;
  2. Select the database:
use test001;
  3. Create a table named test_table, whose data fields are separated by spaces:
create table test_table(
id  INT,
word  STRING
)
row format delimited
fields terminated by ' ';
  4. Open a new ssh connection and create a file named hive_test.txt in the hadoop account's home directory, with the following content:
1 aaa
2 bbb
3 ccc
4 ddd
5 eee
6 fff
  5. Go back to the hive interactive console and enter the following command to import the contents of the above text file into test001.test_table:
LOAD DATA LOCAL INPATH '/home/hadoop/hive_test.txt' INTO TABLE test001.test_table;

The console output is as follows:

hive> LOAD DATA LOCAL INPATH '/home/hadoop/hive_test.txt' INTO TABLE test001.test_table;
Loading data to table test001.test_table
Table test001.test_table stats: [numFiles=1, totalSize=36]
Time taken: 0.264 seconds
  6. Execute a select query; you can see that all the data has been imported:
hive> select * from test_table;
1 aaa
2 bbb
3 ccc
4 ddd
5 eee
6 fff
Time taken: 0.453 seconds, Fetched: 6 row(s)
  7. Execute a GROUP BY query:
select word,count(word) from test_table GROUP BY word;

This time a MapReduce job is launched to execute the query. The console output is as follows:

hive> select word,count(word) from test_table GROUP BY word;
Query ID = hadoop_20191007190528_3bd50401-267b-4d75-8b08-17ead5f0d790
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1570427946161_0002, Tracking URL = http://node0:8088/proxy/application_1570427946161_0002/
Kill Command = /home/hadoop/hadoop-2.7.7/bin/hadoop job  -kill job_1570427946161_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-10-07 19:05:34,812 Stage-1 map = 0%,  reduce = 0%
2019-10-07 19:05:39,991 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.29 sec
2019-10-07 19:05:46,201 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.23 sec
MapReduce Total cumulative CPU time: 3 seconds 230 msec
Ended Job = job_1570427946161_0002
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.23 sec   HDFS Read: 7000 HDFS Write: 36 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 230 msec
aaa 1
bbb 1
ccc 1
ddd 1
eee 1
fff 1
Time taken: 18.614 seconds, Fetched: 6 row(s)

This completes the installation and first experience of hive. I hope this article can serve as a reference for readers who are also studying hive.

Posted by vote-for-pedro on Mon, 09 May 2022 05:07:15 +0300