Data warehouse development mostly revolves around data tables, but is everything done once the warehouse tables are built? Obviously not. You also need to think about how to analyze and present the data, because that is a key part of realizing its value. Through data analysis and visual presentation ...
Posted by zero-one on Tue, 24 May 2022 23:31:42 +0300
Data skew is one of the most common problems in data development, and it comes up in almost every interview. So why does data skew happen, when does it occur, and how do you solve it?
What is data skew? In essence, data skew is an uneven distribution of data: some tasks process a large amount of data, which leads to a longer ...
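A minimal Python sketch of what "uneven data distribution" means for a shuffle, assuming rows are routed to tasks by hashing their key (the excerpt does not name an engine or partitioner, and all function names here are hypothetical). Because every row of a key lands in the same partition, one hot key sets a floor on the size of the largest task. The sketch also shows key salting, one widely used mitigation that the excerpt itself does not describe.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    """Deterministic hash partitioner (crc32), standing in for a shuffle partitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    """Count how many rows each partition (i.e. each task) would receive."""
    sizes = [0] * num_partitions
    for k in keys:
        sizes[partition_of(k, num_partitions)] += 1
    return sizes


def heaviest_key_rows(keys):
    """Rows of the most frequent key; all of them go to a single partition."""
    return Counter(keys).most_common(1)[0][1]


def salt_keys(keys, buckets):
    """Append a suffix 0..buckets-1 so a hot key is split into `buckets` distinct keys."""
    return [f"{k}#{i % buckets}" for i, k in enumerate(keys)]


if __name__ == "__main__":
    # Hypothetical data: one hot key with 9000 rows plus 1000 rare keys.
    keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]
    print(partition_sizes(keys, 4))
    print(heaviest_key_rows(keys))
    print(heaviest_key_rows(salt_keys(keys, 4)))
```

Salting only works for aggregations whose partial results can be combined: after aggregating by the salted key, a second aggregation by the original key merges the partial results.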
Posted by Paris! on Fri, 13 May 2022 21:01:37 +0300
Premise: This was the first time I had received a requirement related to data prediction. It has more to do with "data mining" than with the usual data retrieval, querying, and exporting data for review. So I was quite motivated and interested in trying to implement it. Because I haven't systematically ...
Posted by aboyd on Thu, 12 May 2022 15:10:28 +0300
Using Hive to build an offline data warehouse is a very common approach in enterprises. Hive is designed for batch processing of big data and is usually not sensitive to processing time. However, when resources are limited, we still need to pay attention to Hive performance tuning so that data can be produced quickly. ...
Posted by Greaser9780 on Wed, 11 May 2022 06:27:37 +0300
-- Create a database. Its default path on HDFS is /user/hive/warehouse/<database_name>.db
create database mydatabase;
-- Use IF NOT EXISTS to skip creation if the database already exists
create database if not exists mydatabase;
-- Create a database and specify its storage path
Posted by varasurf on Thu, 05 May 2022 06:52:27 +0300
This deployment test was performed on Ubuntu 18 running in a local virtual machine. Refer to the official documentation:
Hadoop: Link address
Hive: Link address
The entire setup is done as the root user.
Hadoop installation and configuration
Hadoop runs as a pseudo-distributed cluster, that is, a singl ...
Posted by anshu.sah on Thu, 05 May 2022 05:22:18 +0300
Casual SQL users may only know that, after building a SparkSession in PySpark (with enableHiveSupport(), of course), you just write the SQL:
spark.sql("write a SQL string here")
Spark then carries out whatever the SQL describes: selecting and querying data, inserting or overwriting data into the result table, and so on. ...
Posted by Fearless_Fish on Wed, 04 May 2022 16:16:50 +0300
Window functions are a special class of functions.
They scan multiple input rows to compute each output value, and they produce one output row for every input row.
Window functions make complex calculations and aggregations possible.
By purpose, they can be divided into ranking (sorting), aggregation and analys ...
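To make "one output row per input row" concrete, here is a minimal pure-Python sketch of what `ROW_NUMBER() OVER (PARTITION BY shop ORDER BY day)` (ranking) and `SUM(amt) OVER (PARTITION BY shop ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)` (aggregation) compute. The function name and the columns `shop`, `day`, `amt` are hypothetical, and this emulates the semantics rather than how Hive executes them. The explicit `ROWS` frame matters: with the default `RANGE` frame, rows that tie on `day` would share one running total.

```python
from itertools import groupby
from operator import itemgetter


def window_rank_and_running_sum(rows, partition_key, order_key, value_key):
    """Emulate ROW_NUMBER() and a ROWS-frame running SUM() OVER (PARTITION BY .. ORDER BY ..).

    Unlike GROUP BY, every input row yields exactly one output row.
    """
    out = []
    # Sort so rows of the same partition are adjacent and ordered within it.
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    for _, part in groupby(ordered, key=itemgetter(partition_key)):
        running = 0
        for n, row in enumerate(part, start=1):
            running += row[value_key]
            out.append({**row, "row_number": n, "running_sum": running})
    return out


if __name__ == "__main__":
    sales = [
        {"shop": "a", "day": 1, "amt": 10},
        {"shop": "a", "day": 2, "amt": 5},
        {"shop": "b", "day": 1, "amt": 7},
        {"shop": "a", "day": 3, "amt": 1},
        {"shop": "b", "day": 2, "amt": 3},
    ]
    for r in window_rank_and_running_sum(sales, "shop", "day", "amt"):
        print(r)
```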
Posted by fishdish on Mon, 02 May 2022 05:28:04 +0300
Note: Hive version 2.1.1
1. Overview of Hive SELECT (Data Query Language)
The SELECT statement is the most frequently used statement in Hive and also the one with the most complex syntax. Much of its syntax resembles that of traditional relational databases, which eases the transition from tradition ...
Posted by sarika on Sat, 30 Apr 2022 08:55:28 +0300
This article comes from: https://www.cnblogs.com/cheyunhua/p/10037162.html
0. Prepare the installation packages
The system image, big data software packages, and development environment packages required in this article can be downloaded from my Baidu cloud disk. Link: System image and various big data software. Pass ...
Posted by louisp on Mon, 25 Apr 2022 11:19:12 +0300