BI data analysis methods you need to know for data warehouse development

Data warehouse development largely means working with data tables, so is the job done once the tables are built? Obviously not. You also need to think about how to analyze the data and how to present it, because this is an important part of the data's value. Through data analysis and visual presentatio ...

Posted by zero-one on Tue, 24 May 2022 23:31:42 +0300

A must-know topic in data development - data skew

Foreword: Data skew is one of the most common problems in data development, and it is also a staple interview question. So why does data become skewed? When does data skew occur, and how can it be solved? What is data skew: the essence of data skew is uneven data distribution. Some tasks process a large amount of data, which leads to a longe ...

Posted by Paris! on Fri, 13 May 2022 21:01:37 +0300

Data prediction -- approach and practice for predicting revenue lost to user churn

Premise: This was the first time I received a requirement related to data prediction. It has more to do with "data mining" than with the usual data retrieval, querying, export and review work, so I was motivated and interested in trying to implement it. Because I haven't systematical ...

Posted by aboyd on Thu, 12 May 2022 15:10:28 +0300

Hive Performance Tuning Guide

Building an offline data warehouse with Hive is a very common approach in enterprises. Hive is typically used for batch processing of big data, where processing time is usually not critical. When resources are limited, however, we still need to pay attention to Hive performance tuning so that data can be produced quickly. ...

Posted by Greaser9780 on Wed, 11 May 2022 06:27:37 +0300

Hive SQL statements

Database operations
Create a database
-- Create a database; the default path on HDFS is /user/hive/warehouse/*.db
create database mydatabase;
-- Use if not exists to check whether the database already exists (if it exists, it will not be created)
create database if not exists mydatabase;
-- Create a database and specify its storage path
c ...

Posted by varasurf on Thu, 05 May 2022 06:52:27 +0300

Notes on deploying a stand-alone version of hadoop+hive

Preface: This is a deployment test carried out on Ubuntu 18 in a local emulator, following the official hadoop and hive documentation. Versions used: hadoop 3.2.1, hive 3.1.2. The whole process is configured with the root account. hadoop installation and configuration: hadoop uses a virtual cluster, that is, a singl ...

Posted by anshu.sah on Thu, 05 May 2022 05:22:18 +0300

Extending Spark to read Hive data from a remote tenant cluster over JDBC and implement Hive2Hive data integration into this cluster's Hive [Java]

Background: People who only work at the SQL level may know just this much: after pyspark constructs the sparkSession object [with enableHiveSupport, of course], you write a SQL statement: spark.sql("write a SQL string here"); Spark then carries out the operations in that SQL, such as selecting and querying data and inserting or overwriting data in the result table. ...

Posted by Fearless_Fish on Wed, 04 May 2022 16:16:50 +0300

Hive common window functions

1, Overview 1. Definition: Window functions are a special set of functions. They scan multiple input rows to calculate each output value and generate a result for each row of data. Complex calculations and aggregations can be implemented with window functions. By function, they can be divided into: sequence (sorting), aggregation and analys ...

Posted by fishdish on Mon, 02 May 2022 05:28:04 +0300

Big data development, Hive chapter 5 - Hive data query language

Note: Hive version 2.1.1. 1. Overview of Hive SELECT (Data Query Language): The select statement is the most frequently used statement in Hive, and it is also the statement with the most complex syntax. Much of the select syntax is similar to that of traditional relational databases, which also facilitates the transition from tradition ...

Posted by sarika on Sat, 30 Apr 2022 08:55:28 +0300

Hadoop+HBase+Spark+Hive environment construction

This article comes from: https://www.cnblogs.com/cheyunhua/p/10037162.html 0. Prepare the installation packages: The system image, big data software installation packages and development environment software installation packages required in this article can be downloaded from my Baidu cloud disk. Link: System image and various big data software Pass ...

Posted by louisp on Mon, 25 Apr 2022 11:19:12 +0300