Hive Performance Tuning Guide

Using Hive to build an offline data warehouse is a very common approach in enterprises. Because Hive processes big data in batches, its workloads are usually insensitive to processing time. When resources are limited, however, Hive performance tuning matters so that data can be produced quickly. ...

Posted by Greaser9780 on Wed, 11 May 2022 06:27:37 +0300

Data warehouse 001 - full backup of time zipper

Full backup of a zipper table, demonstrated in MySQL: 1. Demonstrate the first and second full backups. (1) First, there is the business system table. // Enter the test database. use test // Create the business table user_data and load the initial business data. create table user_data( userid int, ustate varchar(4) ); insert into user_da ...

Posted by Mark.P.W on Tue, 10 May 2022 06:57:38 +0300

Hadoop based environment construction

Environment construction. Local environment: a single-node mode suitable for development, with no concept of distribution. Pseudo-distributed environment: one node stands in for an entire cluster; the concept of distribution applies, but that single node plays every role in it. Fully distributed: The ...

Posted by englishman69 on Mon, 09 May 2022 23:40:28 +0300

Spark core components, operation architecture and RDD creation

Spark core components. Before explaining the Spark architecture, let's first look at several core components of Spark and what they do. 1. Application: a Spark application is a user program built on Spark, comprising the Driver code and the code that runs in the Executors on the cluster's nodes. It consists of one or more Jobs. As show ...

Posted by Someone789 on Sun, 08 May 2022 23:43:36 +0300

Taobao crawler, data analysis, children's wear

requests crawler. Taobao now requires a login before you can search for goods, so you need the cookies from your account. In Chrome, pressing F12 opens the developer tools, which serve as a simple packet-capture tool. Select the Network tab, check the headers of the requests one by one, and you will find that the request beginning with search needs my cookie. Copy the circled part and convert it into diction ...
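The conversion step mentioned in the excerpt could look roughly like the sketch below. It assumes the copied value is a standard `Cookie` request header, that is, `name=value` pairs separated by semicolons; the function name `cookie_header_to_dict` and the sample cookie names are hypothetical and do not come from the original post.

```python
def cookie_header_to_dict(cookie_header):
    """Turn a raw Cookie header string into a {name: value} dictionary."""
    cookies = {}
    for pair in cookie_header.split(";"):
        pair = pair.strip()
        # Skip empty fragments and fragments without a name=value shape.
        if not pair or "=" not in pair:
            continue
        # Split only on the first "=", since cookie values may contain "=".
        name, value = pair.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


# Hypothetical cookie string, standing in for the one copied from the browser.
sample = "token=abc123; lang=en; note=a=b"
print(cookie_header_to_dict(sample))
```

A dictionary in this shape can be passed to the `cookies` argument of `requests.get`.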

Posted by tieded on Sun, 08 May 2022 06:19:59 +0300

Detailed steps for setting up a cluster

ZooKeeper cluster construction. 1. Cluster planning: install three virtual machines, with IP addresses and host names set as follows (IP, host name, software): 192.168.1.66, SQG, JDK\zookeeper; 192.168.1.2, hadoop1, JDK\zookeeper; 192.168.1.3, hadoop2, JDK\zookeeper. 2. Environment preparation (the following three virtual machines need to be ope ...

Posted by MentalMonkey on Sat, 07 May 2022 23:48:20 +0300

Crawling station B video comment user information! These commenters are masters!

Recently, Mr. Ma Baoguo has been very popular on station B, and his videos have very high view counts. The video comment section of station B is full of talented, well-spoken people. Let's write a crawler to collect the user information and comment content from the comment section of station B. 1. Preparatory work 1. Tools (1) Google Chrome browser, installing the add ...

Posted by Hiro on Sat, 07 May 2022 07:36:40 +0300

A bug triggered a contest between hair. I won an overwhelming victory. Can you keep your hair?

I ran into a very strange problem while debugging a company project a few days ago, and today I'll dig into it here. Scene: the company added a new PDA interface for querying historical parking flow data, so I first query the qualifying data from the database ...

Posted by Sangre on Fri, 06 May 2022 20:42:28 +0300

Elasticsearch hot-cold separation cluster construction -- the road of building a dream

Introduction to the hot-cold separation architecture. Hot-cold separation is currently a very popular architecture for ES. It takes the strengths and weaknesses of the cluster's machines into account when scheduling and allocating resources. The index writing and query speed of an ES cluster depends mainly on disk IO speed. The key point ...

Posted by coldkill on Fri, 06 May 2022 13:23:41 +0300

Hadoop learning from 0 to 1 -- Chapter 12 Hadoop data compression

1. Compression overview. Compression can effectively reduce the number of bytes read from and written to the underlying storage system, which makes better use of network bandwidth and disk space. When running an MR program, I/O operations, network transfer, Shuffle and Merge take a lot of time, especially in the case of la ...
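As a rough illustration of the byte savings described in the excerpt, the sketch below compresses a repetitive payload with gzip from the Python standard library and compares sizes. This is not Hadoop code and is not taken from the original chapter; the helper name `compression_ratio` and the log-like sample data are assumptions made for the example.

```python
import gzip


def compression_ratio(data):
    """Return compressed size divided by original size for gzip."""
    compressed = gzip.compress(data)
    # Compression is lossless: decompressing must give back the input.
    assert gzip.decompress(compressed) == data
    return len(compressed) / len(data)


# Hypothetical log-like payload; repetitive data compresses very well.
raw = b"2022-05-06 INFO task finished\n" * 10000
ratio = compression_ratio(raw)
print(f"raw size: {len(raw)} bytes, ratio after gzip: {ratio:.4f}")
```

The smaller the ratio, the fewer bytes have to cross the disk and the network, at the cost of CPU time spent compressing and decompressing.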

Posted by frkmilla on Fri, 06 May 2022 07:30:28 +0300