Big Data Analysis - Matplotlib Introduction (not yet completed)

This tutorial is just a tour of the basic methods used by Matplotlib. The next chapter will be an advanced tutorial. Problem area: 1. Why does plt.figure fail after calling plt.gcf().set_facecolor(np.ones(3)*240/255)? 2. Introduction to matplotlib.pyplot: Matplotlib is Python's plotting library. It works with NumPy, providing an ef ...

Posted by Muppet9010 on Sun, 27 Mar 2022 20:15:51 +0300

Hive Integrated Spark Tutorial (Hive on Spark)

Introduction to Hive engines. Hive's execution engines include MR (the default), Tez, and Spark. The underlying engine is MR (MapReduce), which doesn't need to be configured; Hive runs on it out of the box. Hive on Tez configuration: https://blog.csdn.net/weixin_45417821/article/details/115181000 Hive on Spark: Hive is responsible for both storing metadata and parsing and optim ...

Posted by r4ck4 on Sun, 27 Mar 2022 20:02:20 +0300

Data warehouse | COUNT DISTINCT data skew optimization

What is data skew? Data skew is very common in the MapReduce programming model. It occurs when a large number of records with the same key are assigned to one partition, so that individual tasks run very slowly and drag down the execution efficiency of the whole job. The root cause of data skew is that the amount of data processed by a few workers ...
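
The skew described in this excerpt can be made concrete with a small sketch. It is only an illustration under stated assumptions: the workload (one hot key covering 90% of the records), the partition count, and the helper names `partition_of` and `partition_sizes` are hypothetical, and the crc32-based partitioner stands in for whatever hash partitioner a real MapReduce job would use.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    """Pick a partition from a stable hash of the key (crc32, for reproducibility)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    """Count how many records each partition receives."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical workload: 90% of the records share one hot key.
NUM_PARTITIONS = 8
keys = ["hot_key"] * 9_000 + [f"user_{i}" for i in range(1_000)]

sizes = partition_sizes(keys, NUM_PARTITIONS)
mean_size = len(keys) / NUM_PARTITIONS
skew_ratio = max(sizes) / mean_size  # 1.0 would mean a perfectly even spread

print("records per partition:", sizes)
print(f"largest partition is {skew_ratio:.1f}x the average")
```

Because every record with the same key hashes to the same partition, one partition receives at least the 9,000 hot-key records, and the task that processes it finishes long after the others.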

Posted by dazzclub on Sun, 27 Mar 2022 11:33:48 +0300

Spark GraphX Programming Guide

Spark series interview questions: Spark interview question (I); Spark interview questions (II); Spark interview questions (III); Spark interview questions (IV); Spark interview question (V): data skew tuning; Spark interview question (VI): Spark resource optimization; Spark interview question (VII): Spark program development and optimization; Spark inter ...

Posted by alecks on Sun, 27 Mar 2022 11:02:36 +0300