Data prediction -- thoughts and practice in predicting revenue lost to user churn

Preface: This is the first time I have received a requirement related to data prediction. It has something to do with "data mining", unlike the usual data retrieval, querying, and exporting for retrospective reviews, so I was quite motivated and interested in trying to implement it. Because I haven't systematical ...

Posted by aboyd on Thu, 12 May 2022 15:10:28 +0300

Hadoop environment setup

Hadoop environment setup. Downloading Hadoop and the JDK: if they are downloaded on Windows, they need to be moved to the virtual machine, which can usually be done by drag and drop. If drag and drop fails, use remote connection software to upload the files; MobaXterm is recommended here. Installation and use of MobaXterm: ht ...

Posted by PaulRyan on Thu, 12 May 2022 13:43:09 +0300

Learning Flink from 0 to 1 - Chapter 5: Flink stream processing API

1. Flink stream processing API. 1.1 Environment. 1.1.1 getExecutionEnvironment: creates an execution environment that represents the context of the currently executing program. If the program is run standalone, this method returns a local execution environment; if the program is invoked from the command-line client and submitted to the clu ...

Posted by abhishekphp6 on Wed, 11 May 2022 16:00:44 +0300

Oracle - Summary of MyBatis usage and SQL optimization

Preface: A note before starting: this blog will be updated continually to organize and summarize the SQL optimization problems encountered in development work (in real projects). Because the author works in the big data department, data volumes are usually very large, so he pays particular attention to performance at big-data scale (but for some oper ...

Posted by AustinP on Wed, 11 May 2022 13:43:15 +0300

Hive Performance Tuning Guide

Using Hive to build an offline data warehouse is a very common approach in enterprises. Hive is used to process big data in batches, and such workloads are usually not sensitive to processing time. However, when resources are limited, we still need to pay attention to Hive performance tuning so that data can be produced quickly. ...

Posted by Greaser9780 on Wed, 11 May 2022 06:27:37 +0300

Data warehouse 001 - full backups into a time-based zipper table

Demo in MySQL. Full backups into a zipper table, demonstrated in MySQL: 1. Demonstrate the first and second full backups. 1) First, there is the business system table. Switch to the test database with use test; then create the business table user_data and insert the initial business data: create table user_data(userid int, ustate varchar(4)); insert into user_da ...

Posted by Mark.P.W on Tue, 10 May 2022 06:57:38 +0300

Hadoop basic environment setup

Hadoop basic environment setup. Environment setup. Local (standalone) mode: a single-node mode that can be used for development, with no notion of distribution. Pseudo-distributed mode: one node represents an entire cluster; in distributed terms, that single node plays every role in the cluster. Fully distributed mode: The ...
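To make the three modes concrete, the sketch below maps each one to the site-file properties that typically distinguish it, following Hadoop's single-node setup guide: local mode keeps the default file system file:///, pseudo-distributed mode points fs.defaultFS at hdfs://localhost:9000 with one replica, and a full cluster points at a NameNode host with HDFS's default of three replicas. The function names, the port 9000 for the full cluster, and the NameNode host "hadoop1" are assumptions for illustration, not taken from the original post.

```python
from xml.sax.saxutils import escape


def site_properties(mode: str, namenode: str = "localhost", port: int = 9000) -> dict:
    """Map a Hadoop deployment mode to the site-file properties that set it apart.

    Only properties that differ between modes are returned; everything else
    stays at Hadoop's built-in defaults.
    """
    if mode == "local":
        # Standalone: no daemons; the local file system is the default FS.
        # file:/// is already Hadoop's default, so this override is optional.
        return {"core-site.xml": {"fs.defaultFS": "file:///"}, "hdfs-site.xml": {}}
    if mode == "pseudo":
        # One machine runs every daemon, so HDFS can keep only one replica.
        return {
            "core-site.xml": {"fs.defaultFS": f"hdfs://localhost:{port}"},
            "hdfs-site.xml": {"dfs.replication": "1"},
        }
    if mode == "full":
        # Real cluster: clients point at the NameNode host; 3 replicas is
        # HDFS's default and assumes at least three DataNodes.
        return {
            "core-site.xml": {"fs.defaultFS": f"hdfs://{namenode}:{port}"},
            "hdfs-site.xml": {"dfs.replication": "3"},
        }
    raise ValueError(f"unknown mode: {mode!r}")


def to_configuration_xml(props: dict) -> str:
    """Render a property dict in the <configuration> layout of Hadoop site files."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    # "hadoop1" is a hypothetical NameNode host name.
    for filename, props in site_properties("full", namenode="hadoop1").items():
        print(f"--- {filename}")
        print(to_configuration_xml(props))
```

fs.defaultFS belongs in core-site.xml and dfs.replication in hdfs-site.xml, which is why the sketch keeps them in separate per-file dicts.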

Posted by englishman69 on Mon, 09 May 2022 23:40:28 +0300

Spark core components, runtime architecture, and RDD creation

Spark core components. Before explaining the Spark architecture, let's first look at several of Spark's core components and what they do. 1. Application: a Spark application is a user program built on Spark, made up of driver code plus code that runs in the executors on each node of the cluster. It consists of one or more jobs. As show ...

Posted by Someone789 on Sun, 08 May 2022 23:43:36 +0300

Taobao crawler, data analysis, children's wear

requests crawler. Taobao now requires you to log in before searching for products, so you need to get the cookies from your account. In Chrome, F12 opens the developer tools, which work as a simple packet capture tool. Select the Network tab, check the headers of the requests one by one, and you will find that the request whose name begins with search needs my cookie. Copy the circled part and convert it into diction ...
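Converting the copied cookie string into a dictionary can be sketched as below. This is an illustrative helper, not the author's code: the function name cookie_header_to_dict and the sample cookie names and values are hypothetical, and real values come from your own logged-in session.

```python
def cookie_header_to_dict(raw: str) -> dict:
    """Convert a raw 'name1=value1; name2=value2' Cookie header into a dict.

    Intended for a cookie string copied from the request headers shown in
    Chrome DevTools' Network tab.
    """
    cookies = {}
    for part in raw.split(";"):
        part = part.strip()
        if "=" not in part:
            continue  # skip empty fragments such as a trailing ';'
        # Split on the first '=' only, because cookie values may contain '='.
        name, _, value = part.partition("=")
        cookies[name.strip()] = value.strip()
    return cookies


if __name__ == "__main__":
    # Hypothetical cookie string for illustration only.
    raw = "t=abc123; cna=xyz789; isg=BA==; "
    print(cookie_header_to_dict(raw))
```

The resulting dict can be passed to requests as requests.get(url, cookies=cookies), where url stands for the search request found in the Network tab; the excerpt does not spell that URL out.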

Posted by tieded on Sun, 08 May 2022 06:19:59 +0300

Detailed steps for setting up a cluster

ZooKeeper cluster setup. 1. Cluster planning: install three virtual machines. The IP addresses and host names are set as follows:

IP             Host name   Software
192.168.1.66   SQG         JDK, ZooKeeper
192.168.1.2    hadoop1     JDK, ZooKeeper
192.168.1.3    hadoop2     JDK, ZooKeeper

2. Environment preparation (the following three virtual machines need to be ope ...
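The cluster plan above translates directly into the per-node settings a ZooKeeper ensemble needs: /etc/hosts entries, server.N lines in zoo.cfg, and the number each host writes into the myid file in its dataDir. The sketch below generates them; numbering servers 1 to 3 in plan order and using the conventional quorum/election ports 2888 and 3888 are assumptions, and the helper names are hypothetical.

```python
# Cluster plan from the table above: (IP address, host name).
PLAN = [
    ("192.168.1.66", "SQG"),
    ("192.168.1.2", "hadoop1"),
    ("192.168.1.3", "hadoop2"),
]


def hosts_entries(plan):
    """Lines to append to /etc/hosts on every node so host names resolve."""
    return [f"{ip} {host}" for ip, host in plan]


def zoo_cfg_servers(plan, quorum_port=2888, election_port=3888):
    """server.N lines for zoo.cfg; N is each host's 1-based position in the plan."""
    return [
        f"server.{sid}={host}:{quorum_port}:{election_port}"
        for sid, (_, host) in enumerate(plan, start=1)
    ]


def myid_for(plan, hostname):
    """Number to write into <dataDir>/myid on the given host (must match server.N)."""
    for sid, (_, host) in enumerate(plan, start=1):
        if host == hostname:
            return sid
    raise KeyError(hostname)


if __name__ == "__main__":
    print("\n".join(hosts_entries(PLAN)))
    print("\n".join(zoo_cfg_servers(PLAN)))
    print("myid on hadoop2:", myid_for(PLAN, "hadoop2"))
```

Port 2888 is used by followers to talk to the leader and 3888 for leader election; the only hard requirement is that the N in each server.N line matches the myid file on that host.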

Posted by MentalMonkey on Sat, 07 May 2022 23:48:20 +0300