News recommendation in practice: building the recommendation system pipeline

This article explains the construction of the recommendation system pipeline, which consists of two parts: offline and online. The offline part performs offline computation based on the previously stored material (item) profiles and user profiles, and provides each user with popular-page and recommended-page lists and ca ...
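
The offline/online split described above can be sketched in plain Python. This is a minimal illustration, not the article's actual code: all names (`offline_build_lists`, `online_serve`, the item/user dict shapes) are invented assumptions. Offline jobs precompute per-user ranked lists from the stored profiles; the online service then serves them while filtering out items the user has already seen.

```python
# Minimal sketch of an offline/online recommendation split.
# All data shapes and function names are illustrative assumptions.

def offline_build_lists(item_profiles, user_profiles, top_k=3):
    """Offline job: precompute a global hot list and a per-user recommended list."""
    hot_list = sorted(item_profiles, key=lambda i: i["clicks"], reverse=True)[:top_k]
    user_rec_lists = {}
    for user, prefs in user_profiles.items():
        # Rank items by the user's affinity for each item's category.
        scored = sorted(
            item_profiles,
            key=lambda i: prefs.get(i["category"], 0.0),
            reverse=True,
        )
        user_rec_lists[user] = scored[:top_k]
    return hot_list, user_rec_lists

def online_serve(user, hot_list, user_rec_lists, seen):
    """Online service: return the precomputed list, filtering already-seen items."""
    candidates = user_rec_lists.get(user, hot_list)  # fall back to the hot list
    return [i["id"] for i in candidates if i["id"] not in seen.get(user, set())]

items = [
    {"id": "n1", "category": "sports", "clicks": 90},
    {"id": "n2", "category": "tech", "clicks": 120},
    {"id": "n3", "category": "tech", "clicks": 40},
]
users = {"u1": {"tech": 0.9, "sports": 0.1}}
hot, recs = offline_build_lists(items, users)
print(online_serve("u1", hot, recs, {"u1": {"n2"}}))  # n2 already seen, so filtered
```

The key design point the article builds on is visible even in this toy: the expensive ranking happens offline, so the online path is a cheap lookup plus filtering.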

Posted by unklematt on Fri, 11 Nov 2022 12:48:10 +0300

201_Spark installation and deployment: Standalone mode

1. Experiment description: install a Spark cluster in Spark Standalone run mode. Experiment time: 45 minutes. Main steps: unzip and install Spark; add the Spark configuration file; start the Spark cluster; run test cases. 2. Experimental environment: number of virtual machines: 3 (one master and two slaves; the host names are master, slave01, slave02 ...

Posted by mnick on Sat, 08 Oct 2022 00:16:10 +0300

Spark study notes: 14. Window operations of Spark Streaming

Window overview of Spark DStream: Spark DStream provides window operations, and we can use window operators to perform a series of operations on the data. Unlike Flink, the window operations provided by Spark DStream are relatively simple: they can only be performed based on the processing time of the data. Spark's wind ...
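
The processing-time window semantics mentioned above can be illustrated without Spark. A DStream window is defined by a window length and a slide interval; every `slide` seconds the last `window_len` seconds of arrivals are aggregated. A minimal pure-Python simulation (the event timestamps are invented for illustration; this is not the DStream API):

```python
# Pure-Python simulation of DStream-style window(windowLength, slideInterval)
# counting over processing time. Not Spark; just the windowing arithmetic.

def sliding_window_counts(events, window_len, slide, end_time):
    """events: list of (arrival_time_sec, value).
    Emits one (window_end, count) per trigger; each window covers
    the half-open interval (t - window_len, t]."""
    counts = []
    t = slide
    while t <= end_time:
        in_window = [v for (ts, v) in events if t - window_len < ts <= t]
        counts.append((t, len(in_window)))
        t += slide
    return counts

# Arrival times in seconds of processing time, one record each.
events = [(1, "a"), (2, "b"), (4, "c"), (7, "d")]
print(sliding_window_counts(events, window_len=4, slide=2, end_time=8))
```

Note how consecutive windows overlap when the window length exceeds the slide interval, so a single record can be counted in more than one window.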

Posted by zrosen88 on Wed, 07 Sep 2022 08:42:23 +0300

C language implementation of a queue

Sequential implementation of a queue: typedef struct { int data[MaxSize]; int front, rear; } SqQueue; For initialization of the sequential queue, front points to the queue-head element and rear points to the position just past the queue-tail element, that is, the next position to insert into. Note that the queue must be passed by pointer, otherwise initialization only modifies a local copy: void Init(SqQueue *q){ q->front=0 ...

Posted by natbrazil on Fri, 01 Jul 2022 21:46:43 +0300

Big data practice: Flink e-commerce user behavior analysis, real-time hot-product statistics

1. Introduction: the first task is real-time hot-product statistics, analyzed on the UserBehavior data set. The main body of the project is written in Scala, with IDEA as the development environment and Maven as the build and dependency-management tool. First, we need to build the project f ...
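
The statistic the article computes in Scala/Flink boils down to: within each time window, count views per item and emit the top N. A rough Python stand-in follows; the field names mimic the UserBehavior data set (userId, itemId, behavior, timestamp), but the records and the function name are invented for illustration:

```python
# Stand-in for windowed top-N hot-product statistics. Not Flink; only the
# counting logic. Records are invented UserBehavior-like tuples.
from collections import Counter

def top_n_hot_items(behaviors, window_start, window_end, n=2):
    """Count 'pv' (page view) events per item inside [window_start, window_end)
    and return the n most-viewed items with their counts."""
    clicks = Counter(
        b["itemId"]
        for b in behaviors
        if b["behavior"] == "pv" and window_start <= b["timestamp"] < window_end
    )
    return clicks.most_common(n)

behaviors = [
    {"userId": 1, "itemId": 100, "behavior": "pv", "timestamp": 10},
    {"userId": 2, "itemId": 100, "behavior": "pv", "timestamp": 20},
    {"userId": 3, "itemId": 200, "behavior": "pv", "timestamp": 30},
    {"userId": 4, "itemId": 300, "behavior": "buy", "timestamp": 40},  # not a view
    {"userId": 5, "itemId": 200, "behavior": "pv", "timestamp": 999},  # outside window
]
print(top_n_hot_items(behaviors, window_start=0, window_end=60))
```

In the real Flink job this per-window aggregation would be incremental (an aggregate function plus a window function) rather than a batch scan, but the output shape is the same.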

Posted by The Chancer on Tue, 24 May 2022 09:56:15 +0300

Spark learning: Spark SQL

Spark SQL: Spark SQL is the module that Spark uses to process structured data. It provides a programming abstraction called DataFrame and acts as a distributed SQL query engine. In Hive, Hive SQL is converted into MapReduce and then submitted to the cluster for execution, which greatly reduces the complexity of writing MapReduce programs, bec ...
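
Spark SQL itself needs a Spark runtime, but the core idea the snippet describes, running SQL over structured rows instead of writing low-level jobs by hand, can be shown with Python's built-in `sqlite3` as a single-machine stand-in. To be clear, this is not Spark and the table is invented; it only illustrates the query-over-structured-data abstraction:

```python
# Single-machine stand-in for the "SQL over structured data" idea behind
# Spark SQL, using the standard-library sqlite3 module. NOT Spark itself.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("ann", 31), ("bob", 24), ("cho", 45)],
)
# Declarative query; the engine plans the scan/filter/sort, just as Spark SQL
# plans distributed stages from the same kind of statement.
rows = conn.execute(
    "SELECT name FROM people WHERE age > 30 ORDER BY age"
).fetchall()
print(rows)
conn.close()
```

The difference in Spark SQL is that the same declarative statement is compiled into a distributed execution plan over partitioned data rather than a local table scan.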

Posted by SoaringUSAEagle on Mon, 23 May 2022 18:39:54 +0300

Big data acquisition case: a Python web crawler example

Web crawler: a web crawler (also known as a web spider or web robot, and in the FOAF community more often called a web page chaser) is a program or script that automatically crawls web information according to certain rules. Other names that are less often used are ants, automatic indexers, emulator ...
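
The core "crawling according to certain rules" step is: fetch a page, extract its links, and enqueue them for further crawling. A minimal standard-library sketch of the link-extraction part is below; it parses a hard-coded page instead of fetching over the network, and the HTML and class name are invented for illustration:

```python
# Core of a crawler: pull the outgoing links from a fetched HTML page.
# Standard library only; parses a hard-coded string instead of a live fetch.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags as the parser walks the page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

html = '<html><body><a href="/news/1">one</a> <a href="/news/2">two</a></body></html>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # the next URLs a crawler would enqueue
```

A full crawler wraps this in a loop with a URL frontier, visited-set deduplication, and politeness rules (robots.txt, rate limiting).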

Posted by lobobr on Mon, 23 May 2022 13:35:16 +0300

CDH6.3.1 - installation steps

Note: all host passwords should be consistent. Prepare the installation packages. MySQL 5.7: mysql-5.7.27-1.el7.x86_64.rpm-bundle.tar. MySQL driver package: mysql-connector-java.jar. Cloudera Manager packages: cloudera-manager-agent-6.3.1-1466458.el7.x86_64.rpm, cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm, cloud ...

Posted by jjmusicpro on Mon, 23 May 2022 10:17:39 +0300

ZooKeeper configuration for a Hadoop cluster

Install the ZooKeeper environment. ZooKeeper installation package: https://pan.baidu.com/s/1fpdBs8kbjPj5rlrwusv1iw (extraction code: h1wv). JDK environment to prepare beforehand; reference: https://blog.csdn.net/weixin_44147632/article/details/107796624. Decompress: tar -zxf zookeeper-3.4.5-cdh5.14.2.tar.gz -C /opt/bigdata/hadoop/ Rename: mv zookeeper- ...

Posted by madmindz on Sun, 22 May 2022 16:24:13 +0300

Using Flink broadcast streams correctly, and a record of a Flink checkpoint failure

Recently, while working on a project, I ran into a scenario where a relatively small table that rarely changes had to be used as a dimension table and matched against a real-time stream. The table lives in MySQL, and my first reaction was to read it and broadcast it. An inelegant use of broadcast streams: the brief c ...
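
The pattern described here, a small MySQL dimension table broadcast against a real-time stream, can be sketched independently of Flink's broadcast-state API: the table is held in memory on every worker, and each streaming event is enriched by a lookup. All names and records below are invented for illustration; this is not the article's code:

```python
# Conceptual sketch of a broadcast-style dimension-table join: the small,
# rarely-changing table sits in memory (the "broadcast state") and each
# streaming event is enriched by lookup. Not the Flink API; names invented.

def load_dim_table():
    """Stand-in for reading the small MySQL dimension table once."""
    return {"c01": "electronics", "c02": "books"}

def enrich(stream, dim_table):
    """Join each event against the broadcast table; unknown keys get None."""
    for event in stream:
        yield {**event, "category_name": dim_table.get(event["category_id"])}

dim = load_dim_table()
events = [
    {"order_id": 1, "category_id": "c01"},
    {"order_id": 2, "category_id": "c99"},  # no match in the dim table
]
print(list(enrich(events, dim)))
```

In real Flink the broadcast state also receives updates as a second stream, and, as the article's checkpoint failure suggests, that state participates in checkpointing, which is where a naive implementation can go wrong.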

Posted by omanush on Sat, 21 May 2022 23:43:47 +0300