Notes on iOS live streaming: the principles of live broadcast, from theory to practice


1, Principle and process of live broadcasting

1. Principle of a complete live broadcast app
Live streaming principle: push the video recorded by the broadcaster (anchor) to a server, which distributes it to the audience.

Live streaming pipeline: the push side (capture, beauty processing, encoding, push streaming), server-side processing (transcoding, recording, screenshots, porn detection), the player side (pull streaming, decoding, rendering), and the interaction system (chat room, gift system, likes).

2. Implementation process of live broadcast app
1. Capture, 2. Filter processing, 3. Encoding, 4. Push streaming, 5. CDN distribution, 6. Pull streaming, 7. Decoding, 8. Playback, 9. Chat and interaction

3. Live app architecture

4. Live app technology points

2, Basic knowledge of live streaming

1. Collect video and audio

  • 1.1 video and audio acquisition coding framework*

      AVFoundation: a framework for playing and creating real-time audio-visual media. It provides Objective-C interfaces for working with audio-visual data, such as editing, rotating and re-encoding.
  • 1.2 video and audio hardware equipment*

      CCD (image sensor): used during image capture and processing to convert the image into an electrical signal.
      Pickup (sound sensor): used during sound capture and processing to convert sound into an electrical signal.
      Audio sample data: usually in PCM format.
      Video sample data: usually in YUV or RGB format. The raw captured audio and video are very large and must be compressed to make transmission efficient.
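The raw sizes mentioned above can be made concrete with a quick calculation (a language-agnostic sketch in Python; the resolution and per-pixel sizes are illustrative):

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Size in bytes of one uncompressed video frame."""
    return int(width * height * bytes_per_pixel)

# One 1080p frame: RGB stores 3 bytes per pixel,
# while YUV 4:2:0 averages 1.5 bytes per pixel (chroma subsampled).
print(frame_bytes(1920, 1080, 3))    # ~6 MB per frame
print(frame_bytes(1920, 1080, 1.5))  # ~3 MB per frame
```

At 30 frames per second, even the YUV version is roughly 93 MB/s, which is why compression is unavoidable before transmission.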

2. Video processing (beauty, watermark)

  • 2.1 video processing principle*

      Because video is ultimately rendered to the screen frame by frame by the GPU, we can use OpenGL ES to process each video frame in various ways and give the video different effects — like water from a tap flowing through several pipes before reaching different destinations.
      Today's beauty and special-effect apps are generally implemented with the GPUImage framework.
  • 2.2 video processing framework*
    GPUImage: a powerful image/video processing framework based on OpenGL ES. It encapsulates a wide range of filters, lets you write custom filters, and ships with more than 120 common filter effects built in.
    OpenGL: OpenGL (Open Graphics Library) is a specification defining a cross-language, cross-platform programming interface for three-dimensional (and also two-dimensional) graphics. It is a professional graphics interface — a powerful and convenient low-level graphics library.
    OpenGL ES: OpenGL ES (OpenGL for Embedded Systems) is a subset of the OpenGL 3D graphics API designed for embedded devices such as mobile phones, PDAs and game consoles.
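GPUImage's pipeline idea — frames flow from a source through a chain of filters to a target, like the water-through-pipes analogy above — can be illustrated with a trivial CPU-side sketch (Python; the frame representation and filter functions are invented for illustration and are not GPUImage's API):

```python
class FilterChain:
    """Chains per-frame transforms, the way GPUImage pipes frames through filters."""
    def __init__(self):
        self.filters = []

    def add(self, f):
        self.filters.append(f)
        return self  # allow chaining

    def render(self, frame):
        # Each filter receives the previous filter's output.
        for f in self.filters:
            frame = f(frame)
        return frame

# A "frame" here is just a list of grayscale pixel values (0-255).
brighten = lambda frame: [min(255, p + 20) for p in frame]
invert   = lambda frame: [255 - p for p in frame]

chain = FilterChain().add(brighten).add(invert)
print(chain.render([0, 100, 250]))  # each pixel brightened, then inverted
```

The real framework does the same composition, but each filter is an OpenGL ES fragment shader running on the GPU.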

3. Video coding and decoding

  • 3.1 video coding framework*

      1.FFmpeg: a cross-platform open source audio/video framework that can encode, decode, transcode, stream and play. It supports a very wide range of video formats and playback protocols, covering almost all audio/video codecs, container formats and streaming protocols.
      		-libswresample: audio resampling, rematrixing and sample-format conversion.
      		-libavcodec: a general codec framework containing many video, audio and subtitle stream encoders/decoders.
      		-libavformat: muxing/demuxing of container formats.
      		-libavutil: common utility functions such as random number generation, data structures and math routines.
      		-libpostproc: video post-processing.
      		-libswscale: video image scaling and color-space conversion.
      		-libavfilter: filter functionality.
      2.x264: encodes raw YUV video data into the H.264 format.
      3.VideoToolbox: Apple's own hardware video encode/decode API; it was not opened to developers until iOS 8.
      4.AudioToolbox: Apple's own hardware audio encode/decode API.
  • 3.2 video coding technology*

      1.Video compression: video coding (compression) and decoding (decompression) technologies, such as MPEG and H.264, are used to compress and encode video.
      		Main purpose: to compress raw video pixel data into a video bitstream, greatly reducing the amount of data. Without compression and encoding, video is usually huge — a single movie could take hundreds of GB.
      		Note: what affects video quality most is the video and audio coding; the container format has little to do with it.
      2.MPEG: a video compression method that uses inter-frame compression, storing only the differences between successive frames to achieve a high compression ratio.
      3.H.264/AVC: a video compression method. It uses the same P/B-frame prediction and frame compression as MPEG, can generate a video stream suited to network transmission as needed, and offers a higher compression ratio with better image quality.
      		Note 1: comparing the sharpness of a single picture, MPEG-4 has the advantage; for the clarity of coherent motion, H.264 has the advantage.
      		Note 2: because the H.264 algorithm is more complex and cumbersome to implement, running it needs more processor and memory resources, so its system requirements are relatively high.
      		Note 3: because the H.264 spec is flexible and leaves some implementation details to vendors, interoperability between different products can be a big problem — the embarrassing situation where data produced by company A's encoder can only be decoded by company A's decoder.
      4.H.265/HEVC: a video compression method based on H.264. It keeps some of the original techniques and improves others, optimizing the trade-off between bitrate, coding quality, latency and algorithmic complexity.
      		H.265 is a more efficient coding standard: at the same image quality it compresses content smaller and saves bandwidth in transmission.
      		I frame (key frame): holds a complete picture; it can be decoded from this frame's data alone (because it contains the full picture).
      		P frame (differential frame): holds only the difference from the previous frame. Decoding overlays this frame's difference onto the previously cached picture to produce the final picture. (A P frame has no complete picture data, only the difference from the previous frame's picture.)
      		B frame (bidirectional differential frame): holds the differences from both the previous and the following frames. To decode a B frame, the decoder needs both the previously cached picture and the decoded following picture, combining the two with this frame's data to obtain the final picture. B frames compress best, but decoding them works the CPU harder.
      		Intra-frame (intraframe) compression: compresses a frame using only this frame's data, ignoring redundancy between adjacent frames; intra-frame compression is generally lossy.
      		Inter-frame (interframe) compression: temporal compression — compresses by comparing data between frames along the timeline; inter-frame compression is generally lossless.
      		muxing (multiplexing): packaging video, audio and even subtitle streams into a single file (a container format such as FLV or TS), transmitted as one signal.
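The I/P-frame relationship described above can be mimicked with a toy decoder (Python sketch; the frame representation is invented for illustration — real decoders predict per-macroblock with motion vectors, not whole-pixel diffs):

```python
def decode(frames):
    """Toy GOP decoder.
    ('I', pixels) carries a full picture; ('P', diffs) carries only
    per-pixel deltas to overlay on the previously decoded picture."""
    pictures, last = [], None
    for kind, data in frames:
        if kind == 'I':
            last = list(data)                      # full picture, standalone
        else:                                      # 'P': needs the prior picture
            last = [p + d for p, d in zip(last, data)]
        pictures.append(list(last))
    return pictures

gop = [('I', [10, 10, 10]), ('P', [0, 5, 0]), ('P', [-1, 0, 2])]
print(decode(gop))  # [[10, 10, 10], [10, 15, 10], [9, 15, 12]]
```

Note how every P frame is useless without the frames before it — which is exactly why playback must start from an I frame.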
  • 3.3 audio coding technology*

      AAC, MP3: these are audio coding technologies, used to compress audio.
  • 3.4 bit rate control*

      Multi-bitrate: viewers' network conditions vary widely — WiFi, perhaps 4G, 3G, even 2G — so how do we satisfy everyone? Provide multiple streams and adapt the bitrate to the current network environment.

For example, the 1024×720, HD, SD and Smooth options often seen in video playback software refer to different bitrates.
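The multi-bitrate idea can be sketched as a simple selection rule (Python; the bitrate ladder and the 20% headroom factor are illustrative assumptions, not values from the article):

```python
def pick_variant(measured_kbps, variants):
    """Choose the highest bitrate the measured bandwidth can sustain
    with some headroom; fall back to the lowest variant otherwise."""
    usable = [v for v in sorted(variants) if v * 1.2 <= measured_kbps]
    return usable[-1] if usable else min(variants)

variants = [400, 800, 1500, 3000]    # hypothetical kbps ladder
print(pick_variant(2000, variants))  # 1500: 3000 kbps would stall
print(pick_variant(300, variants))   # 400: nothing fits, take the floor
```

Real players (HLS clients in particular) re-run a decision like this continuously as segment download speeds are measured.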

  • 3.5 video packaging format*

      TS: a streaming container format. One advantage of streaming containers is that no index has to be loaded before playback, which greatly reduces first-load latency. For a long film, an MP4 file's index is quite large and hurts the user experience.
      		Why TS: because two TS segments can be spliced seamlessly and the player can play them continuously.
      FLV: a streaming container format. Because its files are small and load quickly, it made watching video files over the network feasible, so FLV became a mainstream video format.

4. Push streaming

Push streaming: sending the captured audio and video data to the streaming media server via a streaming media protocol.

  • 4.1 data transmission framework*

      librtmp: used to transmit data in the RTMP protocol format.
  • 4.2 streaming media data transmission protocol*

      RTMP: Real-Time Messaging Protocol, an open protocol developed by Adobe Systems for transmitting audio, video and data between a Flash player and a server. Being an open protocol, it is free to use.
      		RTMP is used for the transmission of objects, video and audio.
      		The protocol sits on top of TCP, or of HTTP via polling.
      		RTMP acts like a container for data packets, which may be FLV audio/video data. These packets are transmitted over a single connection, split according to a fixed chunk size.
      		chunk: a message fragment.
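The "container of packets" behaviour can be sketched: one message payload is cut into fixed-size chunks for interleaved transmission (Python sketch; real RTMP chunks also carry headers, omitted here — 128 bytes is RTMP's default chunk size):

```python
def chunk_message(payload: bytes, chunk_size: int = 128):
    """Split one message payload into RTMP-style fixed-size chunks.
    The last chunk may be shorter than chunk_size."""
    return [payload[i:i + chunk_size]
            for i in range(0, len(payload), chunk_size)]

chunks = chunk_message(b'\x00' * 300)
print([len(c) for c in chunks])  # [128, 128, 44]
```

Chunking lets a large video message be interleaved with small audio or control messages on the same connection, so audio is not starved behind a big video frame.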

5. Streaming media server

  • 5.1 common servers*

      SRS: an excellent open source streaming media server system developed in China.
      BMS: also a streaming media server system, but not open source — the commercial version of SRS, with more features than SRS.
      nginx: a free, open source web server, commonly configured as a streaming media server.
  • 5.2 data distribution*

      CDN (Content Delivery Network): a content distribution network that publishes a site's content to the network "edge" closest to users, so users fetch the content nearby. It alleviates Internet congestion and speeds up users' access to the site.
      		A CDN is a set of proxy servers — in effect, a middleman.
      		How a CDN works, taking a request for streaming media data as an example:
      				1.Streaming media data is uploaded to the server (origin).
      				2.The origin stores the streaming media data.
      				3.The client playing the stream requests the encoded streaming media data from the CDN.
      				4.The CDN handles the request; if the data is not on the node, it continues to request the streaming media data from the origin; if the file is already cached on the node, skip to step 6.
      				5.The origin responds to the CDN, distributing the stream to the corresponding CDN node.
      				6.The CDN node sends the streaming media data to the client.
      Back to source: when a user requests a URL and the resolved CDN node has not cached the response (or the cache has expired), the node goes back to the origin to fetch it. If no one requests the content, the CDN node will not fetch it from the origin on its own.
      Bandwidth: the total amount of data that can be transmitted in a fixed time.
      		For example, a 64-bit, 800 MHz front-side bus has a data transfer rate of 64 bit × 800 MHz ÷ 8 (bits per byte) = 6.4 GB/s.
      Load balancing: multiple servers form a symmetric cluster; each server has equivalent status and can provide service independently, without assistance from the others.
      		A load-sharing mechanism distributes external requests evenly across the servers in the symmetric cluster, and the server that receives a request responds to the client independently.
      		Load balancing spreads client requests evenly over the server array, providing fast access to important data and solving the problem of massive concurrent access.

This clustering technique achieves performance close to a mainframe with minimal investment.
QoS (bandwidth management): limits each group's bandwidth to make the most of the limited total bandwidth.
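The CDN request flow and back-to-source behaviour described above can be sketched as a tiny edge cache (Python; the class and method names are invented for illustration):

```python
class EdgeNode:
    """CDN edge node: serve from cache, go back to the origin only on a miss."""
    def __init__(self, origin):
        self.origin = origin   # stands in for the origin server: URL -> content
        self.cache = {}

    def get(self, url):
        if url not in self.cache:           # cache miss: back to source
            self.cache[url] = self.origin[url]
        return self.cache[url]              # cache hit: served from the edge

origin = {'/live/1.ts': b'segment-1'}
edge = EdgeNode(origin)
edge.get('/live/1.ts')           # first request triggers a back-to-source fetch
origin['/live/1.ts'] = b'stale'  # the origin later changes...
print(edge.get('/live/1.ts'))    # ...but the edge still serves its cached copy
```

The last line illustrates why real CDNs attach expiry times to cached objects: without expiration, the edge keeps serving the old copy and never goes back to the origin.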

6. Pull streaming
Pull streaming is fetching the audio and video data from the streaming media server.

	Choosing a live protocol:
		1 For high immediacy or interactive needs, use RTMP or RTSP.
		2 For playback (VOD) or cross-platform needs, HLS is recommended.

Comparison of live broadcast protocols:

HLS: a protocol defined by Apple for real-time streaming, implemented on top of HTTP. The transmitted content consists of two parts: an M3U8 playlist file and TS media files. It supports both live and on-demand streaming and is used mainly on iOS.
1. HLS implements live streaming by means of on-demand (segmentation) techniques.
2. HLS is adaptive bitrate streaming: the client automatically selects among video streams of different bitrates according to network conditions — a high bitrate when conditions permit, a low bitrate when the network is busy — switching freely between them. This helps keep playback smooth when a mobile device's network is unstable.
3. To implement this, the server provides video streams at multiple bitrates and lists them in the playlist file; the player adjusts automatically based on playback progress and download speed.
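That multi-bitrate list file is the M3U8 master playlist. A minimal generator sketch (Python; the URIs and BANDWIDTH values are hypothetical, and real playlists carry further attributes such as RESOLUTION and CODECS):

```python
def master_playlist(variants):
    """Build a minimal HLS master playlist: one URI per bitrate variant.
    `variants` is a list of (bandwidth_bits_per_second, uri) pairs."""
    lines = ['#EXTM3U']
    for bandwidth, uri in variants:
        lines.append(f'#EXT-X-STREAM-INF:BANDWIDTH={bandwidth}')
        lines.append(uri)
    return '\n'.join(lines)

print(master_playlist([(800_000, 'sd/index.m3u8'),
                       (3_000_000, 'hd/index.m3u8')]))
```

Each listed URI points to a media playlist for that bitrate, which in turn lists the TS segments; the player reads the BANDWIDTH attribute to do the adaptive switching described above.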

HLS versus RTMP: HLS's main drawback is high latency; RTMP's main advantage is low latency.
1. HLS's small-segment approach produces a large number of files; storing and handling them wastes considerable resources.
2. Compared with RTSP, the advantage is that once segmentation is done, subsequent distribution needs no additional special software — ordinary web servers suffice. This greatly lowers the configuration requirements for CDN edge servers, and any off-the-shelf CDN can be used, whereas ordinary servers rarely support RTSP.

HTTP-FLV: streaming media content delivered over the HTTP protocol.
1. Compared with RTMP, HTTP is simpler and better known. Content latency can likewise be 1–3 seconds, and startup is faster because HTTP itself has no complex stateful interaction. From a latency standpoint, HTTP-FLV is therefore better than RTMP.

RTSP: Real-Time Streaming Protocol, which defines how one-to-many applications can efficiently transmit multimedia data over IP networks.

RTP: Real-time Transport Protocol. RTP runs on top of UDP and is often used together with RTCP. It provides no guaranteed on-time delivery or other quality-of-service (QoS) guarantees; it relies on lower-level services for that.

RTCP: RTP's companion protocol. Its main job is to provide feedback on the quality of service (QoS) that RTP delivers, collecting statistics about the media connection such as bytes sent, packets sent, packets lost, and one-way and round-trip network delay.

ijkplayer: an open source Android/iOS video player based on FFmpeg
* Easy-to-integrate API;
* The build configuration can be trimmed, making it easy to control the size of the installation package;
* Supports hardware-accelerated decoding, saving power;
* Easy to use: specify the stream URL and it decodes and plays automatically.

7. Decoding

  • 7.1 unpacking*

      demuxing (separation): decomposing a file containing muxed video, audio and subtitle streams (a container format such as FLV or TS) back into separate video, audio and subtitle streams to be decoded individually.
  • 7.2 audio coding framework*

      fdk_aac: an audio codec framework for converting between PCM audio data and AAC audio data.
  • 7.3 decoding introduction*

      Hard decoding: decoding with the GPU, reducing CPU load.

Advantages: smooth playback, low power consumption, fast decoding.
Disadvantages: poor compatibility.
Soft decoding: decoding with the CPU.
Advantages: good compatibility.
Disadvantages: heavier CPU load and higher power consumption; without hardware acceleration, decoding is comparatively slow.
8. Playback
ijkplayer (introduced under Pull streaming above): specify the stream URL and it decodes and plays automatically.
9. Chat and interaction
IM (Instant Messaging): a real-time communication system that lets two or more people exchange text messages, files, voice and video over the network in real time.
The main role of IM in a live streaming system is the text interaction between the audience and the broadcaster, and among audience members.
Third-party SDKs:
Tencent Cloud: the instant messaging SDK provided by Tencent can serve as the live chat room.
RongCloud: a commonly used instant messaging SDK that can serve as the live chat room.

3, Understanding streaming media

Streaming media development: the network layer (socket or ST) handles transmission, the protocol layer (RTMP or HLS) handles network packaging, the container layer (FLV, TS) handles packaging the encoded data, and the coding layer (H.264 and AAC) handles image and audio compression.
1. Frame: each frame represents a still image
2. GOP (Group of Pictures): a picture group. A GOP is a group of consecutive pictures; each picture is a frame, and a GOP is a collection of many frames.
Live data is really a sequence of picture groups made of I, P and B frames. When a user first tunes in, the player asks the server for the nearest I frame and starts playback from it. The GOP cache therefore adds end-to-end latency, because playback must start from the nearest I frame.
The longer the GOP cache, the better the picture quality, but the higher the latency.
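"Find the nearest I frame" can be sketched as a scan backwards through the cached frame types (Python; the cache representation is invented for illustration):

```python
def start_index(gop_cache):
    """Index of the most recent I frame in a list of cached frame types.
    A joining viewer must start decoding from here — everything after it
    depends on it, and frames before it are unnecessary."""
    for i in range(len(gop_cache) - 1, -1, -1):
        if gop_cache[i] == 'I':
            return i
    raise ValueError('no key frame cached yet')

cache = ['I', 'P', 'B', 'P', 'I', 'P', 'P']
print(start_index(cache))  # playback starts from the latest I frame
```

The frames the server must replay from that I frame onward are exactly the extra latency a new viewer experiences, which is why longer GOPs trade latency for compression efficiency.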
3. Bit rate: the amount of data per second after image compression.
4. Frame rate: the number of pictures displayed per second. It determines smoothness and is directly proportional to it: the higher the frame rate, the smoother the picture; the lower the frame rate, the choppier the picture.
Because of the physiology of the human eye, a picture played above 16 frames per second is perceived as continuous — the phenomenon known as persistence of vision. Beyond a certain frame rate, further increases bring no improvement in smoothness that the eye can easily detect.
5. Resolution: the width and height of the (rectangular) picture, i.e. the picture size.
6. Data volume per second before compression: frame rate × resolution × bytes per pixel.
7. Compression ratio: data volume per second before compression ÷ bit rate. (For the same source and the same video coding algorithm, the higher the compression ratio, the worse the picture quality.)
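Points 3, 6 and 7 combine into simple arithmetic (Python sketch; the numbers are illustrative):

```python
def raw_bytes_per_second(width, height, fps, bytes_per_pixel=1.5):
    """Point 6: uncompressed data per second =
    frame rate x resolution x bytes per pixel (1.5 assumes YUV 4:2:0)."""
    return width * height * bytes_per_pixel * fps

def compression_ratio(raw_bps, bitrate_bps):
    """Point 7: raw data per second divided by the encoded bit rate.
    Convert bytes to bits before dividing so the units match."""
    return raw_bps * 8 / bitrate_bps

raw = raw_bytes_per_second(1280, 720, 30)        # ~41.5 MB/s of raw YUV
print(round(compression_ratio(raw, 2_000_000)))  # ratio at a 2 Mbps bit rate
```

A ratio in the hundreds is typical for H.264 at streaming bitrates, which is why the raw-vs-encoded gap dominates every other design decision in the pipeline.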
8. Video file format: the file's suffix, such as .wmv, .mov, .mp4, .mp3, .avi.
Main use: the system decides from the file format which software to open it with.
Note: changing the suffix at will does not change the file itself much — renaming an .avi to .mp4 still leaves an AVI file.
9. Video container format: a container that stores the video information. Streaming containers include TS and FLV; indexed containers include MP4, MOV and AVI.
Main function: a video file usually contains images and audio as well as configuration information (such as how the images and audio are associated and how to decode them); these contents must be organized and packaged according to certain rules.
Note: you will find the container format looks the same as the file format, because a video file's suffix is usually the name of its container format — so the video file format is effectively the video container format.
10. Video container format versus video coding standard: like a project and a programming language — the container format is the project, the video coding method is the language, and one project can be developed in different languages.

4, Live broadcast development summary (you can write a Demo by yourself)

Client (pull streaming)
Framework: IJKPlayer

The most important element of a live app is the video player. There are actually many choices; the best known is bilibili's IJKPlayer. Many third-party players model their interfaces on the system's MPMoviePlayer, so once you learn to use one player, using the others is easy.
Two demos worth studying are attached. One customizes IJKPlayer (progress bar, volume, brightness); refer to it for deep customization of IJKPlayer (it applies to other players too), such as the screen gestures most players support (swipe to adjust volume, brightness and progress). The other is ZFPlayer, based on AVPlayer; mainly refer to its landscape/portrait switching, or use it directly as an ordinary video player — many features are already done well and it is very convenient to use.

The server
First, find a test server or set up a local Nginx server. For a local setup, see "Building an nginx + rtmp server on Mac".

Live test address:
HLS: Phoenix Satellite TV Hong Kong channel
RTMP: rtmp:// Hong Kong Satellite TV
RTSP: rtsp:// Interactive news channel 1

Broadcaster (push streaming)
Framework: LFLiveKit

For push streaming, the most popular framework at the moment should be LFLiveKit.

Personal imitation project

Some personal clone projects, however, combine IJKPlayer with LFLiveKit.

Here are some personal clone projects; refer to them for the handling strategies of most scenarios that come up in live streaming.

They imitate Miaopai-style and show-style live apps respectively.

Both are typical implementations of personal mobile live streaming, common in the market.

Do it yourself: simulate a video chat feature.

Use the LFLiveKit framework at the push end and IJKPlayer for pulling; for the server, find a test server or build a local Nginx server.

Correlation diagram

5, Features of a complete live streaming app

1. Chat
Private chat, chat rooms, lighting up (likes), push notifications, blacklist, etc;
2. Gifts
Ordinary gifts, luxury gifts, red envelopes, ranking list, third-party recharge, internal purchase, gift dynamic update, cash withdrawal, etc;
3. Live list
List of concerned, popular, latest, classified live broadcast users, etc;
4. Live broadcast by yourself
Recording, streaming, decoding, playing, beauty, heartbeat, background switching, host to administrator operation, administrator to user, etc;
5. Room logic
Create a room, enter a room, exit a room, close a room, switch rooms, room administrator settings, room user list, etc;
6. User logic
Ordinary login, third-party login, registration, search, modify personal information, follow list, fan list, forget password, view personal information, income list, follow and access, retrieval, etc;
7. Watch live
Chat information, scroll screen, gift display, loading interface, etc;
8. Statistics
APP business statistics, third-party statistics, etc;
9. Supertube
Banning broadcasts, hiding streams, auditing, etc;

6, SDKs provided by third parties (quickly build a complete iOS live app)

live broadcast
SDKs: Kingsoft Cloud (Jinshan), Qiniu Cloud, Alibaba Cloud, NetEase Cloud, Tencent Cloud

Qiniu Cloud: Qiniu Live Cloud is a global live streaming service built specifically for live platforms — an enterprise-grade live cloud platform offering one-stop, SDK-based end-to-end coverage of live scenarios.

  • Live platforms such as Panda TV and Longzhu TV all use Qiniu Cloud
    NetEase Video Cloud: based on professional cross-platform video codec technology and a large-scale video content distribution network, it provides stable, smooth, low-latency, high-concurrency real-time audio/video services and can connect live video seamlessly into your own app

Kingsoft Cloud SDK: their SDK is updated very quickly, and the latest version already supports HTTPS. It has some bugs, but fortunately each release fixes them promptly.
After comparing many SDK demos (Alibaba, NetEase, Tencent, Qiniu, etc.), you will find Kingsoft's demo is the best written. Point it at a push URL and it streams directly; beauty filters, bitrate, encoding and so on can all be configured in the demo, so during development you only need to redesign the UI around these features. For both the player demo and the push demo, it is recommended to follow their release updates while you use them — you will find that each update optimizes many features and fixes many bugs.

IM SDKs: RongCloud, Huanxin, Wilddog

Since everyone is watching the live stream together, there must be interaction — a live chat room. We use RongCloud: its marketing and reputation are good, and it partners with many live streaming service providers, so we can use it with confidence. Alternatives include Huanxin and Wilddog. Huanxin's console and documentation are not as friendly as RongCloud's, and we have not tried Wilddog; I personally recommend RongCloud. In addition, RongCloud's official site has a live-room demo with an integrated player and chat that is worth referring to; it comes with a live stream from a Hong Kong TV station that can be used for testing (rtmp:// ).
For handling the chat list, you can refer to my Jianshu post on performance optimization.
One thing to note here: in a controller, set the current controller as RongCloud's message-receiving delegate in order to receive RongCloud messages.

[[RCIMClient sharedRCIMClient] setReceiveMessageDelegate:self object:nil];
	In the page's dealloc, don't just call [[RCIMClient sharedRCIMClient] quitChatRoom...] to leave the live room. Leaving the room is asynchronous and may still be in progress after the current controller's dealloc; if a new message arrives in that window, the app crashes because the delegate has been released. So in the current controller's dealloc, also set the message-receiving delegate to nil:
[[RCIMClient sharedRCIMClient] setReceiveMessageDelegate:nil object:nil];

Like animation
Like animations are done mainly with CAKeyframeAnimation and UIBezierPath. You can also modify the code to change the animation path, replace the like image, and so on.
Danmaku (bullet comments)
BarrageRenderer is recommended for danmaku; it performs well, and the introduction on its git home page makes it easy to use. However, if you want to show both historical messages and live messages, it is recommended to iterate over the historical danmaku and bind the timeline yourself, because the library's replay/timeline-binding mode combines poorly with live messages — danmaku may be displayed repeatedly.

Network switching
During a live broadcast, consider the user's current network state: stop playback on a mobile network, and reconnect for the user when switching to WiFi, so as to reduce data consumption. Network changes are detected mainly in two ways: Reachability, or reading the network type from the status bar.
Reachability is set up in the AppDelegate; when the network state changes, the code in the block runs. To forward the change to the live page, simply use the notification center. For reachability it is suggested to use AFNetworking, because previous articles noted that the standalone Reachability library may not support IPv6, leading to App Store rejection. Our project uses the reachability in AFNetworking with no problems:

- (void)monitorNetworking {
    AFNetworkReachabilityManager *mgr = [AFNetworkReachabilityManager sharedManager];

    [mgr setReachabilityStatusChangeBlock:^(AFNetworkReachabilityStatus status) {
        switch (status) {
            case AFNetworkReachabilityStatusReachableViaWiFi:
                // WiFi network
                break;
            case AFNetworkReachabilityStatusReachableViaWWAN:
                // Mobile (cellular) network
                break;
            case AFNetworkReachabilityStatusNotReachable:
                // No network
                break;
            case AFNetworkReachabilityStatusUnknown:
            default:
                // Unknown network
                break;
        }
    }];

    // Start monitoring
    [mgr startMonitoring];
}

Reading the network state from the status bar: some say it cannot be obtained while the status bar is hidden, but in testing it can. The enum mapping I used is included as comments inside the method — just substitute your own constants:

// CurrentNetWorkNone / CurrentNetWorkMobile / CurrentNetWorkWifi are
// NSString constants defined elsewhere in the project.
- (NSString *)getCurrentNetWork {
    NSArray *subviews = [[[[UIApplication sharedApplication] valueForKeyPath:@"statusBar"] valueForKeyPath:@"foregroundView"] subviews];
    for (id child in subviews) {
        if ([child isKindOfClass:NSClassFromString(@"UIStatusBarDataNetworkItemView")]) {
            // Read the network type code from the status bar
            int networkType = [[child valueForKeyPath:@"dataNetworkType"] intValue];
            switch (networkType) {
                case 0:  // no network
                    return CurrentNetWorkNone;
                case 1:  // 2G
                    return CurrentNetWorkMobile;
                case 2:  // 3G
                    return CurrentNetWorkMobile;
                case 3:  // 4G
                    return CurrentNetWorkMobile;
                case 5:  // WiFi
                    return CurrentNetWorkWifi;
                default:
                    return CurrentNetWorkNone;
            }
        }
    }
    return CurrentNetWorkNone;
}

7, Performance issues

Some articles on latency and stutter optimization:

Chat Performance Optimization:

Tags: iOS xcode objective-c

Posted by Ammar on Fri, 06 May 2022 06:20:42 +0300