This article was first published on the WeChat official account: ByteFlow
Articles in the FFmpeg development series:
FFmpeg development (01): FFmpeg compilation and integration
FFmpeg development (02): FFmpeg + ANativeWindow for video decoding and playback
FFmpeg development (03): FFmpeg + OpenSL ES for audio decoding and playback
Building on the previous article, FFmpeg + OpenSL ES for audio decoding and playback, this article uses FFmpeg to decode the audio stream of an MP4 file and then resamples the decoded PCM audio data. While the audio is played with OpenSL ES, the PCM data of one channel is rendered in real time as a bar chart.
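As a quick reference, the sketch below shows roughly what that resampling step looks like with libswresample. The helper function, its name, and the assumption that the caller provides a large enough output buffer are mine, not code from the actual player; see the previous article for the full implementation.

extern "C" {
#include <libavutil/frame.h>
#include <libavutil/channel_layout.h>
#include <libswresample/swresample.h>
}

// Hypothetical helper: convert one decoded AVFrame to interleaved 44.1 kHz / stereo / s16 PCM,
// i.e. the format the OpenSL ES player below is configured for.
// Returns the number of samples written per channel, or a negative value on error.
static int ResampleToPlayerFormat(SwrContext **swrCtx, const AVFrame *frame,
                                  uint8_t *outBuffer, int maxOutSamples) {
    if (*swrCtx == nullptr) {
        *swrCtx = swr_alloc_set_opts(nullptr,
                AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_S16, 44100,                   // output format
                (int64_t) frame->channel_layout, (AVSampleFormat) frame->format, // input format: as decoded
                frame->sample_rate, 0, nullptr);
        if (*swrCtx == nullptr || swr_init(*swrCtx) < 0)
            return -1;
    }
    return swr_convert(*swrCtx, &outBuffer, maxOutSamples,
                       (const uint8_t **) frame->data, frame->nb_samples);
}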
As for audio visualization, in an earlier article we rendered single-channel PCM data captured in real time by the Android AudioRecorder into a bar chart with OpenGL. For the specific rendering process and details, please refer to that article; the code is open source:
Visualizing real-time audio with OpenGL ES
Extracting the audio data of one channel
In the previous article, when we built the OpenSL ES player, we defined the PCM data format as follows:
SLDataFormat_PCM pcm = {
        SL_DATAFORMAT_PCM,           //format type
        (SLuint32)2,                 //channel count: 2 (stereo)
        SL_SAMPLINGRATE_44_1,        //44100 Hz
        SL_PCMSAMPLEFORMAT_FIXED_16, //bits per sample: 2 bytes = 16 bits
        SL_PCMSAMPLEFORMAT_FIXED_16, //container size
        SL_SPEAKER_FRONT_LEFT | SL_SPEAKER_FRONT_RIGHT, //channel mask
        SL_BYTEORDER_LITTLEENDIAN    //endianness: little-endian
};
As the code above shows, the PCM data handed to the audio driver is sampled at 44.1 kHz, has two channels, and uses a sample size of 2 bytes (16 bits). Since we want to render the PCM data of a single channel, we need to extract the data of one channel from the interleaved two-channel data.
As shown in the figure above, the decoded PCM data of the two channels is stored interleaved. When a pointer offset is used to extract the data of one channel, the step size of each offset is 2 bytes × the number of channels = 4 bytes.
The PCM data of one channel can be extracted as follows; in this way, the samples of each channel can be separated from one frame of audio data.
//Audio data stored in little-endian order
uint8_t *pByte = audioFrame->data;
for (int i = 0; i < audioFrame->dataSize / 4; i++) { //4 bytes per sample pair: 2 channels x 16 bits
    short *pShort = (short *) (pByte + i * 4);
    //Left channel value
    short leftChannelValue = *pShort;
    pShort = (short *) (pByte + i * 4 + 2);
    //Right channel value
    short rightChannelValue = *pShort;
}
In addition, note that data can be stored in either big-endian or little-endian byte order. Little-endian means the low-order byte is stored at the low address and the high-order byte at the high address; big-endian is the opposite, i.e. the low address stores the high-order byte. This must be taken into account when separating the channel data.
//Audio data stored in big-endian order: swap the two bytes before interpreting the 16-bit value
uint8_t *pByte = audioFrame->data;
for (int i = 0; i < audioFrame->dataSize / 4; i++) {
    short *pShort = (short *) (pByte + i * 4);
    //Left channel value
    short leftChannelValue = ((*pShort & 0xFF00) >> 8) | ((*pShort & 0x00FF) << 8);
    pShort = (short *) (pByte + i * 4 + 2);
    //Right channel value
    short rightChannelValue = ((*pShort & 0xFF00) >> 8) | ((*pShort & 0x00FF) << 8);
}
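Whether the byte-swapping version above is needed depends on how the PCM data is stored relative to the host. A tiny helper, not part of the original code, can check the host byte order at runtime:

#include <cstdint>

// Returns true when the host stores the low-order byte at the lowest address (little-endian).
static bool IsLittleEndianHost() {
    const uint16_t probe = 0x0102;
    return *reinterpret_cast<const uint8_t *>(&probe) == 0x02;
}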
Rendering audio data with OpenGL ES
OpenGL ES (OpenGL for Embedded Systems) is a subset of the 3D graphics API OpenGL. In essence, it is a cross-language, cross-platform programming interface specification, mainly used on embedded devices such as mobile phones and tablets.
Since the relevant OpenGL ES topics have already been covered systematically in earlier articles, they will not be repeated here. For details, please refer to:
Android OpenGL ES systematic learning tutorial: from beginner to mastery
Rendering audio data with OpenGL essentially means building a mesh like the one shown in the figure below from the audio sample values, and finally rendering it as a bar graph.
The next step is the code implementation. First, create the Renderer of the GLSurfaceView in the Java layer and add the corresponding native functions to FFMediaPlayer:
private GLSurfaceView.Renderer mAudioGLRender = new GLSurfaceView.Renderer() {
    @Override
    public void onSurfaceCreated(GL10 gl10, EGLConfig eglConfig) {
        FFMediaPlayer.native_OnAudioVisualSurfaceCreated();
    }

    @Override
    public void onSurfaceChanged(GL10 gl10, int w, int h) {
        FFMediaPlayer.native_OnAudioVisualSurfaceChanged(w, h);
    }

    @Override
    public void onDrawFrame(GL10 gl10) {
        FFMediaPlayer.native_OnAudioVisualDrawFrame();
    }
};

public class FFMediaPlayer {
    static {
        System.loadLibrary("learn-ffmpeg");
    }

    //......

    //for audio visual render
    public static native void native_OnAudioVisualSurfaceCreated();
    public static native void native_OnAudioVisualSurfaceChanged(int width, int height);
    public static native void native_OnAudioVisualDrawFrame();
}
The JNI functions corresponding to the Java-layer interface:
//Visual audio rendering interface
JNIEXPORT void JNICALL
Java_com_byteflow_learnffmpeg_media_FFMediaPlayer_native_1OnAudioVisualSurfaceCreated(JNIEnv *env, jclass clazz) {
    AudioVisualRender::GetInstance()->OnAudioVisualSurfaceCreated();
}

JNIEXPORT void JNICALL
Java_com_byteflow_learnffmpeg_media_FFMediaPlayer_native_1OnAudioVisualSurfaceChanged(JNIEnv *env, jclass clazz, jint width, jint height) {
    AudioVisualRender::GetInstance()->OnAudioVisualSurfaceChanged(width, height);
}

JNIEXPORT void JNICALL
Java_com_byteflow_learnffmpeg_media_FFMediaPlayer_native_1OnAudioVisualDrawFrame(JNIEnv *env, jclass clazz) {
    AudioVisualRender::GetInstance()->OnAudioVisualDrawFrame();
}
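The article only lists the implementation file of the rendering class. For orientation, here is a rough sketch of what the corresponding AudioVisualRender.h might look like, inferred from the implementation below; the values of RESAMPLE_LEVEL and MAX_AUDIO_LEVEL and some member details are assumptions, not the original header:

// Sketch of AudioVisualRender.h, inferred from the implementation (not the original header).
#ifndef AUDIO_VISUAL_RENDER_H
#define AUDIO_VISUAL_RENDER_H

#include <mutex>
#include <GLES3/gl3.h>
#include <glm.hpp>
#include "AudioFrame.h"      // AudioFrame { uint8_t *data; int dataSize; } from the player code

#define MAX_AUDIO_LEVEL 2000 // assumed normalization ceiling for a 16-bit sample
#define RESAMPLE_LEVEL  44   // assumed downsampling interval within one audio frame

class AudioVisualRender {
public:
    static AudioVisualRender *GetInstance();
    static void ReleaseInstance();

    void Init();
    void UnInit();

    void OnAudioVisualSurfaceCreated();
    void OnAudioVisualSurfaceChanged(int w, int h);
    void OnAudioVisualDrawFrame();

    void UpdateAudioFrame(AudioFrame *audioFrame);

private:
    AudioVisualRender() = default;
    ~AudioVisualRender() = default;
    void UpdateMesh();

    static AudioVisualRender *m_pInstance;
    static std::mutex m_Mutex;

    GLuint      m_ProgramObj = GL_NONE;
    GLuint      m_VaoId = GL_NONE;
    GLuint      m_VboIds[2] = {0};
    glm::mat4   m_MVPMatrix;
    AudioFrame *m_pAudioBuffer = nullptr;
    glm::vec2  *m_pTextureCoords = nullptr;
    glm::vec3  *m_pVerticesCoords = nullptr;
    int         m_RenderDataSize = 0;
};

#endif //AUDIO_VISUAL_RENDER_H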
The audio rendering class in the native layer:
#include <LogUtil.h>
#include <GLUtils.h>
#include "AudioVisualRender.h"
#include <gtc/matrix_transform.hpp>
#include <detail/type_mat.hpp>
#include <detail/type_mat4x4.hpp>
#include <render/video/OpenGLRender.h>

AudioVisualRender* AudioVisualRender::m_pInstance = nullptr;
std::mutex AudioVisualRender::m_Mutex;

AudioVisualRender *AudioVisualRender::GetInstance() {
    if (m_pInstance == nullptr) {
        std::unique_lock<std::mutex> lock(m_Mutex);
        if (m_pInstance == nullptr) {
            m_pInstance = new AudioVisualRender();
        }
    }
    return m_pInstance;
}

void AudioVisualRender::ReleaseInstance() {
    std::unique_lock<std::mutex> lock(m_Mutex);
    if (m_pInstance != nullptr) {
        delete m_pInstance;
        m_pInstance = nullptr;
    }
}

void AudioVisualRender::OnAudioVisualSurfaceCreated() {
    ByteFlowPrintE("AudioVisualRender::OnAudioVisualSurfaceCreated");
    if (m_ProgramObj)
        return;
    char vShaderStr[] =
            "#version 300 es\n"
            "layout(location = 0) in vec4 a_position;\n"
            "layout(location = 1) in vec2 a_texCoord;\n"
            "uniform mat4 u_MVPMatrix;\n"
            "out vec2 v_texCoord;\n"
            "void main()\n"
            "{\n"
            "    gl_Position = u_MVPMatrix * a_position;\n"
            "    v_texCoord = a_texCoord;\n"
            "    gl_PointSize = 4.0f;\n"
            "}";

    char fShaderStr[] =
            "#version 300 es\n"
            "precision mediump float;\n"
            "in vec2 v_texCoord;\n"
            "layout(location = 0) out vec4 outColor;\n"
            "uniform float drawType;\n"
            "void main()\n"
            "{\n"
            "    if(drawType == 1.0)\n"
            "    {\n"
            "        outColor = vec4(1.5 - v_texCoord.y, 0.3, 0.3, 1.0);\n"
            "    }\n"
            "    else if(drawType == 2.0)\n"
            "    {\n"
            "        outColor = vec4(1.0, 1.0, 1.0, 1.0);\n"
            "    }\n"
            "    else if(drawType == 3.0)\n"
            "    {\n"
            "        outColor = vec4(0.3, 0.3, 0.3, 1.0);\n"
            "    }\n"
            "}";

    //Generate shader program
    m_ProgramObj = GLUtils::CreateProgram(vShaderStr, fShaderStr);
    if (m_ProgramObj == GL_NONE) {
        LOGCATE("VisualizeAudioSample::Init create program fail");
    }

    //Set up the MVP transformation matrix
    // Projection matrix
    glm::mat4 Projection = glm::ortho(-1.0f, 1.0f, -1.0f, 1.0f, 0.1f, 100.0f);
    //glm::mat4 Projection = glm::frustum(-ratio, ratio, -1.0f, 1.0f, 4.0f, 100.0f);
    //glm::mat4 Projection = glm::perspective(45.0f, ratio, 0.1f, 100.f);

    // View matrix
    glm::mat4 View = glm::lookAt(
            glm::vec3(0, 0, 4), // Camera is at (0,0,4), in World Space
            glm::vec3(0, 0, 0), // and looks at the origin
            glm::vec3(0, 1, 0)  // Head is up (set to 0,-1,0 to look upside-down)
    );

    // Model matrix
    glm::mat4 Model = glm::mat4(1.0f);
    Model = glm::scale(Model, glm::vec3(1.0f, 1.0f, 1.0f));
    Model = glm::rotate(Model, 0.0f, glm::vec3(1.0f, 0.0f, 0.0f));
    Model = glm::rotate(Model, 0.0f, glm::vec3(0.0f, 1.0f, 0.0f));
    Model = glm::translate(Model, glm::vec3(0.0f, 0.0f, 0.0f));

    m_MVPMatrix = Projection * View * Model;
}

void AudioVisualRender::OnAudioVisualSurfaceChanged(int w, int h) {
    ByteFlowPrintE("AudioVisualRender::OnAudioVisualSurfaceChanged [w, h] = [%d, %d]", w, h);
    glClearColor(1.0f, 1.0f, 1.0f, 1.0);
    glViewport(0, 0, w, h);
}

void AudioVisualRender::OnAudioVisualDrawFrame() {
    ByteFlowPrintD("AudioVisualRender::OnAudioVisualDrawFrame");
    glClear(GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT);
    std::unique_lock<std::mutex> lock(m_Mutex);
    if (m_ProgramObj == GL_NONE || m_pAudioBuffer == nullptr)
        return;
    UpdateMesh();
    lock.unlock();

    // Generate VBO Ids and load the VBOs with data
    if (m_VboIds[0] == 0) {
        glGenBuffers(2, m_VboIds);

        glBindBuffer(GL_ARRAY_BUFFER, m_VboIds[0]);
        glBufferData(GL_ARRAY_BUFFER, sizeof(GLfloat) * m_RenderDataSize * 6 * 3, m_pVerticesCoords, GL_DYNAMIC_DRAW);

        glBindBuffer(GL_ARRAY_BUFFER, m_VboIds[1]);
        glBufferData(GL_ARRAY_BUFFER, sizeof(GLfloat) * m_RenderDataSize * 6 * 2, m_pTextureCoords, GL_DYNAMIC_DRAW);
    } else {
        glBindBuffer(GL_ARRAY_BUFFER, m_VboIds[0]);
        glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(GLfloat) * m_RenderDataSize * 6 * 3, m_pVerticesCoords);

        glBindBuffer(GL_ARRAY_BUFFER, m_VboIds[1]);
        glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(GLfloat) * m_RenderDataSize * 6 * 2, m_pTextureCoords);
    }

    if (m_VaoId == GL_NONE) {
        glGenVertexArrays(1, &m_VaoId);
        glBindVertexArray(m_VaoId);

        glBindBuffer(GL_ARRAY_BUFFER, m_VboIds[0]);
        glEnableVertexAttribArray(0);
        glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(GLfloat), (const void *) 0);
        glBindBuffer(GL_ARRAY_BUFFER, GL_NONE);

        glBindBuffer(GL_ARRAY_BUFFER, m_VboIds[1]);
        glEnableVertexAttribArray(1);
        glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(GLfloat), (const void *) 0);
        glBindBuffer(GL_ARRAY_BUFFER, GL_NONE);

        glBindVertexArray(GL_NONE);
    }

    // Use the program object
    glUseProgram(m_ProgramObj);
    glBindVertexArray(m_VaoId);
    GLUtils::setMat4(m_ProgramObj, "u_MVPMatrix", m_MVPMatrix);
    GLUtils::setFloat(m_ProgramObj, "drawType", 1.0f);
    glDrawArrays(GL_TRIANGLES, 0, m_RenderDataSize * 6);
    GLUtils::setFloat(m_ProgramObj, "drawType", 2.0f);
    glDrawArrays(GL_LINES, 0, m_RenderDataSize * 6);
}

void AudioVisualRender::UpdateAudioFrame(AudioFrame *audioFrame) {
    if (audioFrame != nullptr) {
        ByteFlowPrintD("AudioVisualRender::UpdateAudioFrame audioFrame->dataSize=%d", audioFrame->dataSize);
        std::unique_lock<std::mutex> lock(m_Mutex);
        if (m_pAudioBuffer != nullptr && m_pAudioBuffer->dataSize != audioFrame->dataSize) {
            delete m_pAudioBuffer;
            m_pAudioBuffer = nullptr;
            delete [] m_pTextureCoords;
            m_pTextureCoords = nullptr;
            delete [] m_pVerticesCoords;
            m_pVerticesCoords = nullptr;
        }

        if (m_pAudioBuffer == nullptr) {
            m_pAudioBuffer = new AudioFrame(audioFrame->data, audioFrame->dataSize);
            m_RenderDataSize = m_pAudioBuffer->dataSize / RESAMPLE_LEVEL;
            m_pVerticesCoords = new vec3[m_RenderDataSize * 6]; //(x,y,z) * 6 points
            m_pTextureCoords = new vec2[m_RenderDataSize * 6];  //(x,y) * 6 points
        } else {
            memcpy(m_pAudioBuffer->data, audioFrame->data, audioFrame->dataSize);
        }
        lock.unlock();
    }
}

//Create and update the mesh of the bar graph; one frame of audio data is too large, so it is downsampled
void AudioVisualRender::UpdateMesh() {
    float dy = 0.25f / MAX_AUDIO_LEVEL;
    float dx = 1.0f / m_RenderDataSize;
    for (int i = 0; i < m_RenderDataSize; ++i) {
        int index = i * RESAMPLE_LEVEL; //RESAMPLE_LEVEL indicates the sampling interval
        short *pValue = (short *)(m_pAudioBuffer->data + index);
        float y = *pValue * dy;
        y = y < 0 ? y : -y;
        vec2 p1(i * dx, 0 + 1.0f);
        vec2 p2(i * dx, y + 1.0f);
        vec2 p3((i + 1) * dx, y + 1.0f);
        vec2 p4((i + 1) * dx, 0 + 1.0f);

        m_pTextureCoords[i * 6 + 0] = p1;
        m_pTextureCoords[i * 6 + 1] = p2;
        m_pTextureCoords[i * 6 + 2] = p4;
        m_pTextureCoords[i * 6 + 3] = p4;
        m_pTextureCoords[i * 6 + 4] = p2;
        m_pTextureCoords[i * 6 + 5] = p3;

        m_pVerticesCoords[i * 6 + 0] = GLUtils::texCoordToVertexCoord(p1);
        m_pVerticesCoords[i * 6 + 1] = GLUtils::texCoordToVertexCoord(p2);
        m_pVerticesCoords[i * 6 + 2] = GLUtils::texCoordToVertexCoord(p4);
        m_pVerticesCoords[i * 6 + 3] = GLUtils::texCoordToVertexCoord(p4);
        m_pVerticesCoords[i * 6 + 4] = GLUtils::texCoordToVertexCoord(p2);
        m_pVerticesCoords[i * 6 + 5] = GLUtils::texCoordToVertexCoord(p3);
    }
}

void AudioVisualRender::Init() {
    m_VaoId = GL_NONE;
    m_pTextureCoords = nullptr;
    m_pVerticesCoords = nullptr;
    memset(m_VboIds, 0, sizeof(GLuint) * 2);
    m_pAudioBuffer = nullptr;
}

//Free memory
void AudioVisualRender::UnInit() {
    if (m_pAudioBuffer != nullptr) {
        delete m_pAudioBuffer;
        m_pAudioBuffer = nullptr;
    }
    if (m_pTextureCoords != nullptr) {
        delete [] m_pTextureCoords;
        m_pTextureCoords = nullptr;
    }
    if (m_pVerticesCoords != nullptr) {
        delete [] m_pVerticesCoords;
        m_pVerticesCoords = nullptr;
    }
}
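One helper used above, GLUtils::texCoordToVertexCoord, maps a texture-space point (origin at the top left, y pointing down) into OpenGL's normalized device coordinates. The actual implementation lives in the sample project's GLUtils; a plausible version, shown here only to make the mesh construction easier to follow, is:

// Assumed mapping from texture coordinates ([0,1], origin top-left, y down)
// to normalized device coordinates ([-1,1], origin in the center, y up).
static glm::vec3 texCoordToVertexCoord(const glm::vec2 &texCoord) {
    return glm::vec3(2.0f * texCoord.x - 1.0f, 1.0f - 2.0f * texCoord.y, 0.0f);
}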
Finally, you only need to call UpdateAudioFrame in the callback function of the OpenSL ES player (see the previous article):
AudioFrame *audioFrame = m_AudioFrameQueue.front();
if (nullptr != audioFrame && m_AudioPlayerPlay) {
    SLresult result = (*m_BufferQueue)->Enqueue(m_BufferQueue, audioFrame->data, (SLuint32) audioFrame->dataSize);
    if (result == SL_RESULT_SUCCESS) {
        //Hand the frame that is being played to the visual renderer
        AudioVisualRender::GetInstance()->UpdateAudioFrame(audioFrame);
        m_AudioFrameQueue.pop();
        delete audioFrame;
    }
}
Contact and communication
If you have questions or want to exchange technical ideas, you can add my WeChat: ByteFlow