Lab2 Report: HEVC, H.264 and AVS2 Video Compression

1. Abstract

This lab report covers the second major assignment in Video Coding and Communication: HEVC and H.264 video compression.

Following the assignment requirements, this report first selects two test sequences with different characteristics and encodes and decodes each with two different sets of HEVC encoding parameters. It then applies both HEVC and H.264 compression to the same test sequence and compares and analyzes the difference in performance.

Secondly, this report records an extension: the same sequence is compressed and decoded with AVS2, the decoded pictures are compared with those obtained in Experiment 2, and the performance differences among the three compression standards are briefly analyzed.

2. Experiment 1: HEVC video encoding and decoding

2.1 Experiment content

Using reference code HM16.12 or later, select at least two test sequences with different spatial resolutions and different motion and texture characteristics. For each test sequence, encode and decode with two different sets of encoding parameters. Give the corresponding parameter configuration, the original images of several key frames and the corresponding decoded and reconstructed images, and the PSNR value of each frame.

2.2 Brief description of experimental principle

High Efficiency Video Coding (HEVC), also known as H.265 and MPEG-H Part 2, is a video compression standard designed to succeed the H.264/AVC coding standard.

The video coding layer of HEVC, like H.264/AVC and many other video compression standards, adopts a hybrid video coding structure (as shown in Figure 2-1), but new techniques have been added to each stage, or the original coding tools have been improved for better efficiency. 1 Examples include a flexible quadtree-based block partitioning structure, intra prediction modes at many different angles, advanced motion vector prediction (AMVP), merge mode, and variable-size discrete cosine transforms.

As can be seen from the figure, a typical HEVC coding framework has the following modules:

  • Intra prediction: This module is mainly used to remove the spatial correlation of images. The current pixel block is predicted by the encoded reconstructed block information to remove the spatial redundancy information and improve the compression efficiency of the image;
  • Inter prediction: This module is mainly used to remove the temporal correlation of images. Inter-frame prediction obtains the motion information of each block by using the encoded image as the reference image of the current frame, thereby removing temporal redundancy and improving compression efficiency;
  • Transform and quantization: This module performs lossy compression by transforming and quantizing the residual data to remove frequency-domain correlation. Transform coding converts the image from the spatial domain to the frequency domain and concentrates the energy in the low-frequency region;
  • De-blocking filtering: In block-based video coding, the reconstructed image exhibits blocking artifacts. De-blocking filtering weakens or even eliminates these artifacts, improving both subjective quality and compression efficiency;
  • Sample adaptive offset (SAO) filtering: Applied after de-blocking filtering. By analyzing the statistical characteristics of the de-blocked pixels and adding corresponding offset values, SAO weakens the ringing effect to a certain extent and improves subjective quality and compression efficiency;
  • Entropy encoding: This module encodes encoded control data, quantized transform coefficients, intra-frame prediction data, and motion data into binary streams for storage or transmission. The output data of the entropy coding module is the compressed code stream of the original video.
2.3 Experimental process and analysis
2.3.1 Experimental environment and test sequence

The code version used in this experiment is HM-16.13; the experimental environment is Visual Studio; the tool used to capture frames and play YUV files is YUV Player.

The test sequences used in the experiment are "akiyo_qcif.yuv" and "bus_cif.yuv" 2 . For convenience, 100 frames of each were selected for subsequent encoding. The former shows a news anchor, with little change and few moving objects in the picture; the latter shows a bus driving along a road, with larger changes and more moving objects.

The video content is roughly as follows; the first frame of each sequence is shown (the two pictures are scaled for display):

The spatial resolution parameters are as follows:

        SourceWidth   SourceHeight   FramesToBeEncoded
akiyo   176           144            100
bus     352           288            100
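As a sanity check, the raw size of an 8-bit YUV 4:2:0 sequence follows directly from these resolutions (1.5 bytes per pixel), and reproduces the source-file sizes quoted later in the report:

```python
# Raw 4:2:0 size: Y is width*height bytes, U and V are each a quarter
# of that, i.e. 1.5 bytes per pixel at 8 bits per sample.

def yuv420_size_bytes(width, height, frames):
    return int(width * height * 1.5) * frames

akiyo = yuv420_size_bytes(176, 144, 100)   # QCIF, 100 frames
bus   = yuv420_size_bytes(352, 288, 100)   # CIF, 100 frames
print(akiyo // 1024, "KB")  # 3712 KB, consistent with the ~3713 KB quoted below
print(bus // 1024, "KB")    # 14850 KB
```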
2.3.2 HEVC encoding and decoding

We put the test sequence files (.yuv) and the related configuration files (encoder_intra_main.cfg and each sequence's configuration file) into the Debug directory and run the program with Visual Studio.

(1) Test sequence 1 - the first set of parameters

Test sequence 1 uses the configuration file akiyo_qcif.cfg. We set the QP value to 45 in the quantization section of the configuration file. Some parameters of the Coding Structure section are set as follows:

#======== Coding Structure =============
IntraPeriod                   : 1 # Period of I-Frame ( -1 = only first)
DecodingRefreshType           : 0 # Random Accesss 0:none, 1:CRA, 2:IDR, 3:Recovery Point SEI
GOPSize                       : 1 # GOP Size (number of B slice = GOPSize-1)
#Type POC QPoffset QPfactor tcOffsetDiv2 betaOffsetDiv2  temporal_id #ref_pics_active #ref_pics reference pictures 

In this way, all frames are I-frames; encoding speed should improve in theory, but at the cost of a larger encoded file. The video compression process is as follows:

The program runs for a total of 82.429s. The output file is 33KB against a 3713KB source file, a compression ratio of about 112.5 — a good compression result. The peak signal-to-noise ratio (PSNR) values are shown below.

SUMMARY --------------------------------------------------------
        Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
              100    a     129.4520   27.9516   33.5440   36.0687   29.2670


I Slices--------------------------------------------------------
        Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
              100    i     129.4520   27.9516   33.5440   36.0687   29.2670


P Slices--------------------------------------------------------
        Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
                0    p    -nan(ind)  -nan(ind)  -nan(ind)  -nan(ind)  -nan(ind)


B Slices--------------------------------------------------------
        Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
                0    b    -nan(ind)  -nan(ind)  -nan(ind)  -nan(ind)  -nan(ind)

The PSNR values are all below 40 dB, indicating low encoded image quality. We then decode the generated str.bin file; the decoding process is relatively simple. The decoded file size is 3713KB, equal to the original file size. The specific decoding process is as follows:
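For reference, a per-frame Y-PSNR like those in the summary above is computed from the mean squared error between original and reconstructed luma samples, with a peak value of 255 for 8-bit video (a minimal sketch on flat sample lists; real tools operate on full 2-D frames):

```python
import math

def psnr(orig, recon, max_val=255):
    """PSNR in dB between two equal-length sample sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    if mse == 0:
        return float("inf")   # identical frames
    return 10 * math.log10(max_val ** 2 / mse)

# A uniform error of 10 per sample gives about 28.1 dB, the same
# ballpark as the Y-PSNR reported above for QP 45:
print(round(psnr([50] * 64, [60] * 64), 1))  # 28.1
```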

Let us compare the key-frame image quality of the source file and the decoded file. Because every frame is an I frame, we choose frames 1, 50, and 100 for comparison.

Judging from the comparison of the three frames above, the overall quality loss is relatively large, and the difference is particularly obvious in the details, but the outlines of objects can still be distinguished.

(2) Test sequence 1 - the second set of parameters

In this set of parameters, we set the first frame to be an I frame and all remaining frames to be P frames, i.e. a low-delay coding structure. Encoding should therefore be slower than the all-intra case, but faster than a structure in which the remaining frames are B frames, and image quality should fall somewhere in between. We choose a QP value of 10.

The Coding Structure section in the configuration file (encoder_lowdelay_P_main.cfg) looks like this:

#======== Coding Structure =============
IntraPeriod                   : -1  # Period of I-Frame ( -1 = only first)
DecodingRefreshType           : 0   # Random Accesss 0:none, 1:CRA, 2:IDR, 3:Recovery Point SEI
GOPSize                       : 4   # GOP Size (number of B slice = GOPSize-1)
IntraQPOffset                 : -1
LambdaFromQpEnable            : 1   # see JCTVC-X0038 for suitable parameters for IntraQPOffset, QPoffset, QPOffsetModelOff, QPOffsetModelScale when enabled

Similarly, the process of video compression is as follows:

The program runs for a total of 200.189s. The output file is 98KB against a 3713KB source file, a compression ratio of about 37.9, a fairly modest compression result. The peak signal-to-noise ratio (PSNR) values are shown below.

SUMMARY --------------------------------------------------------
        Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
              100    a     397.9160   50.9905   52.1198   52.4897   51.3332


I Slices--------------------------------------------------------
        Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
                1    i    3481.6000   53.3063   53.3069   53.7012   53.3698


P Slices--------------------------------------------------------
        Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
               99    p     366.7677   50.9671   52.1078   52.4774   51.3168


B Slices--------------------------------------------------------
        Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
                0    b    -nan(ind)  -nan(ind)  -nan(ind)  -nan(ind)  -nan(ind)

It can be seen that the overall PSNR is above 50 dB, which looks very good to the human eye. The numbers of I frames and P frames are 1 and 99, matching our settings.

Decode the file, intercept the key frame (the first frame) and compare it as follows:

It can be seen that because the QP value is set quite low, image quality is well preserved, with essentially no visible loss overall. However, because the remaining frames are all P frames, a close look reveals slightly blurred details, which can be observed on the faces of the people in the pictures.

(3) Test sequence 2 - the first set of parameters

The first set of coding parameters of test sequence 2 is the same as that of test sequence 1, that is, all frames are set to I frames, and the QP value is set to 45. Other settings remain unchanged. The screenshot of the video compression process is as follows:

The program ran for 242.844s. The output file is 178KB against a 14850KB source file, a compression ratio of about 83.4. PSNR values are as follows:

SUMMARY --------------------------------------------------------
        Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
              100    a     727.1040   24.3546   35.8285   37.1662   25.9815


I Slices--------------------------------------------------------
        Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
              100    i     727.1040   24.3546   35.8285   37.1662   25.9815


P Slices--------------------------------------------------------
        Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
                0    p    -nan(ind)  -nan(ind)  -nan(ind)  -nan(ind)  -nan(ind)


B Slices--------------------------------------------------------
        Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
                0    b    -nan(ind)  -nan(ind)  -nan(ind)  -nan(ind)  -nan(ind)

It can be seen that the overall PSNR value is below 30 dB, indicating poor picture quality. Decode the file and capture frames 1, 50, and 100 for comparison.

The comparison of the three frames above is similar to the results for test sequence 1: overall, the image quality loss is relatively large and is particularly obvious in the details, but the outlines of objects can still be distinguished.

(4) Test sequence 2 - the second set of parameters

The second group of coding parameters of test sequence 2 is the same as that of test sequence 1, that is, the first frame is set to I frame, the rest are P frames, and the QP value is set to 10. Other settings remain unchanged. The screenshot of the video compression process is as follows:

The program ran for 1969.095s. The output file is 2723KB against a 14850KB source file, a compression ratio of only about 5.45, a poor compression result. PSNR values are as follows:

SUMMARY --------------------------------------------------------
        Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
              100    a   11151.3200   48.8338   49.3611   50.5012   48.9460


I Slices--------------------------------------------------------
        Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
                1    i   23326.0000   53.4858   53.0045   53.5031   53.4046


P Slices--------------------------------------------------------
        Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
               99    p   11028.3434   48.7868   49.3243   50.4709   48.9180


B Slices--------------------------------------------------------
        Total Frames |   Bitrate     Y-PSNR    U-PSNR    V-PSNR    YUV-PSNR
                0    b    -nan(ind)  -nan(ind)  -nan(ind)  -nan(ind)  -nan(ind)

It can be seen that the overall PSNR is close to 50 dB, which looks very good to the human eye. The numbers of I frames and P frames are 1 and 99, matching our settings.

Decode the file, intercept the key frame (the first frame) and compare it as follows:

It can be seen that, because of the picture's texture and the low QP value, there is almost no visible difference between the two — but this comes at the cost of compression efficiency.

3. Experiment 2: H.264 video encoding and decoding

3.1 Experiment content

Select the same test sequence and use the same encoding parameters to perform H.264 and HEVC compression encoding and decoding respectively;
Compare and analyze the encoding performance of two different codecs, give the corresponding parameter configuration, and give the original images of several key frames
image, and the corresponding decoded and reconstructed image, giving the PSNR value of each frame of image.

3.2 Brief description of experimental principle

H.264, also known as MPEG-4 Part 10, Advanced Video Coding (AVC), is a block-oriented, motion-compensation-based video coding standard. By 2014, it had become one of the most commonly used formats for recording, compressing, and distributing high-definition video. 3

H.264/AVC includes a series of new features that make it not only more efficient than previous standards but also usable in a wide variety of network environments. These features include, but are not limited to, the following 4:

  • Motion compensation for multiple reference frames. Compared to previous video coding standards, H.264/AVC uses more frames already encoded as reference frames in a more flexible way. In some cases, up to 32 reference frames can be used.
  • Variable block size motion compensation. A maximum of 16x16 to a minimum of 4x4 blocks can be used for motion estimation and motion compensation, enabling more accurate segmentation of moving regions in an image sequence.
  • The six-tap filter produces a half-pixel luma component prediction. This reduces aliasing and results in a sharper image.
  • Flexible interlaced-scan video coding. Each frame of interlaced video consists of two fields, so there are three ways to encode it: encode the two fields together as one frame; encode the two fields separately; or combine the two fields into one frame but, at the macroblock level, split a pair of frame macroblocks into two field macroblocks for coding.
  • An in-loop deblocking filter that mitigates the blocking artifacts common to other discrete cosine transform (DCT) based video codecs.
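The multi-reference motion estimation described in the first bullet can be sketched as a toy full-search block matcher (pure Python on 1-D "frames" with illustrative names — not JM's actual search, which uses 2-D SAD plus a rate term):

```python
# Toy full-search block matching over multiple reference frames:
# try every offset in every reference and keep the lowest-SAD match.

def sad(a, b):
    """Sum of absolute differences between two sample runs."""
    return sum(abs(x - y) for x, y in zip(a, b))

def best_match(block, refs, search_range):
    """Return (ref_index, offset, cost) minimising SAD over all references."""
    best = (None, None, float("inf"))
    for ri, ref in enumerate(refs):
        for off in range(search_range + 1):
            if off + len(block) > len(ref):
                break
            cost = sad(block, ref[off:off + len(block)])
            if cost < best[2]:
                best = (ri, off, cost)
    return best

refs  = [[0, 0, 9, 0, 0, 0],   # reference frame 0
         [0, 9, 9, 0, 0, 0]]   # reference frame 1
block = [9, 9, 0]              # current block to predict
ri, off, cost = best_match(block, refs, 3)
assert (ri, off, cost) == (1, 1, 0)  # exact match found in reference 1
```

Allowing more reference frames enlarges the search space, which is exactly the speed/quality trade-off the standard's flexible reference management exposes.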

3.3 Experimental process and analysis
3.3.1 Experimental environment and test sequence

The encoder I chose is JM19.0 5 . The rest of the environment is the same as in Experiment 1. The test sequence selected for Experiment 2 is test sequence 1 from Experiment 1, "akiyo_qcif.yuv": a news anchor presenting the news, at 176*144.

3.3.2 Experimental operation and result analysis

The configuration parameters of Experiment 2 match those of Experiment 1: the first frame is an I frame and the rest are P frames, so the baseline configuration file can be used. (If B frames were needed, the Main profile would be required instead.)

The configuration parameters of the JM software are as follows (most of them are consistent with the encoder_baseline.cfg file, some comments have been deleted):

# Files
InputFile             = "akiyo_qcif.yuv"       # Input sequence
InputHeaderLength     = 0      
StartFrame            = 0      # Start frame for encoding. (0-N)
FramesToBeEncoded     = 100      # Number of frames to be coded
FrameRate             = 30.0   # Frame Rate per second (0.1-100.0)
SourceWidth           = 176    # Source frame width
SourceHeight          = 144    # Source frame height
SourceResize          = 0      # Resize source size for output
OutputWidth           = 176    # Output frame width
OutputHeight          = 144    # Output frame height
TraceFile             = "trace_enc.txt"      # Trace file 
ReconFile             = "test_rec.yuv"       # Reconstruction YUV file
OutputFile            = "test.264"           # Bitstream
StatsFile             = "stats.dat"          # Coding statistics file
# Encoder Control
ProfileIDC            = 66  
IntraProfile          = 0                      
LevelIDC              = 40  # Level IDC   (e.g. 20 = level 2.0)
IntraPeriod           = 0   # Period of I-pictures   (0=only first)
IDRPeriod             = 0   # Period of IDR pictures (0=only first)
AdaptiveIntraPeriod   = 1   # Adaptive intra period
AdaptiveIDRPeriod     = 0   # Adaptive IDR period
IntraDelay            = 0  
EnableIDRGOP          = 0   
EnableOpenGOP         = 0  
QPISlice              = 28  # Quant. param for I Slices (0-51)
QPPSlice              = 28  # Quant. param for P Slices (0-51)
FrameSkip             = 0  
ChromaQPOffset        = 0   # Chroma QP offset (-51..51)

After starting the run, the program proceeds slowly. The running process is as follows:

The program ran for 344.816s. The output file is 13KB against the 3713KB source file, a compression ratio of about 285.6, which is very good compression. The PSNR information is as follows:

 Y { PSNR (dB), cSNR (dB), MSE }   : {  38.384,  38.380,   9.44246 }
 U { PSNR (dB), cSNR (dB), MSE }   : {  40.826,  40.824,   5.37896 }
 V { PSNR (dB), cSNR (dB), MSE }   : {  41.726,  41.725,   4.37107 }
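The JM summary above reports both PSNR and MSE per component; the two are consistent via PSNR = 10·log10(255² / MSE) for 8-bit video, which can be checked directly:

```python
# Recompute PSNR from the MSE values in the JM summary; the results
# match the cSNR figures reported above to within rounding.
import math

mse = {"Y": 9.44246, "U": 5.37896, "V": 4.37107}
psnr = {c: 10 * math.log10(255 ** 2 / m) for c, m in mse.items()}
for c in "YUV":
    print(c, round(psnr[c], 3))
```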

It can be seen that the PSNR values are close to 40 dB, so picture quality is relatively good. Decode the file and compare the key frame of the source file, the key frame decoded by HEVC (with QP changed to 28; file size 9KB), and the key frame decoded by H.264.

It can be seen from the figure that the HEVC-decoded picture is clearer than the H.264 result, and details are recovered better. At the same time, HEVC compresses better and runs in less time. In summary, HEVC outperforms H.264.

4. Extended experiment: AVS2 video encoding and decoding

4.1 Experiment content

Select the same test sequence and use the same encoding parameters as the above experiments to perform AVS2 compression encoding and decoding;
Compare and analyze the coding performance of several different codecs, give the corresponding parameter configuration, and give the original images of several key frames
image, and the corresponding decoded reconstructed image, giving the average PSNR value of the image.

4.2 Brief description of experimental principle

The AVS standard is a series of source-coding standards with independent intellectual property rights developed in China. Aimed at the needs of China's audio and video industry, it is led by Chinese research institutions and enterprises in the digital audio/video field, with broad participation from relevant international organizations and companies, and is formulated according to internationally open rules. Two generations of AVS standards have been completed to date.

The second-generation AVS standard, AVS2 for short, primarily targets ultra-high-definition video, supporting efficient compression of ultra-high-resolution (4K and above) and high-dynamic-range video; its IEEE international standard number is IEEE 1857.4. According to the official website, the coding performance of AVS2 is similar to that of HEVC in digital TV broadcasting, real-time communication, digital cinema, and still images, but significantly higher than HEVC in interlaced digital TV broadcasting and video surveillance applications.

Like HEVC, AVS2 adopts a hybrid coding framework. The overall coding process includes modules such as intra prediction, inter prediction, transform and quantization, inverse quantization and inverse transform, in-loop filtering, and entropy coding. 6

AVS2 encoding includes but is not limited to the following features:

  • Flexible coding structure division. AVS2 adopts a quadtree-based block division structure, including Coding Unit (CU), Prediction Unit (PU) and Transform Unit (TU).
  • Flexible intra-frame predictive coding. Compared with AVS1 and H.264/AVC, AVS2 provides 33 modes for intra prediction of luma blocks, including a DC mode, a Plane mode, a Bilinear mode, and 30 angular prediction modes. Chroma blocks have 5 modes: DC, horizontal, vertical, bilinear interpolation, and the newly added Derived mode (DM).
  • Forward multi-hypothesis prediction for F pictures. A coding block can reference two forward reference blocks, which amounts to dual-hypothesis prediction for a P-like frame. AVS2 divides dual-hypothesis prediction into two categories: temporal and spatial. In the temporal dual hypothesis, the current block uses the weighted average of the two predicted blocks as its prediction, but only one motion vector difference (MVD) and reference picture index are signalled; the other MVD and reference index are derived by linear scaling according to temporal distance. The spatial dual hypothesis, also called directional multi-hypothesis prediction, fuses two prediction points around an initial prediction point, with the initial point lying on the line connecting them. Besides the initial point there are 8 candidate prediction points, and only pairs collinear with the initial point are fused; in addition to the four directions, candidates at 1/2-pixel and 1/4-pixel distances are evaluated separately, giving 4 modes. Together with the initial prediction point, a total of 9 modes are compared to select the best prediction mode.
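The temporal-distance derivation of the second hypothesis's motion vector can be sketched as linear scaling (illustrative only; AVS2 specifies fixed-point arithmetic, not the float math used here):

```python
# Derive a second motion vector from the signalled one by scaling
# with the ratio of reference-picture distances, as in the temporal
# dual-hypothesis description above.

def scale_mv(mv, dist_signalled, dist_derived):
    """Scale (x, y) motion vector by temporal-distance ratio."""
    return (round(mv[0] * dist_derived / dist_signalled),
            round(mv[1] * dist_derived / dist_signalled))

mv_ref1 = (8, -4)                  # signalled MV, reference 1 frame away
mv_ref2 = scale_mv(mv_ref1, 1, 2)  # derived MV for a reference 2 frames away
assert mv_ref2 == (16, -8)
```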
4.3 Experimental operation and result analysis
4.3.1 Experimental environment and test sequence

Consistent with Experiments 1 and 2, the test sequence selected for the extended experiment is also "akiyo_qcif.yuv", and the environment is Visual Studio.

The reference software is the open-source AVS2 encoder (xavs2) and decoder (davs2) from Peking University. Using them requires some environment configuration.

It is worth noting that most of the environment configuration was worked out on my own, so there may be incorrect or unnecessary steps. I hope the teacher will understand or correct them.

(1) Encoder environment configuration

First, according to the official documentation, you need to install a shell executor, such as bash in git-for-windows, and add the directory where bash is located to the system environment variable PATH.

Next, you need to download the nasm.exe file and put it in the \build\vs2013 directory.

Then open the 'xavs.sln' project file in the vs2013 directory and build the solution with the 'x64' configuration; otherwise some asm files will report bitness errors. After a successful build, the generated executable 'xavs.exe' appears in the \build\bin\x64_debug folder.

After that, write the configuration file (cfg) for the yuv file: find the 'encoder_ldp.cfg' file in the config folder, copy it into the working directory, and modify parameters such as the yuv file name, the spatial resolution, and the number of frames to encode.

Finally, set the corresponding command arguments in the project properties of xavs in Visual Studio. The arguments are simple, as shown below; the path may need to be adjusted. After that it runs smoothly.

-f C:\xavs2-master\xavs2-master\build\bin\x64_Debug\encoder_ldp.cfg

(2) Decoder environment configuration

The decoder environment configuration is relatively simple; executables can likewise be built with the 'x64' configuration.

Next, copy the test.avs compressed file and the test_rec.yuv reference file generated by the encoder into the decoder's working directory, and set the command arguments of davs2 in Visual Studio. These are also very simple; the names of the I/O files can be edited directly. Setting the working directory of davs2 is optional; if it is not set, the decoded file is generated in the vs2013 folder by default.

-i test.avs -o dec.yuv -r test_rec.yuv 

4.3.2 Experimental process and result analysis

The chosen configuration makes the first frame an I frame and the rest the F frames unique to AVS2. The QP settings keep the defaults: 34 for the first frame, and minimum and maximum QP values of 34 for subsequent frames. Some of the modified configuration parameters are as follows (some comments have been deleted):

InputFile               = "C:\xavs2-master\xavs2-master\build\bin\x64_Debug\akiyo_qcif.yuv"    # Input sequence, YUV 4:2:0
FramesToBeEncoded       = 100            # Number of frames to be coded
SourceWidth             = 176           # Image width  in Pels
SourceHeight            = 144           # Image height in Pels
fps                     = 50.0          
ChromaFormat            = 1             
InputSampleBitDepth     = 8             
SampleBitDepth          = 8           
ReconFile               = "test_rec.yuv"
OutputFile              = "test.avs"
# Maximum Size
MaxSizeInBit            = 6             # Maximum CU size
# Encoder Control
ProfileID               = 32           
LevelID                 = 66           
IntraPeriodMin          = 0            
IntraPeriodMax          = 0            
OpenGOP                 = 0             # Open GOP
UseHadamard             = 1            
FME                     = 3            
SearchRange             = 64            # Max search range
NumberReferenceFrames   = 4            
inter_2PU               = 1            
inter_AMP               = 1            
# F Frames
FFRAMEEnable            = 1             
DHPEnable               = 1             # (0: Don't use DHP,      1:Use DHP) 
MHPSKIPEnable           = 1             
WSMEnable               = 1             # (0: Don't use WSM,      1:Use WSM) 

Start the run; the program is fast. The running process is as follows:

The program runs fast, encoding 100 frames in 6.933s. The output file is 9KB against the 3713KB source file, a compression ratio of about 412.6. It can be seen that AVS2's compression efficiency and compression quality are both very good.

The PSNR values are as follows; they are close to 40 dB, indicating acceptable picture quality:

AVERAGE SEQ PSNR:      37.5197 40.9658 42.0968
xavs2[i]:          BITRATE:  35.29 kb/s @ 50.0 Hz, 100 frames, xavs2 p8
xavs2[i]:       TOTAL BITS: 70576 (I: 14416, B: 0, P/F: 56160)
xavs2[d]:       TOTAL TIME:    6.912 sec, total 100 frames, speed: 14.47 fps
xavs2[d]:       Frame Time:   I:   1.51%;   B:   0.00%;   P/F:  98.49%
xavs2[i]:       Frame Num :   I:   1.00%;   B:   0.00%;   P/F:  99.00%
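The statistics above are internally consistent: 70576 total bits over 100 frames at 50 Hz is 2 seconds of video, which reproduces the reported average bitrate:

```python
# Cross-check of the xavs2 summary: average bitrate from total bits,
# frame count, and frame rate.
total_bits, frames, fps = 70576, 100, 50.0
kbps = total_bits / (frames / fps) / 1000
print(round(kbps, 2))  # 35.29 kb/s, matching the BITRATE line
```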

The compressed file is decoded and reconstructed, and the reconstructed images from the three standards (for HEVC and H.264, the first frame is an I frame, the rest are P frames, and the QP value is set to 34) are compared. The results are as follows:

It can be seen that H.264 gives the best visual result after decoding. The details from AVS2 and HEVC are slightly blurred after decoding, with AVS2 clearer than HEVC.

        Compressed size   Compression time   Key-frame PSNR (Y / U / V)
HEVC    5KB               117.752s           35.8 / 38.5 / 40.3
H.264   6KB               356.234s           34.6 / 38.1 / 39.7
AVS2    9KB               7.612s             38.7 / 40.9 / 42.2

The results show that the compression times differ greatly, which deviates considerably from theory. The reason may be the different ways the executables were built: the first two were built as 'win32' and the last as 'x64', so the latter computes faster.

In terms of compression ratio, HEVC is best, followed by H.264, with AVS2 last, though the differences are not huge.

AVS2 has the highest key-frame PSNR, yet the H.264 picture looks clearest after decoding. This may be due to the gap between the PSNR metric and the human visual system (HVS).

5. Experiment impressions

When I tried to understand the principles behind these coding standards, I found them very complicated and struggled with them for a long time. After consulting references, I learned that the first-generation AVS standard took the pioneering team in China more than three years to establish, and I deeply felt the difficulty of such a project.

Some of the parameters used in this report do not satisfy the principle of controlled variables, but certain parameter settings make the program run for a very long time, so it was difficult to carry out rigorous comparison experiments over multiple data sets. This can be pursued on my own after class.

In short, although this experiment is not complicated, it let us genuinely experience the video coding process, and I hope to have the opportunity to join this field and meet its challenges in the future. I would also like to thank the teaching assistants and teachers for their questions and explanations.

  1. R. Sjoberg et al., "Overview of HEVC High-Level Syntax and Reference Picture Management," in IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1858-1870, Dec. 2012, doi: 10.1109/TCSVT.2012.2223052. ↩︎

  2. Download address: http://trace.eas.asu.edu/yuv/index.html ↩︎

  3. https://zh.wikipedia.org/wiki/H.264/MPEG-4_AVC ↩︎

  4. Sullivan G J, Topiwala P N, Luthra A. The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions[C]//Applications of Digital Image Processing XXVII. International Society for Optics and Photonics, 2004, 5558: 454-474. ↩︎

  5. Download address: http://iphome.hhi.de/suehring/tml/download/ ↩︎

  6. http://www.avs.org.cn/AVS2/technology.asp ↩︎

Posted by RyanW on Mon, 02 May 2022 22:59:56 +0300