Research on Real Time Video Compression Techniques and Algorithms Based on H.264
|School||Harbin Institute of Technology|
|Course||Information and Communication Engineering|
|Keywords||Fast intra prediction Parallel processing Signaling modes Coefficient token|
High-speed connections to the home are commonplace, and the storage capacity of flash memories, hard disks and optical media is greater than ever before. The cost per transmitted or stored bit is falling continuously, so why is video compression needed, and why is there such a significant effort to make it better? Video compression has two important advantages. First, it makes it feasible to use digital video in transmission and storage environments that would not support uncompressed video. Second, video compression allows more efficient use of storage and transmission resources.

Image and video compression has been a very active topic of research and development for over 20 years. Many different algorithms and systems for compression and decompression have been proposed and developed. In order to promote competition and increased choice, it has been essential to define standard methods of compression encoding and decoding so that products from different companies can interoperate. This has led to the development of a number of international standards for image and video compression, including the JPEG, MPEG and H.26x series of standards.

Video compression algorithms work by removing redundancies in the temporal, spatial and frequency domains. By removing different types of redundancy, it is possible to compress the data considerably at the cost of a certain amount of information loss. Further compression can be attained by encoding the processed data with an entropy coding technique such as Huffman coding or arithmetic coding. H.264 has appreciably enhanced coding performance at both low and high bit rates compared with earlier coding standards (H.263, MPEG-2 and MPEG-4). H.264/MPEG-4 Part 10 uses the rate distortion optimization (RDO) technique to obtain the best result in terms of visual and coding performance. To perform RDO, the encoder encodes the video by exhaustively searching for the best mode, in the rate-distortion sense, among the predefined modes.
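The RDO mode search mentioned above is conventionally expressed as a Lagrangian cost minimization; the following is the textbook formulation (stated here for context, not quoted from this dissertation):

```latex
J(\text{mode}) \;=\; D(\text{mode}) \;+\; \lambda \cdot R(\text{mode})
```

where $D$ is the distortion between the original and reconstructed block (e.g. sum of squared differences), $R$ is the number of bits needed to code the block in that mode, and $\lambda$ is a Lagrange multiplier tied to the quantization parameter. The encoder evaluates $J$ for every candidate mode and selects the minimum, which is exactly why full-search RDO is so computationally expensive.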
Consequently, the computational complexity of the encoder increases dramatically, which makes it hard to use in practical applications such as real-time video communication. This dissertation addresses how to reduce the computational complexity associated with H.264/MPEG-4 Part 10. The major achievements of this dissertation are summarized below.

We proposed fast intra prediction mode decision using parallel processing. In real-time multimedia scenarios, computational complexity becomes a key constraint. Several attempts have been made to develop fast algorithms for intra prediction mode decision. Most existing "fast" intra prediction algorithms reduce computation by decreasing the number of candidate modes, but such a reduction in computational complexity affects decoded video quality. The full search algorithm in H.264 computes and compares all modes, so it is guaranteed to choose the best mode. We used parallel processing to solve this problem, because parallelism significantly cuts the time needed to reach a solution and increases the size of the problems that can be tackled. We selected an FPGA (Field Programmable Gate Array) platform for parallel processing: hardware circuits such as those on an FPGA run in parallel, because each sub-circuit executes its function independently. We implemented the nine intra 4×4 luma modes on an FPGA using both approaches, i.e., serial and parallel processing. Experimental results show that the time to find the best intra prediction mode by parallel processing is much less than by serial processing, with no performance degradation.

We proposed efficient techniques for signaling the intra prediction mode number. The choice of intra prediction mode for each 4×4 block must be signaled to the decoder, and this could potentially require a large number of bits. However, the intra modes of neighbouring 4×4 blocks are often correlated.
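The full-search mode decision described above can be sketched in software. This is an illustrative serial sketch, not the dissertation's FPGA design: only three of the nine intra 4×4 modes are modelled, and a simple SAD cost stands in for the full RD cost.

```python
# Serial full search over simplified intra 4x4 predictors. Reference
# samples are the row above (top) and the column to the left (left).
# Only vertical, horizontal and DC prediction are modelled here.

def predict(mode, top, left):
    """Return a 4x4 predicted block for a simplified mode set."""
    if mode == 0:                        # vertical: copy the row above
        return [top[:] for _ in range(4)]
    if mode == 1:                        # horizontal: copy the left column
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:                        # DC: rounded mean of all references
        dc = (sum(top) + sum(left) + 4) // 8
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("mode not modelled in this sketch")

def best_mode(block, top, left, modes=(0, 1, 2)):
    """Full search: evaluate every candidate mode, keep the cheapest SAD."""
    def sad(pred):
        return sum(abs(block[r][c] - pred[r][c])
                   for r in range(4) for c in range(4))
    costs = {m: sad(predict(m, top, left)) for m in modes}
    return min(costs, key=costs.get), costs

block = [[10, 10, 10, 10] for _ in range(4)]
top, left = [10, 10, 10, 10], [50, 50, 50, 50]
mode, costs = best_mode(block, top, left)
# vertical prediction reproduces the block exactly, so mode 0 wins with SAD 0
```

In the serial version the `sad` evaluations run one after another; on the FPGA each mode's cost is computed by its own sub-circuit, so all candidates are evaluated concurrently and only the final comparison is sequential, which is the source of the speed-up reported above.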
To take advantage of this correlation, predictive coding is used to signal 4×4 intra modes. At the frame boundaries we cannot apply all modes, because the pixels available for prediction are limited. The question then arises: is it feasible to use the same signaling technique for this smaller number of modes as is used for nine modes? We proposed techniques different from the existing one for signaling 4×4 intra prediction modes. The proposed technique for signaling the three modes (1, 2 and 8) available at the upper frame/slice boundary is as follows. The encoder sends a flag (the previous intra 4×4 prediction mode flag) for each 4×4 block; if the flag is '1', the most probable prediction mode is used. If the flag is '0', one more bit is sent to indicate which of the remaining two modes applies. We proposed three different techniques for signaling the four modes (0, 2, 3 and 7) available at the left frame/slice boundary; the best of these is as follows. The encoder sends a flag for each 4×4 block; if the flag is '1', the most probable prediction mode is used. If the flag is '0', another flag is sent to indicate the next most probable mode; if this flag is also '0', one more bit is sent to indicate which of the remaining two modes applies. Experimental results show that the proposed techniques outperform the existing technique.

We proposed another technique for fast intra prediction mode decision by selecting fewer modes. As mentioned in the previous paragraph, it is not practical to apply all 4×4 luma intra prediction modes at the frame/slice boundaries, so bits can be saved by signaling fewer intra prediction modes. Only three 4×4 intra prediction modes (1, 2 and 8) can be applied at the upper frame/slice boundary, four intra prediction modes (0, 2, 3 and 7) at the left frame/slice boundary, seven 4×4 intra prediction modes (0, 1, 2, 4, 5, 6 and 8) at the right frame/slice boundary, and all nine intra prediction modes at the remaining 4×4 blocks.
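The upper-boundary signaling scheme described above can be sketched as a simple encode/decode pair. The bit assignments here are illustrative, not taken from the dissertation: one flag bit selects the most probable mode, otherwise one extra bit distinguishes the remaining two modes.

```python
# Sketch of the proposed upper frame/slice boundary signaling for the
# three available modes (1, 2, 8). Costs 1 bit on a hit, 2 bits on a miss.

UPPER_MODES = (1, 2, 8)

def encode_mode(mode, most_probable):
    """Return the bit string signaling `mode` given the most probable mode."""
    if mode == most_probable:
        return "1"                           # flag = 1: most probable mode
    rest = [m for m in UPPER_MODES if m != most_probable]
    return "0" + str(rest.index(mode))       # flag = 0, then 1 bit for the rest

def decode_mode(bits, most_probable):
    """Invert encode_mode."""
    if bits[0] == "1":
        return most_probable
    rest = [m for m in UPPER_MODES if m != most_probable]
    return rest[int(bits[1])]

# Round trip: every mode survives encode -> decode for every prediction.
for mp in UPPER_MODES:
    for m in UPPER_MODES:
        assert decode_mode(encode_mode(m, mp), mp) == m
```

The left-boundary scheme for four modes follows the same pattern with a second flag: 1 bit for the most probable mode, 2 bits for the next most probable, and 3 bits for either of the remaining two.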
On the right frame/slice boundary we selected only five modes instead of seven and computed the RD performance of different combinations of five modes. Similarly, instead of using nine modes for the remaining blocks, we selected only five and computed the RD performance of different combinations of five modes. Analysis of the experimental results shows that the combination of five modes (0, 1, 2, 4 and 8) gives the best results on the right boundary, and the combination of five modes (0, 1, 3, 4 and 8) gives the best RD performance for the remaining 4×4 blocks. The proposed technique for signaling five intra prediction modes is as follows. The encoder sends a flag for each 4×4 block; if the flag is '1', the most probable prediction mode is used. If the flag is '0', a two-bit parameter is sent to signal which of the remaining four modes applies. Experimental results show that the increase in the number of bits needed to encode the residual coefficients is approximately equal to the decrease in the number of bits required to signal the intra prediction modes, giving almost the same PSNR (peak signal-to-noise ratio). Using the proposed techniques, the computational speed of finding the best 4×4 intra prediction mode is increased by about 45% without significant performance degradation.

We also studied the effect of adaptively updating the probabilities of the look-up table values used for encoding the coefficient token. In the H.264/AVC standard, when the entropy coding mode is set to zero, residual block data is coded using a context-adaptive variable-length coding (CAVLC) scheme. The first VLC, the coefficient token, encodes both the total number of nonzero coefficients and the number of trailing ones. There are four look-up tables to choose from when encoding the coefficient token for a 4×4 block. We studied the results of adaptively assigning shorter codes to more probable (total coefficients, trailing ones) pairs and vice versa.
There are large gaps between the probability lines of three pairs ((0,0), (1,1) and (2,2)), so adaptive probability updating cannot give better results for them. The probability lines of the other pairs intersect each other, and for those adaptive probability updating does give better results; however, the combined probability of these pairs is very small (≈10%).
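For context on what the adaptive study above would replace: in standard CAVLC the choice among the four coefficient-token tables is fixed, driven by a context value nC derived from the nonzero-coefficient counts of the left (nA) and upper (nB) neighbouring blocks. The sketch below shows only that selection rule, to the best of our reading of the standard; the table contents themselves are omitted.

```python
# Sketch of H.264 CAVLC coefficient-token table selection for a 4x4 luma
# block. nA / nB are the nonzero-coefficient counts of the left and upper
# neighbours, or None when that neighbour is unavailable. This fixed rule
# is the baseline against which adaptive probability updating was studied.

def coeff_token_table(nA, nB):
    """Return the coefficient-token table index (0-3) from neighbour counts."""
    if nA is not None and nB is not None:
        nC = (nA + nB + 1) >> 1       # rounded average of both neighbours
    elif nA is not None:
        nC = nA                        # only the left neighbour exists
    elif nB is not None:
        nC = nB                        # only the upper neighbour exists
    else:
        nC = 0                         # no neighbours available
    if nC < 2:
        return 0                       # VLC table biased toward few coefficients
    if nC < 4:
        return 1
    if nC < 8:
        return 2
    return 3                           # fixed-length (6-bit) codes

# Blocks with busy neighbours fall through to the fixed-length table:
assert coeff_token_table(9, 10) == 3
assert coeff_token_table(0, 1) == 0    # (0 + 1 + 1) >> 1 = 1 -> table 0
assert coeff_token_table(None, None) == 0
```

Because this mapping is static, a pair's code length never tracks its observed frequency; the adaptive scheme studied here reorders codes by measured probability instead, which helps only where the probability lines actually cross.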