Tensor Decomposition and Its Applications in Dynamic Textures
Keywords: tensor decomposition, dynamic texture, compact representation, dimensionality reduction, video coding, crowd density estimation
Dynamic textures are powerful visual cues in image sequences. They are a family of visual phenomena, such as fire and flags, that form spatially repetitive, temporally varying visual patterns with certain stationary properties. As an important source of natural video signals, dynamic textures usually generate an enormous amount of high-dimensional data, so effective representation models are needed for relevant applications. Because of their complex motion characteristics, dynamic textures pose numerous challenges for traditional methods.

In signal and image processing, tensors can be regarded as a generalization of scalars, vectors and matrices to higher-order structures. They represent high-dimensional data naturally, preserving its native form, and can therefore describe the complex characteristics of the data. Tensor decompositions, which are higher-order generalizations of the matrix singular value decomposition, are promising methods for processing and analyzing high-dimensional data. Over the past ten years, interest in tensor methods has expanded to signal and image processing and other areas, initiating new ideas and methods for high-dimensional data processing and analysis.

This dissertation mainly studies tensor decomposition and its applications in dynamic textures. The main contributions of the dissertation are as follows.
1. We propose a tensor-based dynamic texture model that preserves the native form of the data, and present an algorithm for estimating the model parameters. Compared with the linear dynamic system model, which is a state-of-the-art model for dynamic textures, our model has enough freedom to characterize the nature of dynamic textures from different modes, such as spatial, chromatic and temporal. Therefore, our model can effectively explore the native features of dynamic textures.
2. We apply our tensor-based dynamic texture model to dynamic texture synthesis, and carry out extensive experiments and analysis.
Experimental results show that, compared with the linear dynamic texture model, our method achieves much better synthesized video quality with a smaller model size. The average PSNR gain ranges from approximately 2 dB up to 7 dB.
3. We propose the notion of compact representation for high-dimensional data, which is a step beyond sparse representation. To preserve the inherent structure, we also present a multiple tensor low-rank approximation algorithm to obtain a compact representation. This algorithm provides a flexible low-rank approximation and strikes a balance between computational complexity and approximation accuracy.
4. We apply our tensor compact representation method to dynamic texture compact representation and coding. Since we take advantage of a more rational structure of the data, the improvement in coding performance is significant. Compared with the state-of-the-art video coding standard H.264/AVC, the average PSNR gain ranges from approximately 0.41 dB up to 8.76 dB, while the bit-rate reduction ranges from approximately 1.04% up to 77.81%. Moreover, the improvement is especially significant for regular dynamic textures. We can achieve high encoded video quality at a very low bit-rate. The experimental results also indicate that it is hard to obtain a better compact representation by the iterative tensor rank-1 approximation.
5. Crowd image sequences are a special class of dynamic textures. Focusing on the application of crowd density estimation, we study methods based on higher-order tensor analysis. We first propose an approach for constructing an orthonormal basis of a tensor principal subspace based on higher-order singular value decomposition. Then, we present two tensor principal subspace based methods for crowd density estimation. Since our methods preserve the natural structure of the crowd data, we can extract effective features for characterizing the crowd density.
Experimental results show that our methods outperform methods based on the gray level co-occurrence matrix or the wavelet transform. The accuracy of our methods reaches approximately 96.83%, and the misclassified images all fall into categories neighboring their true ones.
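
To make the higher-order singular value decomposition mentioned in the contributions more concrete, the following is a minimal NumPy sketch of a truncated HOSVD applied to a small synthetic three-mode tensor (height, width, time). It is not the dissertation's algorithm: the function names, tensor sizes and ranks are assumptions chosen purely for illustration, and the multiple tensor low-rank approximation and tensor principal subspace methods described above are not reproduced here.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode, columns all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Mode-n product: multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def truncated_hosvd(tensor, ranks):
    """Return a core tensor and one orthonormal factor matrix per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of the mode-n unfolding.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each subspace
    return core, factors

def reconstruct(core, factors):
    """Expand the core back to the original tensor size."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

# Hypothetical data: a 16 x 12 x 10 tensor with multilinear rank (3, 3, 2),
# standing in for a tiny grayscale clip.
rng = np.random.default_rng(0)
ranks = (3, 3, 2)
true_core = rng.standard_normal(ranks)
true_factors = [rng.standard_normal((n, r)) for n, r in zip((16, 12, 10), ranks)]
video = reconstruct(true_core, true_factors)

core, factors = truncated_hosvd(video, ranks)
approx = reconstruct(core, factors)
rel_err = np.linalg.norm(video - approx) / np.linalg.norm(video)
print(core.shape, rel_err)
```

Because the synthetic tensor has exactly the chosen multilinear rank, the reconstruction error here is at the level of floating-point round-off. On real video data the truncation would discard energy, and the ranks would trade model size against approximation accuracy.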