
2D to 3D Conversion Method for 3DTV Application

Author Li Le
Tutor Zhang Maojun
School National University of Defense Science and Technology
Course Control Science and Engineering
Keywords Content Understanding, Visual Perception, Object Segmentation, H.264, Depth Estimation
CLC TP391.41
Type PhD thesis
Year 2012
Downloads 122
Quotes 0

With 3DTV sets entering the market and 3D video channels being launched, the lack of 3D video content has become a key problem restricting the development of 3DTV technology. 2D-to-3D conversion is an efficient way to address it. Taking into account the special requirements of 3DTV applications and the characteristics of typical 2D video programs, this thesis studies 2D-to-3D conversion methods with two goals: the depth of the scene in each frame should be estimated accurately, and the converted 3D video should give viewers a good experience. The research proceeds along three lines: content understanding, visual perception, and the video compressed domain.

(1) Because the depth maps produced by existing automatic 2D-to-3D conversion methods are not accurate enough to give the audience a good visual experience, a new conversion method based on content understanding is proposed, using street-scene video as the test material. The depth map is estimated by analyzing how each frame is composed of individual objects. First, uniform regions are found in each frame, and features are extracted from each region and its neighbors. Each region is then labeled with an object category by an SVM, which reveals how the frame is composed of objects. Next, the relationship between depth and the image coordinates of the ground area is derived from the pinhole imaging model, and the depth of each object is estimated from where it stands on the ground. Because the depth of different building facets is not uniform, buildings are divided into facets in order to obtain accurate depth for building areas.
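The ground-plane depth relation from the pinhole model can be sketched as follows. This is a minimal illustration, not the thesis' exact derivation: it assumes a camera at height `h` above a flat ground plane, focal length `f` in pixels, and horizon row `v0`, under which an image row `v` below the horizon intersects the ground at depth `Z = f·h / (v − v0)`. All parameter names are illustrative.

```python
import numpy as np

def ground_depth(v, v0, f, h):
    """Depth of ground-plane pixels at image row(s) v under a pinhole model.

    v0: horizon row in the image, f: focal length in pixels,
    h:  camera height above the ground plane (same unit as the result).
    Rows below the horizon (v > v0) hit the ground at Z = f*h / (v - v0);
    rows at or above the horizon never intersect it, so depth is infinite.
    """
    v = np.asarray(v, dtype=float)
    with np.errstate(divide="ignore"):
        z = f * h / (v - v0)
    return np.where(v > v0, z, np.inf)
```

With f = 500 px, camera height 1.5 m, and horizon at row 240, a ground pixel at row 340 lies 100 rows below the horizon, giving a depth of 7.5 m; rows above the horizon get infinite depth, which in practice would be handled by the non-ground object labels.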
Feature line segments extracted from building areas are accumulated using a mathematical model derived from the relationship between horizontal lines in the real world and in the image, and each building facet is then recognized by applying dynamic programming to the accumulated feature lines. The depth within each facet is then easy to estimate by photogrammetry. Experiments show that the depth maps estimated by this method are closer to the ground-truth depth maps than those of competing methods, and that they reflect the real-world location of each object accurately. Street-scene 2D video in TV programs can thus be converted to 3D video with this method.

(2) In terms of human visual perception, people tend to pay more attention to the foreground than to the background when watching video. A new 2D-to-3D conversion method based on visual perception is therefore proposed, suited to converting 2D programs with a static background, such as interviews and lectures, to 3D video. Estimating the depth of the foreground and the background with separate methods markedly improves the efficiency of the conversion. First, moving objects are detected with a Gaussian background model and refined with an attention model in order to extract the salient foreground in the video. After each frame is divided into foreground and background, their depths are estimated separately. The depth of each background object in a key frame is estimated by analyzing how the background is composed of objects, and the background depth of the remaining frames is propagated from the key frames. To make the depth of the salient foreground accurate and smooth, it is estimated jointly from the object's location and from the correspondence of the same object across consecutive frames.
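The Gaussian background model used above for moving-object detection can be sketched as a per-pixel running Gaussian. This is a minimal single-Gaussian sketch (the thesis may use a richer mixture model); the learning rate `alpha`, threshold `k`, and initial variance are illustrative assumptions.

```python
import numpy as np

class GaussianBackground:
    """Single-Gaussian per-pixel background model (minimal sketch).

    Each pixel keeps a running mean and variance; a pixel is labeled
    foreground when its squared deviation from the mean exceeds
    (k * std)^2. The model is updated only at background pixels, so
    moving objects do not pollute the background statistics.
    """

    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mu = first_frame.astype(float)       # per-pixel mean
        self.var = np.full_like(self.mu, 50.0)    # per-pixel variance (init guess)
        self.alpha, self.k = alpha, k

    def apply(self, frame):
        frame = frame.astype(float)
        d2 = (frame - self.mu) ** 2
        mask = d2 > (self.k ** 2) * self.var      # True = foreground
        # running update, frozen where the pixel was declared foreground
        a = np.where(mask, 0.0, self.alpha)
        self.mu += a * (frame - self.mu)
        self.var += a * (d2 - self.var)
        return mask
```

In the method described above, the raw mask would then be refined by the attention model to keep only the salient foreground before the separate depth estimators run.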
Estimating the foreground depth accurately gives the converted 3D video good visual quality, while simplifying the background depth estimation markedly improves the efficiency of the conversion method.

(3) A method that converts compressed 2D video to 3D video directly is especially useful in a 3DTV terminal, because most video is stored in compressed formats. A new 2D-to-3D conversion method in the H.264 compressed domain is therefore proposed. First, the motion vectors (MVs) of the macroblocks are accumulated and filtered. A Jacobian matrix describes the relationship between the global-motion parameters and the MV of each macroblock, which makes computing the global-motion parameters in the compressed domain easier than before. Motion areas are then detected from the similarity between local and global motion. At the same time, blocks with salient DCT energy are accumulated temporally along the reliable motion around them, which makes the DCT feature more distinctive. Border and texture areas are then found from the accumulated DCT energy using a locally adaptive threshold selected by entropy in the compressed domain, so the moving foreground can be obtained by combining the MV-based and DCT-based detections. Second, the foreground is refined by a new temporal snake model built from the spatio-temporal information in the video, yielding a more stable object boundary for 3DTV applications. Finally, the depth map is estimated from motion parallax and from how the scene is composed of individual objects. Experiments show that the method converts 2D video to 3D video using only information from the compressed domain, and that both its efficiency and its results improve on previous work. The method can be used in real-time applications such as embedded 2D-to-3D systems in 3DTV.
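The global-motion step above can be sketched as a least-squares fit of a parametric motion model to the macroblock MV field. This sketch uses a 6-parameter affine model as a stand-in for the Jacobian-based formulation in the text (the thesis' exact model and parameters are not specified here); blocks whose MV deviates strongly from the fitted global model are flagged as local (foreground) motion.

```python
import numpy as np

def global_motion_affine(xs, ys, mvx, mvy):
    """Fit an affine global-motion model to macroblock motion vectors.

    Solves  mvx = a1*x + a2*y + a3  and  mvy = a4*x + a5*y + a6
    in the least-squares sense over all macroblock centers (xs, ys).
    Returns the two parameter triples (px, py).
    """
    A = np.column_stack([xs, ys, np.ones(len(xs))])
    px, *_ = np.linalg.lstsq(A, mvx, rcond=None)
    py, *_ = np.linalg.lstsq(A, mvy, rcond=None)
    return px, py

def local_motion_residual(xs, ys, mvx, mvy, px, py):
    """Per-block deviation from the global model; large residuals mark
    blocks whose local motion disagrees with the camera motion."""
    A = np.column_stack([xs, ys, np.ones(len(xs))])
    return np.hypot(mvx - A @ px, mvy - A @ py)
```

On a synthetic zoom-plus-pan MV field the fit recovers the generating parameters exactly, and a single block given an inconsistent MV shows up with a large residual, which is the cue the method uses to detect motion areas before the DCT-based refinement.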
