Segmentation Of Head-and-Shoulder Video Object For MPEG-4

Posted on: 2006-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: D D Tang
GTID: 2168360155953444
Subject: Communication and Information System
Abstract/Summary:
The defining feature of MPEG-4 is that it describes and codes content as audio and video objects, which requires video images to be segmented into video objects. Content-based coding can encode video object sequences of arbitrary shape. The coded bit stream is organized into object layers, so different video objects in the same scene can be encoded and transmitted separately. The receiver can therefore decode and reconstruct each video object and manipulate it to change the original scene.

Segmentation is a complex problem, and no fully successful method exists so far. The MPEG-4 standard offers a general framework for object-based systems but leaves segmentation as an open issue. Because segmentation is the starting point of an MPEG-4 system, the segmentation algorithm matters for final products, which is why this thesis takes up the task.

A large number of segmentation algorithms, both fully automatic and semi-automatic, have been published. Work on automatic segmentation has mainly focused on motion detection (suited to static backgrounds), motion segmentation, joint motion and (chromatic) region segmentation, and schemes based on higher-order statistics of displaced frame differences (DFD) and mathematical morphological operators. In general, automatic segmentation algorithms are complex, need much prior knowledge, and suit only specific applications. In contrast, typical semi-automatic algorithms rely on user interaction to specify the object in one or several anchor frames, and tracking algorithms then follow the object through time. The user stays in control and can stop the tracking process at any time to correct errors made by the tracking algorithm.

This thesis presents a statistical model-based video segmentation algorithm for head-and-shoulder video.
The motivation stems from the prevalence of head-and-shoulder video in real-time services such as videophone and web chat, which form the application domain of this work. By focusing on this domain, the segmentation algorithm can exploit domain knowledge and achieve real-time performance at lower computational complexity.

The algorithm is based on the blob model used in Pfinder and combines color, spatial location and temporal features. Its main steps are as follows: each pixel is converted from RGB color space into YUV space; a statistical background model is created; a blob model of the foreground is created; and Kalman filters are used for blob tracking. The object segmentation problem is thereby converted into a model detection and tracking problem. At the system level, a hierarchical structure is designed to reduce complexity. The algorithm aims to segment the input video into three video objects (VOs): a background, a head/face and a shoulder. The details are as follows.

The foreground object is modeled with the blob model. The blob representation was developed by Pentland and Kauth et al. as a way of extracting an extremely compact, structurally meaningful description of multispectral satellite (MSS) imagery. In this method, the feature vector at each pixel is formed by adding the (x, y) spatial coordinates to the spectral or textural components of the imagery. The feature vectors are then clustered so that image properties such as color and spatial similarity combine to form coherent connected regions, or blobs, in which all pixels have similar image properties. This blob description method is, in fact, a special case of recent Minimum Description Length (MDL) algorithms.

Pixel classification resolves the class membership likelihoods at each pixel into support maps, which indicate for each pixel whether it is part of one of the blobs or of the scene.
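As an illustration of the modeling steps above, the following Python sketch converts RGB pixels to YUV, fits a per-pixel background model from object-free frames, and fits a blob model over (x, y, Y, U, V) feature vectors. It is a minimal sketch under stated assumptions, not the thesis implementation: the BT.601 conversion weights, the diagonal-covariance Gaussian (a simplification of a full covariance), the variance floor, and all function names are hypothetical choices made for the example.

```python
import math

def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV (assumed BT.601 weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return (y, u, v)

def fit_diag_gaussian(samples, var_floor=1e-3):
    """Fit a Gaussian with diagonal covariance (an assumed simplification).

    Returns (mean, var) as lists; variances are floored to avoid division by zero.
    """
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    var = [max(sum((s[d] - mean[d]) ** 2 for s in samples) / n, var_floor)
           for d in range(dim)]
    return mean, var

def log_likelihood(x, mean, var):
    """Log density of feature vector x under a diagonal Gaussian."""
    return -0.5 * sum((xi - m) ** 2 / v + math.log(2.0 * math.pi * v)
                      for xi, m, v in zip(x, mean, var))

def background_model(frames):
    """Per-pixel YUV Gaussian from frames that contain no foreground object.

    frames: list of 2-D lists of (r, g, b) tuples, all the same size.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [[fit_diag_gaussian([rgb_to_yuv(*f[r][c]) for f in frames])
             for c in range(width)] for r in range(height)]

def blob_model(pixels):
    """Blob Gaussian over (x, y, Y, U, V) features.

    pixels: list of (x, y, (r, g, b)) samples assigned to one blob.
    """
    return fit_diag_gaussian([(x, y) + rgb_to_yuv(*rgb) for x, y, rgb in pixels])

# Usage: a tiny 1x2 background and a three-pixel "face" blob (made-up values).
frames = [[[(10, 10, 10), (12, 12, 12)]], [[(14, 14, 14), (12, 12, 12)]]]
bg = background_model(frames)
face = blob_model([(5, 5, (200, 150, 130)),
                   (6, 5, (205, 155, 135)),
                   (5, 6, (195, 145, 125))])
near = (5, 6) + rgb_to_yuv(200, 150, 130)   # close to the blob in space and color
far = (40, 2) + rgb_to_yuv(90, 140, 230)    # far away and differently colored
print(log_likelihood(near, *face) > log_likelihood(far, *face))
```

Because position and color share one feature vector, a pixel scores well for a blob only when it is both near the blob and similar in color, which is what lets the blobs form compact connected regions.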
Spatial priors and connectivity constraints are used to accomplish this resolution. The class membership likelihood at each pixel is computed from Bayes' rule and the prior probabilities, and the most probable class is selected by the maximum a posteriori (MAP) principle. Each pixel is thus assigned to one of the blob regions or to the background. Noise can cause many pixels to be misclassified, which leaves small holes inside the blobs. The support map is therefore processed with morphological filters to turn the classified pixels into meaningful regions.

Fourier transform or wavelet transform methods can be used to lower image resolution, but their computation is complicated. Scaling with an interpolation algorithm also reduces image resolution, but it is computationally expensive. In contrast, image subsampling is fast and simple, so the hierarchical design uses subsampling to lower image resolution. An input image is first subsampled in both the horizontal and vertical directions. Model analysis (the region-based blob model) and tracking are carried out on the resulting lower-resolution image. The result is then up-sampled and further refined at the original resolution to produce the final output.

A software emulator was built for the segmentation algorithm. Experiments show that the algorithm works as intended: a QCIF-size head-and-shoulder video can be segmented into three regions, namely background, head and shoulder. It is also useful to combine head and shoulder into a single foreground object. This technique is useful in applications where no QoS is guaranteed and friendly bandwidth adaptation is critical for fair, large-scale sharing of available bandwidth. A full-resolution QCIF-size video can be sent when network bandwidth is sufficient, and the video is automatically reduced to only the head object to fit a narrow bandwidth.
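The classification and hierarchical steps described above (MAP labelling, morphological clean-up, subsampling and up-sampling) can be sketched in Python as follows. This is only an illustration under assumptions: the abstract does not state the subsampling factor, so `step` is a free parameter; the 3x3 closing and nearest-neighbor up-sampling are stand-ins for the unspecified morphological filters and up-sampling method; the refinement at the original resolution is not shown; and all function names are hypothetical.

```python
import math

def map_label(log_likelihoods, priors):
    """Pick the class with the largest log posterior (MAP rule).

    log_likelihoods, priors: dicts keyed by class name.
    """
    return max(log_likelihoods,
               key=lambda c: log_likelihoods[c] + math.log(priors[c]))

def close_3x3(mask):
    """Binary closing (dilation, then erosion) with a 3x3 window.

    Fills small holes in a support map; windows are clipped at the border.
    """
    h, w = len(mask), len(mask[0])

    def window(src, r, c):
        return [src[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, h))
                for cc in range(max(c - 1, 0), min(c + 2, w))]

    dilated = [[any(window(mask, r, c)) for c in range(w)] for r in range(h)]
    return [[all(window(dilated, r, c)) for c in range(w)] for r in range(h)]

def subsample(image, step):
    """Keep every step-th pixel in both directions."""
    return [row[::step] for row in image[::step]]

def upsample_nearest(labels, step, height, width):
    """Expand a low-resolution label map back to height x width."""
    return [[labels[r // step][c // step] for c in range(width)]
            for r in range(height)]

# Usage: the prior can outweigh a slightly better likelihood.
label = map_label({"head": -2.0, "background": -2.5},
                  {"head": 0.2, "background": 0.8})
print(label)

# Usage: a one-pixel hole inside a blob is filled by the closing.
support = [[True] * 5 for _ in range(5)]
support[2][2] = False
print(close_3x3(support)[2][2])
```

Running the analysis on the subsampled image cuts the number of pixels to classify by roughly the square of `step`, which is the source of the complexity reduction the hierarchical design aims for.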
Although the transmitted video is smaller, the motion of the face region stays consistent with the audio, which preserves a good user experience. The hierarchical processing structure suits head-and-shoulder video. On the one hand, it can adjust the source/channel coding rate, provide error protection, and adapt to the variable, noisy characteristics of mobile wireless channels or other narrow-bandwidth links. On the other hand, it can allocate the limited coding rate to regions of interest and improve the subjective image quality. The coding method is appropriate not only for video services such as videophone, video messaging, television news, video conferencing and video surveillance, but also for interactive human-machine communication, virtual reality and access-control security.

Although the algorithm can segment head-and-shoulder video into background, head and shoulder regions, it has limitations that can be addressed in future work. For example, creating the background model requires N frames that contain no video object, and the camera must remain motionless during segmentation. At present the algorithm is appropriate for video that contains only one foreground object, and it can be improved to segment two or...
Keywords/Search Tags: MPEG-4, video object segmentation, blob model