
Airport Video Object Segmentation Based On Deep Learning

Posted on: 2020-10-10  Degree: Master  Type: Thesis
Country: China  Candidate: X Liu  Full Text: PDF
GTID: 2392330596976186  Subject: Signal and Information Processing
Abstract/Summary:
Airport visual perception is a current research hotspot, and most intelligent airport applications built on visual perception depend on the segmentation of moving targets. However, the airport scene is unusually complex: the performance of traditional detection algorithms degrades sharply there, and their results are strongly affected by weather changes and often suffer from fragmentation defects. Compared with traditional methods, deep learning can learn more representative features for different tasks and datasets and produce more accurate recognition results. It is therefore important and meaningful to study deep-learning-based moving-target segmentation models for the special environment of the airport. The main work of this paper is as follows:

1. Because the airport scene is very wide, target scale varies greatly, and problems such as occlusion and interference from similar stationary targets arise, a single image-segmentation network relying only on spatial information cannot solve these problems. Considering both motion information and appearance information, a spatio-temporal two-stream network is constructed. The appearance model is built on a fully convolutional network augmented with dilated convolution and multi-scale feature fusion. The motion model uses PWC-Net to estimate optical flow in real time and maps the flow to a segmentation result through a pyramid pooling module. Finally, weighing the strengths and weaknesses of the appearance model and the motion model, the cosine similarity is used to define an optical-flow error, which serves as a fusion confidence for combining the temporal-stream and spatial-stream outputs. Experiments on the CDnet 2014 public dataset and an airport dataset verify that the spatio-temporal two-stream network achieves good accuracy and robustness in various complex scenarios.

2. Video motion information and appearance information are correlated and consistent, so a multi-task learning idea is introduced to jointly optimize optical flow estimation and object segmentation. In the feature-extraction stage, the optical-flow branch and the image-segmentation branch share weights, which avoids repeated feature extraction and greatly reduces the amount of computation. In the upsampling stage, communication between the two branches is established so that optical flow estimation and object segmentation can jointly exploit motion and appearance information.

3. Because optical-flow labels for real airport data are difficult to obtain, unsupervised optical flow estimation is used. Under the assumptions of brightness constancy and local smoothness of the flow, the difference between the target image and the image warped via interpolation along the estimated flow serves as the supervisory signal, and local smoothness of the flow serves as a constraint; together they form the unsupervised optical-flow loss. For the class-imbalance problem in video object segmentation, Focal loss guides segmentation training, which alleviates the imbalance, strengthens the network's attention to hard samples, and improves the training effect.
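The cosine-similarity fusion in work 1 can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the per-pixel cosine similarity between the target frame and the flow-warped frame is mapped to a confidence weight, which then blends the two streams' probability maps. All array shapes and the [-1, 1] → [0, 1] mapping are assumptions of this sketch.

```python
import numpy as np

def fusion_confidence(frame_t, frame_warped, eps=1e-8):
    """Per-pixel cosine similarity between the target frame and the frame
    warped by the estimated optical flow (both H x W x C). High similarity
    means the flow is reliable at that pixel."""
    dot = (frame_t * frame_warped).sum(axis=-1)
    norm = np.linalg.norm(frame_t, axis=-1) * np.linalg.norm(frame_warped, axis=-1)
    cos = dot / (norm + eps)
    # map similarity in [-1, 1] to a confidence weight in [0, 1]
    return (cos + 1.0) / 2.0

def fuse_streams(temporal_prob, spatial_prob, confidence):
    """Confidence-weighted fusion of the temporal (motion) and spatial
    (appearance) stream probability maps."""
    return confidence * temporal_prob + (1.0 - confidence) * spatial_prob
```

Where the warp is accurate the motion stream dominates; where it fails (occlusion, weather), the weight shifts to the appearance stream.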
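The weight-sharing idea in work 2 can be illustrated with a toy sketch (hypothetical linear "features" standing in for the real convolutional backbone): the shared extractor runs once per frame, and its output feeds both the flow head and the segmentation head, which is the computational saving described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared-weight backbone and two task heads.
W_shared = rng.standard_normal((8, 3))   # shared feature projection
W_flow = rng.standard_normal((2, 8))     # flow head: 2 channels (u, v)
W_seg = rng.standard_normal((1, 8))      # segmentation head: 1 channel

def extract_features(image):
    """Shared feature extraction (H x W x 3 -> H x W x 8), run once
    and reused by both tasks instead of once per task."""
    return image @ W_shared.T

def flow_head(feat):
    """Predict a dense flow field from the shared features."""
    return feat @ W_flow.T

def seg_head(feat):
    """Predict a per-pixel foreground probability from the same features."""
    return 1.0 / (1.0 + np.exp(-(feat @ W_seg.T)))

feat = extract_features(rng.standard_normal((4, 4, 3)))
flow, seg = flow_head(feat), seg_head(feat)
```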
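The unsupervised loss in work 3 combines the two terms named above: a photometric (brightness-constancy) term and a flow-smoothness term. A minimal sketch, assuming L1 penalties and a hypothetical weighting factor `smooth_weight`:

```python
import numpy as np

def photometric_loss(target, warped):
    """Brightness-constancy term: L1 difference between the target frame
    and the source frame warped by the estimated flow."""
    return np.abs(target - warped).mean()

def smoothness_loss(flow):
    """Local-smoothness term: mean absolute spatial gradient of the
    flow field (flow has shape H x W x 2)."""
    dx = np.abs(flow[:, 1:, :] - flow[:, :-1, :]).mean()
    dy = np.abs(flow[1:, :, :] - flow[:-1, :, :]).mean()
    return dx + dy

def unsupervised_flow_loss(target, warped, flow, smooth_weight=0.1):
    """Total unsupervised loss: photometric term plus weighted smoothness
    constraint; no ground-truth flow labels are needed."""
    return photometric_loss(target, warped) + smooth_weight * smoothness_loss(flow)
```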
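The Focal loss used for segmentation training can be sketched in its standard binary form (the (1 - p_t)^gamma factor down-weights easy pixels so hard samples dominate the gradient; gamma and alpha values here are the common defaults, not necessarily those of the thesis):

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-8):
    """Binary focal loss: probs is the predicted foreground probability,
    labels is the 0/1 ground truth. Easy, well-classified pixels (p_t
    near 1) are suppressed by the (1 - p_t)^gamma modulating factor."""
    p_t = np.where(labels == 1, probs, 1.0 - probs)
    alpha_t = np.where(labels == 1, alpha, 1.0 - alpha)
    return float((-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t + eps)).mean())
```

A confidently correct pixel thus contributes far less loss than a misclassified one, which addresses the foreground/background imbalance typical of airport scenes where moving targets occupy few pixels.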
Keywords/Search Tags:deep learning, fully convolutional network, moving object segmentation, optical flow estimation, spatio-temporal two-stream network