
Research On Video Semantic Segmentation With Optical Flow

Posted on: 2021-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Hui
Full Text: PDF
GTID: 2428330629952689
Subject: Computer application technology
Abstract/Summary:
Semantic segmentation has become a research hotspot in deep learning and computer vision, playing an important role in autonomous driving, medical image analysis, geographic information systems, and other fields. However, because semantic segmentation classifies images at the pixel level, the amount of data to process is very large, and deep-learning-based image semantic segmentation methods are slow. Reusing semantic segmentation feature maps through optical flow is an effective way to speed up semantic segmentation. This thesis studies video semantic segmentation based on deep learning, and analyzes and elaborates the theory of deep learning and video semantic segmentation in detail.

This thesis analyzes and improves DVSNet, a video semantic segmentation network. DVSNet has two main issues. The first is its low overall speed and the latency problem. The slow image semantic segmentation network drags down the overall speed. In addition, the whole network must wait for the image semantic segmentation network, so the output for key regions arrives much later than the overall speed would suggest; this is called the latency problem, and it makes DVSNet unsuitable for time-sensitive tasks. The second issue arises in high-precision tasks with an expected confidence score threshold greater than 95: the decision network sends a large number of ordinary image regions to the image semantic segmentation path, which, coupled with the extra computational overhead of that path, causes the overall speed to drop significantly to 21.9 fps.

To address these two issues, this thesis makes two improvements.

(1) A faster image semantic segmentation network, ICNet, is introduced in place of Deeplab-Fast, which greatly increases the overall speed of the network and reduces latency. When the accuracy of the image semantic segmentation network alone is used as the baseline for comparison, the accuracy drop is slightly smaller than that of DVSNet. Experiments show that on the Cityscapes Sequence dataset, with an expected confidence score threshold of 92, the overall speed of the network increases from 18.8 fps for DVSNet to 45.7 fps, and the mIoU drop rate is reduced to 1.67%. The overall processing time of key regions decreases from 46.2 ms to 8.7 ms, which greatly eases the latency problem.

(2) A composite key frame scheduling strategy is designed. In high-precision tasks a fixed-interval scheduling strategy is used, and in all other cases the decision network strategy is used. In high-precision tasks the key frames are updated frequently, which offsets the disadvantage that the fixed-interval strategy cannot schedule flexibly according to the distribution of key frames; moreover, the image semantic segmentation path under the fixed-interval strategy incurs no additional computational overhead. When the interval is set to 2, the speed increases to 35.7 fps and the accuracy increases to 76.54% mIoU, compared with the decision network strategy at an expected confidence score threshold of 95. In other tasks, fixed-interval scheduling performs unstably when the interval is large because it cannot adapt to the key frame distribution pattern of the video data, whereas the decision network strategy stably produces higher-accuracy results, so the decision network strategy continues to be used there.
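The composite key frame scheduling strategy in improvement (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function name `is_key_frame`, its parameters, and the rule that a region becomes a key region when the decision network's predicted confidence falls below the expected threshold are assumptions made for illustration. The cutoff of 95 for "high-precision" is taken from the abstract, but whether the boundary value itself is included is also an assumption.

```python
from typing import Callable


def is_key_frame(
    frame_index: int,
    expected_score: float,
    predict_confidence: Callable[[], float],
    interval: int = 2,
    high_precision_cutoff: float = 95.0,
) -> bool:
    """Decide whether a frame goes through the full segmentation path.

    All names and defaults here are hypothetical; only the overall idea
    (fixed interval for high-precision tasks, decision network otherwise)
    comes from the abstract.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")

    if expected_score >= high_precision_cutoff:
        # High-precision task: fixed-interval scheduling. The decision
        # network is never queried, so it adds no extra computation.
        return frame_index % interval == 0

    # Other tasks: decision network strategy. Assumed rule: re-segment when
    # the predicted confidence of the flow-warped result is below the target.
    return predict_confidence() < expected_score


if __name__ == "__main__":
    # With a threshold of 96 and interval 2, every second frame is a key frame.
    print([is_key_frame(i, 96.0, lambda: 0.0) for i in range(4)])
```

Passing the decision network as a callable means it is only evaluated when the decision network strategy is actually selected, which mirrors the abstract's point that the fixed-interval path avoids the additional computation.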
Keywords/Search Tags: Video Semantic Segmentation, Deep Learning, Optical Flow, Key Frame Scheduling