
Study On Semantic Segmentation Of Traffic Scene Images Based On Deep Neural Networks

Posted on: 2021-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: P A Li
Full Text: PDF
GTID: 2392330614971860
Subject: Electronic Science and Technology
Abstract/Summary:
Semantic segmentation is an important problem in computer vision. Traditional image segmentation algorithms often rely on prior knowledge to set model parameters, so their feature-extraction capability is limited, predicted edges are indistinct, and segmentation efficiency and accuracy cannot meet the needs of complex images. With the rapid development of deep learning, semantic segmentation based on fully convolutional networks has realized end-to-end pixel-wise classification, greatly improving both the accuracy and the speed of segmentation. Traffic scene images are complex, with objects of widely varying size, which makes accurate segmentation across scales challenging, especially for small objects.

This thesis focuses on the inaccurate segmentation of small objects in traffic scene images. We adopt an encoder-decoder structure as the basic framework; the network performs multi-task learning and uses densely connected construction from the perspective of multi-scale, multi-branch feature fusion. The main contributions can be summarized as follows:

(1) To reduce the loss of useful information during feature extraction, this thesis builds a densely connected network that fuses multi-scale information. In the encoder, as the network deepens, the feature maps shrink to 1/2, 1/4, 1/8, and 1/16 of the original input resolution; the semantic information becomes more abstract, and part of the detailed information is lost. To address this, the thesis takes the four sets of feature maps with different reduction factors and merges each pair of adjacent groups to form three combined branches. These three branches are then fused together, and up-sampling operations produce a prediction at the same resolution as the original image. Feature maps with a large gap in resolution and semantics often cause interference and misjudgment during fusion; fusing only adjacent groups avoids this problem and reduces information loss. Segmentation accuracy improves by 1.6% on the Cityscapes dataset.

(2) To extract semantic features more accurately, this thesis builds a multi-task learning network with two branches, which can also be viewed as two encoders. The network extracts features for large and small objects separately, and each branch updates its learning parameters independently; the results from the two branches are concatenated to obtain the prediction. To train the two branches, the dataset is divided into two sub-datasets. First, we count the foreground pixels of each image in the training set and divide the total by the number of images to obtain an average. Images whose pixel count exceeds this mean are assigned to the large-object sub-dataset, and the rest to the small-object sub-dataset; the two sub-datasets then train the two feature-extraction branches. The branches share the same architecture, with parameters updated according to their respective sub-datasets. In this way, the model achieves about 1.07% improvement over the baseline.

(3) To augment the data, our work crops the images. Splitting the dataset into large- and small-object sub-datasets for multi-task learning directly reduces the training data available to each branch by nearly half, which may cause over-fitting and related problems. Since data augmentation is required, our work proposes a simple and effective method. Observed from the driver's perspective, the left and right halves of a traffic scene have similar semantic categories and layout; moreover, most driving views look straight ahead. To obtain more finely annotated data, we crop each image into two equal parts along the vertical center line and assign the sub-images to the large- and small-object sub-datasets. It is important to note that pixel counting and partitioning are performed separately on the original dataset and on the cropped dataset. This method simply and effectively resolves the hidden risk in the multi-task learning scheme, and segmentation accuracy improves by 1.21%.

Finally, the above methods are combined in one network, and the experimental result is 3.35% higher than the baseline.
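The adjacent-group fusion in contribution (1) can be illustrated with a minimal numpy sketch. This is not the thesis's implementation: the function names, nearest-neighbour upsampling, and toy channel counts are all assumptions chosen only to show how each coarser-scale feature map is upsampled and concatenated with its finer neighbour, yielding three combined branches from four scales.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling over the spatial axes of a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse_adjacent(features):
    """Merge each pair of adjacent-scale feature maps.

    `features` is ordered fine-to-coarse: 1/2, 1/4, 1/8, 1/16 of the
    input resolution.  Each coarser map is upsampled 2x and concatenated
    with its finer neighbour along the channel axis, giving three
    combined branches (hypothetical stand-in for the learned fusion).
    """
    branches = []
    for fine, coarse in zip(features[:-1], features[1:]):
        branches.append(np.concatenate([fine, upsample2x(coarse)], axis=0))
    return branches

# toy (channels, H, W) feature maps at 1/2, 1/4, 1/8, 1/16 of a 32x32 input
feats = [np.random.rand(8, 32 // s, 32 // s) for s in (2, 4, 8, 16)]
branches = fuse_adjacent(feats)
# three branches, each at its finer member's resolution, channels from both scales
print([b.shape for b in branches])   # [(16, 16, 16), (16, 8, 8), (16, 4, 4)]
```

Because only neighbouring scales are merged, no branch has to reconcile feature maps whose resolutions differ by more than a factor of two, which is the interference the abstract describes.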
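The dataset partition used for the two encoders in contribution (2) reduces to a mean-threshold split. A minimal sketch, assuming `pixel_counts` maps image ids to foreground-pixel counts (both names are hypothetical; the thesis does not give its exact bookkeeping):

```python
def split_by_object_size(pixel_counts):
    """Split a dataset into large- and small-object sub-datasets.

    Images whose foreground-pixel count exceeds the dataset-wide mean
    go to the large-object subset; the rest go to the small-object one.
    """
    mean = sum(pixel_counts.values()) / len(pixel_counts)
    large = [k for k, v in pixel_counts.items() if v > mean]
    small = [k for k, v in pixel_counts.items() if v <= mean]
    return large, small

counts = {"img0": 1200, "img1": 300, "img2": 900, "img3": 100}
large, small = split_by_object_size(counts)
print(large, small)   # ['img0', 'img2'] ['img1', 'img3']
```

Each sub-dataset then trains one encoder branch, so the two branches specialize in large and small objects respectively while sharing the same architecture.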
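The augmentation in contribution (3) is a single crop along the vertical center line. A sketch under the assumption that images are (H, W, C) arrays; for odd widths this toy version simply drops the center column's remainder via integer division:

```python
import numpy as np

def crop_vertical_halves(image):
    """Split an (H, W, C) image into left and right halves along the
    vertical center line, doubling the number of training samples."""
    w = image.shape[1]
    return image[:, : w // 2], image[:, w // 2 :]

img = np.arange(2 * 8 * 3).reshape(2, 8, 3)   # toy 2x8 RGB image
left, right = crop_vertical_halves(img)
print(left.shape, right.shape)   # (2, 4, 3) (2, 4, 3)
```

As the abstract notes, pixel counting and the large/small partition are then re-run on the cropped images separately from the original dataset, so each half is assigned to a sub-dataset on its own merits.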
Keywords/Search Tags:Convolutional neural network, Multi-task learning, Dense connection, Multi-scale feature fusion, Data augmentation