
Research And Application Of Image Dense Prediction Based On Deep Learning

Posted on: 2021-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: J Ou
Full Text: PDF
GTID: 2428330623467813
Subject: Computer Science and Technology
Abstract/Summary:
Dense image prediction makes a prediction for every pixel in an image. It is an important research topic in computer vision, with image segmentation as its representative example; keypoint prediction and other applications also commonly adopt a dense prediction strategy. With the rapid progress and application of deep learning in recent years, research on dense image prediction has advanced considerably, and many methods have been proposed and applied in industry.

This thesis applies deep-learning-based dense image prediction to two fields: multi-person pose estimation and segmentation of edema lesion areas in retinal fundus images. First, a full-resolution single-person pose estimation method is proposed for the top-down multi-person pose estimation strategy. A spatial attention mechanism fuses multi-scale features and restores the feature maps to the original image resolution, which reduces the accuracy loss caused by down-sampling. Then, a new bottom-up pose estimation method is proposed. It includes a lightweight stacked hourglass network and an efficient joint assembly algorithm based on the predicted joints and their offsets, both of which greatly reduce the computational cost and enable the algorithm to run in real time. Finally, for retinal fundus images, a U-shaped network is proposed that segments and classifies multiple lesions at the same time; a parallel attention module improves the recognition of small areas, and a new loss function handles the problem of imbalanced categories. This method won the AIChallenger 2018 competition.

The main contributions of this thesis are as follows:

(1) Top-down multi-person pose estimation first detects each person region and then performs pose estimation on each instance. For the second step, we propose a single-person pose estimation network that improves the localization accuracy of keypoints. It uses a full-resolution encoder-decoder structure to reduce the errors caused by rescaling and quantization. A global context block is used to optimize the encoder and decoder, and it combines spatial heatmaps with multi-scale features to obtain more precise local information. With a lightweight ResNet34 backbone, our method achieves 72.5% mAP on the MSCOCO dataset.

(2) This thesis proposes an efficient bottom-up multi-person pose estimation algorithm. A lightweight stacked hourglass network greatly reduces the number of model parameters and the computational cost, and a multi-receptive-field mechanism is incorporated into the model to adapt to objects of different scales. Based on the predictions of joints and their offsets, an efficient joint assembly algorithm is also proposed. The algorithm is lightweight and runs in real time while also achieving state-of-the-art accuracy. On the MPII multi-person dataset, the proposed method reaches 81.0% mAP.

(3) This thesis proposes a U-shaped network that simultaneously segments and classifies multiple lesions in retinal fundus images. A parallel attention module is used to optimize the detection of lesions in small areas. The vectorization of feature maps is improved by feature encoding. The loss function is modified to handle the class imbalance problem. Our method achieves an AUC of 99.38% for lesion classification and a Dice score of 76.12% for lesion segmentation, and won first place in the 2018 AIChallenger global competition.
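
For readers unfamiliar with the Dice score reported for lesion segmentation above, it is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between a predicted mask A and a reference mask B. The following is a minimal sketch of that general definition, not the thesis's evaluation code; the function name `dice_score`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1.

    Illustrative helper (hypothetical name); returns 1.0 when both masks
    are empty, which is one common convention and an assumption here.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")

    intersection = 0  # pixels that are foreground in both masks
    pred_sum = 0      # foreground pixels in the prediction
    target_sum = 0    # foreground pixels in the reference

    for row_p, row_t in zip(pred, target):
        if len(row_p) != len(row_t):
            raise ValueError("masks must have the same row lengths")
        for p, t in zip(row_p, row_t):
            p_on = 1 if p else 0
            t_on = 1 if t else 0
            intersection += p_on & t_on
            pred_sum += p_on
            target_sum += t_on

    denom = pred_sum + target_sum
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * intersection / denom


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    reference = [[1, 0], [0, 0]]
    print(dice_score(predicted, reference))
```

In the example, one pixel overlaps while the masks contain two and one foreground pixels respectively, so the score is 2·1 / (2 + 1). For a multi-lesion task, a score like this would typically be computed per lesion class and then averaged; how the thesis aggregates it is not stated in the abstract.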
Keywords/Search Tags:Deep Learning, Dense Image Prediction, Keypoint Detection, Fundus Edema Lesion Area Segmentation