Research On Image Content Description In Chinese Based On Deep Learning

Posted on: 2022-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Kong
Full Text: PDF
GTID: 2518306515966769
Subject: Computer technology
Abstract/Summary:
With the wide application of deep learning in computer vision and natural language processing, Chinese description of image content, a cross-modal transformation task, has gradually become a research hotspot. The task closely combines the two fields: a model must understand and extract the semantic information of an image and then transform it into descriptive text. It has high research significance and application value, and can be applied to scenarios such as image search, image retrieval, image caption generation, and children's education.

This thesis uses the currently popular encoder-decoder network structure. First, the encoder extracts the semantic feature information of the image. Then, the decoder decodes the image semantic features to generate a word vector probability matrix. Finally, in the description generation stage, the word vector probability matrix is converted into a descriptive sentence. This thesis carries out research and improvement work in the decoding stage, the encoding stage, and the description sentence generation stage of the network model. The main research contents are as follows:

(1) In the decoding stage, an attention fusion mechanism is used to improve the network model. Existing attention-based methods for Chinese image content description cannot focus on the key content during decoding without weakening or losing attention information. To address this, a Chinese image content description method based on the fusion of image feature attention and adaptive attention is proposed. First, the encoder network structure is constructed to extract the image features. Then, image feature attention extracts the attention information of all feature regions of the image, and the decoder network decodes the attention-weighted image features to generate the hidden information, ensuring that the attention information is not weakened or lost. Finally, the visual sentinel module of adaptive attention is used to pay more attention to the important regions in the image features, so that the main content of the image is extracted more accurately. The experimental results show that the proposed method effectively improves the image comprehension ability of the model, and that its scores on the evaluation metrics are better than those of the comparison models.

(2) In the encoding stage, an image feature fusion mechanism is used to improve the network model. Although the attention fusion method used in the decoding stage improves the model's ability to extract the main content to a certain extent, the model lacks the local details of the image and does not make full use of global and local image feature information. As a result, it understands image details poorly, which limits the performance gain. To address these problems, and building on the encoder network structure above, this thesis proposes a Chinese image content description method based on the fusion of global and local image features. First, a convolutional neural network extracts the global feature and the common feature map of the image. Second, a region proposal network generates local candidate regions on the common feature map; the non-maximum suppression algorithm then filters the candidate regions, and a region of interest (ROI) pooling layer extracts the local features corresponding to the candidate regions. Finally, the global and local features are deeply fused to strengthen the correlation between them, so that the model can fully understand both the global scene information and the local details of the image content.

(3) In the description sentence generation stage, both methods above directly take the candidate with the largest probability value as the final word vector, which leads to poor quality in the generated description sentences. This thesis uses the beam search algorithm to optimize the description sentence generation stage of the two proposed methods, so that the model can find the best image description sentence within a reasonable solution space.
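
The difference between the two generation strategies in item (3), taking the most probable candidate directly versus searching a wider solution space (commonly called beam search), can be illustrated with a minimal sketch. This is not the thesis's implementation: the names `beam_search`, `toy_model`, `TABLE`, and `EOS`, and all probabilities, are hypothetical and chosen only for illustration. A real decoder would supply the next-word probabilities instead of a lookup table.

```python
import math

EOS = "<eos>"

# Hypothetical next-token probabilities, keyed by the prefix generated so far.
TABLE = {
    (): {"a": 0.6, "the": 0.4},
    ("a",): {"dog": 0.55, "cat": 0.45},
    ("the",): {"dog": 0.9, "cat": 0.1},
}


def toy_model(prefix):
    """Return {token: log-probability} for the token that follows `prefix`."""
    if len(prefix) >= 2:
        return {EOS: 0.0}  # log(1.0): the toy model always stops after two words
    return {tok: math.log(p) for tok, p in TABLE[prefix].items()}


def beam_search(next_log_probs, beam_width, max_len, eos=EOS):
    """Return (log-probability, tokens) of the best sequence found.

    With beam_width=1 this reduces to greedy decoding, i.e. always taking
    the candidate with the largest probability value.
    """
    beams = [(0.0, [])]  # live hypotheses: (cumulative log-prob, tokens)
    finished = []        # hypotheses that have produced `eos`
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for tok, lp in next_log_probs(tuple(seq)).items():
                candidates.append((score + lp, seq + [tok]))
        # Keep only the `beam_width` highest-scoring extensions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            if seq[-1] == eos:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # include hypotheses cut off at max_len
    return max(finished, key=lambda c: c[0])


greedy_score, greedy_tokens = beam_search(toy_model, beam_width=1, max_len=5)
beam_score, beam_tokens = beam_search(toy_model, beam_width=2, max_len=5)
```

On this toy table, width 1 commits to "a" because it is the most probable first word and ends with "a dog" (probability 0.33), while width 2 keeps "the" alive and finds "the dog" (probability 0.36). The sketch omits length normalization, which practical captioning systems often add so that longer sentences are not penalized.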
Keywords/Search Tags:Chinese image content description, Deep learning, Convolutional neural network, Long short-term memory network, Attention mechanism