
Research On Image Semantic Description Based On Neural Network

Posted on: 2021-09-29    Degree: Master    Type: Thesis
Country: China    Candidate: J C Cheng    Full Text: PDF
GTID: 2518306452464194    Subject: Computer Science and Technology
Abstract/Summary:
Image semantic description, as its name implies, aims to have a computer understand the semantics of an input picture and organize appropriate language to describe it. The task sits at the intersection of computer vision and natural language processing: it is not difficult for humans, but for computers it is undoubtedly very challenging. In practice, the research has great commercial potential in fields such as interaction design and video annotation. Traditional approaches, such as template-based and retrieval-based methods, are often particularly complex, requiring a large amount of labor and material resources to design templates and extract features, and the description sentences they ultimately generate are rigid and inflexible. Thanks to the deep learning research that has emerged in recent years, the ability of neural networks to automatically extract features has once again been recognized by researchers, and neural network-based methods provide new ideas for this task.

Building on previous research, this paper uses deep neural networks and the deep learning framework Keras to construct an image semantic description model based on the Encoder-Decoder architecture. The model bridges the semantic gap between the two modalities and achieves modal conversion from vision to text. The main work of this paper is divided into two parts, mirroring the architecture of the model. In the encoder, which is responsible for extracting image features, this paper uses a convolutional neural network to extract features from images. We also design an attention mechanism that filters these features so that the features carrying the most semantic information are exposed directly to the decoder, which makes the words in the generated description more accurate. In the decoder, which is responsible for text generation, this paper uses an improved LSTM network. The traditional LSTM is improved by borrowing the residual-connection idea from residual networks, so that it can see all of the preceding information when generating text; the grammar of the text generated by the improved LSTM is more standard and closer to human language.

This paper implements the image semantic description model described above in the Python programming language and conducts experiments on the MSCOCO dataset. By setting up different control experiments, the effectiveness of the improvements made in this paper is verified. This paper also compares the model with current mainstream image semantic description models. The results show that our model's overall performance is acceptable and that it performs very well in terms of word accuracy and grammatical standardization.
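To make the encoder-attention-decoder pipeline described above concrete, the following is a minimal Keras sketch of such an architecture. It is an illustration under assumptions rather than the thesis's exact implementation: the feature-map shape (64 spatial vectors of size 2048, as a CNN backbone such as InceptionV3 would produce), the use of Keras's built-in AdditiveAttention layer, the single projected skip connection around the LSTM, and all size constants are assumptions introduced here for illustration.

import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE, EMBED_DIM, UNITS, MAX_LEN = 10000, 256, 512, 20  # illustrative values

# Encoder side: pre-extracted CNN feature maps, e.g. an 8x8x2048 grid
# flattened to 64 spatial feature vectors (assumed shape).
image_features = layers.Input(shape=(64, 2048), name="image_features")
encoded = layers.Dense(UNITS, activation="relu")(image_features)   # (64, UNITS)

# Decoder input: the caption tokens generated so far (teacher forcing in training).
caption_in = layers.Input(shape=(MAX_LEN,), name="caption_tokens")
embedded = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(caption_in)     # (MAX_LEN, EMBED_DIM)

# Attention: each decoding step attends over the 64 spatial vectors and
# receives a context vector built from the most relevant image features.
query = layers.Dense(UNITS)(embedded)                              # match key/value width
context = layers.AdditiveAttention()([query, encoded])             # (MAX_LEN, UNITS)
decoder_in = layers.Concatenate()([embedded, context])

# Improved decoder: an LSTM with a residual (skip) connection around it,
# echoing the residual-network idea the thesis borrows.
lstm_out = layers.LSTM(UNITS, return_sequences=True)(decoder_in)
skip = layers.Dense(UNITS)(decoder_in)                             # project so shapes match
decoder_out = layers.Add()([lstm_out, skip])

# Per-step word scores over the vocabulary.
logits = layers.Dense(VOCAB_SIZE)(decoder_out)

model = Model([image_features, caption_in], logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.summary()

Trained with image features and shifted caption tokens as inputs and the next-word indices as targets, a model of this shape captures the three components the abstract names: a CNN encoder, an attention filter over its features, and a residually connected LSTM decoder.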
Keywords/Search Tags: Neural Networks, Deep Learning, Image Semantic Description, Natural Language Processing, Computer Vision