
Research On Image Semantic Description Based On Neural Network

Posted on: 2021-09-29    Degree: Master    Type: Thesis
Country: China    Candidate: J C Cheng    Full Text: PDF
GTID: 2518306452464194    Subject: Computer Science and Technology
Abstract/Summary:
Image semantic description, as its name implies, aims to have a computer understand the semantics of an input picture and organize appropriate language to describe it. The task sits at the intersection of computer vision and natural language processing: it is not difficult for humans, but for computers it is undoubtedly very challenging. In practice, the research has great commercial potential in fields such as interaction design and video annotation. Traditional approaches, such as template-based and retrieval-based methods, are often particularly complex, requiring a large amount of labor and material resources to design templates and extract features, and the description sentences they ultimately generate are rigid and inflexible. Thanks to the deep learning research that has emerged in recent years, the ability of neural networks to automatically extract features has once again been recognized by researchers, and neural network-based methods provide new ideas for this task.

Building on previous research, this paper uses deep neural networks and the deep learning framework Keras to construct an image semantic description model based on the Encoder-Decoder architecture. The model bridges the semantic gap between the two modalities and achieves modal conversion from vision to text. The main work of this paper is divided into two parts, mirroring the architecture of the model. In the encoder, which is responsible for extracting image features, this paper uses a convolutional neural network to extract features from images. We also design an attention mechanism that filters these features so that the features carrying the most semantic information are exposed directly to the decoder, which makes the words in the generated description more accurate. In the decoder, which is responsible for text generation, this paper uses an improved LSTM network. The traditional LSTM is improved by borrowing the residual-connection idea from residual networks, so that it can see all of the preceding information when generating text; the grammar of the text generated by the improved LSTM is more standard and closer to human language.

This paper implements the image semantic description model described above in the Python programming language and conducts experiments on the MSCOCO dataset. By setting up different control experiments, the effectiveness of the improvements made in this paper is verified. This paper also compares the model with current mainstream image semantic description models. The results show that our model's overall performance is acceptable and that it performs very well in terms of word accuracy and grammatical standardization.
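To make the encoder-attention-decoder pipeline described above concrete, the following is a minimal Keras sketch of such an architecture. It is an illustration under assumptions rather than the thesis's exact implementation: the feature-map shape (64 spatial vectors of size 2048, as a CNN backbone such as InceptionV3 would produce), the use of Keras's built-in AdditiveAttention layer, the single projected skip connection around the LSTM, and all size constants are assumptions introduced here for illustration.

import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE, EMBED_DIM, UNITS, MAX_LEN = 10000, 256, 512, 20  # illustrative values

# Encoder side: pre-extracted CNN feature maps, e.g. an 8x8x2048 grid
# flattened to 64 spatial feature vectors (assumed shape).
image_features = layers.Input(shape=(64, 2048), name="image_features")
encoded = layers.Dense(UNITS, activation="relu")(image_features)   # (64, UNITS)

# Decoder input: the caption tokens generated so far (teacher forcing in training).
caption_in = layers.Input(shape=(MAX_LEN,), name="caption_tokens")
embedded = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(caption_in)     # (MAX_LEN, EMBED_DIM)

# Attention: each decoding step attends over the 64 spatial vectors and
# receives a context vector built from the most relevant image features.
query = layers.Dense(UNITS)(embedded)                              # match key/value width
context = layers.AdditiveAttention()([query, encoded])             # (MAX_LEN, UNITS)
decoder_in = layers.Concatenate()([embedded, context])

# Improved decoder: an LSTM with a residual (skip) connection around it,
# echoing the residual-network idea the thesis borrows.
lstm_out = layers.LSTM(UNITS, return_sequences=True)(decoder_in)
skip = layers.Dense(UNITS)(decoder_in)                             # project so shapes match
decoder_out = layers.Add()([lstm_out, skip])

# Per-step word scores over the vocabulary.
logits = layers.Dense(VOCAB_SIZE)(decoder_out)

model = Model([image_features, caption_in], logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.summary()

Trained with image features and shifted caption tokens as inputs and the next-word indices as targets, a model of this shape captures the three components the abstract names: a CNN encoder, an attention filter over its features, and a residually connected LSTM decoder.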
Keywords/Search Tags: Neural Networks, Deep Learning, Image Semantic Description, Natural Language Processing, Computer Vision