
Research On Image Captioning Algorithm And Application In Agriculture

Posted on: 2022-03-27 | Degree: Master | Type: Thesis
Country: China | Candidate: X Li
GTID: 2493306533472204 | Subject: Information and Communication Engineering
Abstract/Summary:
The continuing expansion of agriculture is driving its modernization and its adoption of intelligent technology in China. A central problem is how to collect crop growth status and environmental information in real time and to raise accurate alerts when abnormal conditions occur. An encoder-decoder image captioning algorithm was proposed in 2014. It successfully combined an image-text embedding model with a multimodal neural language model, an approach that is important for real-time monitoring and recognition of the growth status of different crops in any scenario. This thesis adds image attribute features to the global image features and also studies the network depth of bidirectional long short-term memory networks. The main research content is as follows.

First, to address the coarseness of global image feature extraction, image attribute features are added to obtain a multi-feature representation. An attribute extraction (AE-SSD) structure is built on an SSD network whose front-end network is replaced with a ResNet-50 residual network. Second, to make full use of the past and future context of a sentence when predicting semantics and to learn more representative features, the Bidirectional Long Short-Term Memory (Bi-LSTM) network is deepened vertically and a multimodal Bi-LSTM network is designed. Finally, the model is tested on public data sets and its results are compared with those of other models. An agricultural scene data set is then built, the model is applied to it, and the experimental results and generated descriptions are analyzed.

To solve the problems of coarse features and information loss in traditional feature extraction, this thesis adds image attribute features to the global features, uses both to represent image information, and proposes an image captioning algorithm based on multi-feature extraction. In this model, the SSD network is used to build the image attribute extraction structure: the VGG-16 front-end network of SSD is replaced with a ResNet-50 residual network, and a feature extraction layer is added. The model is tested on a public data set, and the experimental results are analyzed to verify the effectiveness of the image attribute extraction structure. Using the two types of image semantic information together describes the image content more accurately.

To make full use of the past and future context of a sentence when predicting semantics and to learn more representative features, multiple Long Short-Term Memory (LSTM) layers are stacked to form hidden-layer-to-hidden-layer transitions, and a multimodal Bi-LSTM network is designed on the basis of the Bi-LSTM network. Experiments are carried out on public data sets, and the results are presented in terms of both evaluation metrics and the quality of the generated descriptions. The experimental results show that the multimodal Bi-LSTM network makes full use of past and future sentence context to predict semantics and learns more representative features, and that the generated sentences carry rich semantic information and are closer to human expression.

This thesis collects images of tomato leaves to build an agricultural scene data set and applies the multi-feature-extraction image captioning algorithm to agricultural scenes. While building the data set, data augmentation is used to enlarge it, and a median filter is used to denoise the collected images as a preprocessing step. The model is then trained on this data set to caption crop growth conditions in agricultural scenarios. This thesis includes 34 figures, 10 tables, and 84 references.
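The median-filter denoising step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the window size, the border handling, or the library used, so the 3x3 window, the edge-replication border rule, and the function name `median_filter` below are all assumptions made for illustration, not the thesis's actual implementation.

```python
from statistics import median


def median_filter(image, size=3):
    """Apply a size x size median filter to a 2-D grayscale image.

    `image` is a list of equal-length rows of pixel intensities.
    Border pixels are handled by replicating the nearest edge value
    (an assumption; the abstract does not specify border handling).
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    height, width = len(image), len(image[0])
    radius = size // 2
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = []
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Clamp coordinates so the window stays inside the image.
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    window.append(image[yy][xx])
            # The window has an odd number of samples, so the median
            # is always one of the original pixel values.
            result[y][x] = median(window)
    return result


# Hypothetical 4x4 patch with two impulse-noise pixels (255 and 0).
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 0],
    [10, 10, 10, 10],
]
denoised = median_filter(noisy)
```

In this toy patch both isolated outliers are replaced by the surrounding value, which is the property that makes a median filter suitable for removing impulse noise from collected images while leaving uniform regions unchanged.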
Keywords/Search Tags: image captioning, image attribute extraction, multimodal Bidirectional Long Short-Term Memory (multimodal Bi-LSTM), agriculture