
Research And Implementation Of Agricultural Video Caption Algorithm

Posted on: 2018-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Zhang
Full Text: PDF
GTID: 2335330512986878
Subject: Computer application technology
Abstract/Summary:
To generate a better semantic index for agricultural videos, we research and implement an agricultural video caption algorithm that generates natural sentences describing a video's content. These sentences serve as the video's semantic index and synopsis, so farmers can retrieve agricultural videos by semantic keywords and filter the retrieval results with the help of the captions. This method can greatly reduce the time spent finding the desired videos in a large collection and contributes to the development of agriculture.

Generating captions for agricultural videos faces many difficulties, such as how to extract semantic key frames that represent a video's semantics, how to identify objects and their relative relationships in semantic key frames, and how to express semantic key frames in natural sentences. The problem involves both computer vision and natural language processing. We propose to generate captions for agricultural videos as follows: divide the videos into shots according to frame transitions; extract semantic key frames for the shots; extract image features for the semantic key frames and map them into a meaning space; extract text features for the manually written captions of the semantic key frames and map them into the meaning space; and learn to generate captions for semantic key frames in the meaning space using recurrent neural networks.

The main work of this paper is as follows:

(1) Extract image features for semantic key frames. Extract compressed-domain key frames from the agricultural videos, divide the videos into shots using a fixed-threshold shot boundary detection algorithm that works in the compressed domain on histogram features, use the K-Means clustering algorithm to extract semantic key frames for the shots, train a deep image feature extractor on manually annotated bounding boxes, and extract deep image features for the semantic key frames.

(2) Extract text features for captions. Write captions for the semantic key frames manually, segment the captions into words using a word segmentation algorithm, build an initial Chinese vocabulary from all captions, merge synonyms in the initial vocabulary with the help of a word similarity measure to obtain the final Chinese vocabulary, and convert the words in each caption into an index array according to the final vocabulary. This index array serves as the caption's text features.

(3) Learn to generate captions for semantic key frames. Map the image features of each semantic key frame to a meaning vector and encode it into the hidden layers of the recurrent neural network; map the text features of the captions corresponding to that key frame to a set of meaning vectors in the meaning space and feed them to the hidden layers to decode the captions. The encoding and decoding matrices of the recurrent neural network are learned from the semantic key frames and captions in the training dataset.

The main innovations of this paper are extracting image features based on regions rather than the whole image, and extracting text features based on synonyms rather than individual words. Experiments on agricultural videos show that the two innovations increase the agricultural video caption score by 5.1 and 1.7, respectively.
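The text-feature step in (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes the captions have already been segmented into words, and it replaces the word similarity measure with a hand-written synonym table. The names `build_vocabulary`, `caption_to_indices`, the `<pad>`/`<unk>` reserved tokens, and the English sample captions are all hypothetical and used only for illustration (the thesis works with Chinese captions).

```python
from collections import Counter


def build_vocabulary(segmented_captions, synonym_of=None, reserved=("<pad>", "<unk>")):
    """Build a word -> index vocabulary after merging synonyms.

    synonym_of maps a word to its canonical form. Here it is a hand-written
    table (an assumption); the thesis derives synonyms from a word
    similarity measure instead.
    """
    synonym_of = synonym_of or {}
    counts = Counter(
        synonym_of.get(word, word)
        for caption in segmented_captions
        for word in caption
    )
    # Most frequent words first; ties broken alphabetically for determinism.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    vocab = {token: i for i, token in enumerate(reserved)}
    for word in ordered:
        vocab.setdefault(word, len(vocab))
    return vocab


def caption_to_indices(caption, vocab, synonym_of=None):
    """Convert a segmented caption into its index array (the text features)."""
    synonym_of = synonym_of or {}
    unknown = vocab["<unk>"]
    return [vocab.get(synonym_of.get(word, word), unknown) for word in caption]


# Hypothetical, already-segmented captions for two key frames.
captions = [
    ["farmer", "waters", "rice", "field"],
    ["grower", "sprays", "rice", "paddy"],
]
# Hypothetical synonym table: each key is merged into its value.
synonyms = {"grower": "farmer", "paddy": "field"}

vocab = build_vocabulary(captions, synonyms)
indices = [caption_to_indices(c, vocab, synonyms) for c in captions]
print(vocab)
print(indices)
```

Because "grower" and "paddy" are merged into "farmer" and "field", the two captions share three of their four indices, which is the intended effect of building text features on synonyms rather than on individual words.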
Keywords/Search Tags:video retrieval, image caption, word segmentation, deep learning