| With the rapid development of ICT and multimedia technology and the rapid spread of broadband networks, the Internet has entered the era of image-based reading, and people are eager to see the world in the most direct way. Audio and video files have many characteristics, such as audiovisual integration, intuitive expression, stereoscopic images, virtual scenes, and a large information capacity, so they can accelerate the circulation and full expression of information. In agriculture, audio and video material on agricultural knowledge, which uses audio, video, and animation to show production processes vividly, can provide important scientific information to guide agricultural production and improve efficiency, and it plays an important role in disseminating agricultural science and technology information. This paper takes agricultural knowledge audio and video synthesis files (hereinafter referred to as video) as its research object. It addresses three problems: coarse-grained retrieval, dispersed video resources, and the lack of a unified sharing mechanism. First, the paper converts video content into text using Natural Language Processing. It then designs and implements an index system for agricultural audio and video synthesis files based on the MapReduce distributed computing model. Finally, on this basis, it provides users with a retrieval interface and realizes semantic retrieval services for agricultural knowledge video. The traditional approach to video retrieval is based on text labels: videos are retrieved through manually attached labels. Such labels are subjective, however, and they cannot keep pace with the rapid growth of massive video data. Another approach is content-based video retrieval, which uses key frames as the basic semantic unit for browsing and retrieval. The key frame is representative of the original video, so the key-frame extraction algorithm is what is
difficult in content-based video retrieval. With the development of computer vision, artificial intelligence, and speech recognition, video retrieval research is gradually moving toward combining low-level feature information with high-level semantic information. Based on a thorough analysis of the characteristics of these video files, namely that they are "audio based, supplemented by the video", involve few camera switches, and convey declarative and procedural knowledge, this paper carries out research in the following areas: (1) Data preprocessing mainly extracts text content through speech recognition. First, FFmpeg is used to extract the audio from the video. The semantic content of the video is then annotated on the basis of natural language processing and speech recognition, forming a GB-level text data file of video content. To improve the accuracy of speech recognition and retain complete semantic units to the greatest extent, audio segmentation and silence detection are carried out during speech recognition, and the agricultural vocabulary is expanded. To verify the speech recognition rate and to explore the feasibility of semantic annotation based on Natural Language Processing for video indexing, 100 video samples were tagged automatically and annotated manually, respectively, and the word frequencies of the two results were then calculated. The comparison shows a similarity of 98.4%, which indicates that the speech recognition achieves good results and that video indexing based on Natural Language Processing is feasible. (2) In the distributed computing environment, the inverted index table is constructed in parallel from the video content text files. Using a parallel computing model based on the TF-IDF algorithm, this study assigns different weights according to the position of a word when calculating its rank value. The MapReduce framework is tuned for performance according to the business logic. In order to
make the data spread as evenly as possible and avoid severe data skew, the business logic is broken down into one or more MapReduce jobs. To reduce network traffic, the system can enable a pluggable Combiner when the amount of data is large. On this basis, using the same lexicon and word segmentation algorithm, the index-building efficiency of the single-machine and cluster environments is compared. The comparison leads to the conclusion that, as the amount of data grows, the efficiency of the cluster shows linear growth, while that of the single machine increases sharply. (3) The paper designs the architecture and develops a prototype system. The architecture mainly comprises offline video data processing and an online video retrieval service. Offline video data processing includes a data preprocessing module and a module for parallel construction of the inverted index. Online video retrieval mainly relies on a Web API to provide users with a retrieval interface. The distributed retrieval system provides users with a real-time and accurate semantic retrieval service. The distributed index system and distributed retrieval system built in this paper are of great significance for integrating and disseminating video resources, sharing resources, and spreading agricultural science and technology information. Moreover, they provide core technical support for agricultural video websites. |
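
The preprocessing step in (1) extracts audio with FFmpeg and segments it at silences before speech recognition. The following is a minimal sketch of that idea, not the system's actual implementation. The function names, the mono 16 kHz PCM output format, and the amplitude-threshold rule for silence are all assumptions made for illustration; only the FFmpeg flags themselves (`-i`, `-vn`, `-ac`, `-ar`, `-acodec`, `-y`) are standard options.

```python
from typing import List, Sequence, Tuple


def build_extract_audio_cmd(video_path: str, wav_path: str) -> List[str]:
    """Build an ffmpeg command that drops the video stream and writes PCM audio.

    The mono / 16 kHz / 16-bit settings are an assumption (a common input
    format for speech recognizers), not something stated in the paper.
    """
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", video_path,        # input video
        "-vn",                   # discard the video stream
        "-ac", "1",              # one audio channel
        "-ar", "16000",          # 16 kHz sample rate
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        wav_path,
    ]


def split_on_silence(
    samples: Sequence[float], threshold: float, min_silence_len: int
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges of non-silent segments (end exclusive).

    A run of at least `min_silence_len` samples whose absolute amplitude is
    below `threshold` closes the current segment; shorter pauses are kept
    inside the segment so that semantic units are not cut apart.
    """
    segments: List[Tuple[int, int]] = []
    start = None      # index where the current segment began
    quiet_run = 0     # length of the current run of quiet samples
    for i, x in enumerate(samples):
        if abs(x) >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence_len:
                segments.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # Close the last segment, trimming any trailing quiet samples.
        segments.append((start, len(samples) - quiet_run))
    return segments
```

The command list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed. Raising `min_silence_len` merges segments separated by short pauses, which is one way to trade segment length against the risk of splitting a sentence.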
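
The validation in (1) compares the word frequencies of the speech-recognition output with those of the manual annotation. The abstract does not say which similarity measure produced the 98.4% figure, so the sketch below uses cosine similarity over word-frequency vectors purely as an assumed, illustrative choice; the function name and the toy word lists are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def word_freq_cosine(words_a: Iterable[str], words_b: Iterable[str]) -> float:
    """Cosine similarity between the word-frequency vectors of two token lists."""
    freq_a, freq_b = Counter(words_a), Counter(words_b)
    # Only words present in both lists contribute to the dot product.
    dot = sum(freq_a[w] * freq_b[w] for w in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty transcript has no defined direction
    return dot / (norm_a * norm_b)


# Toy example: tokens from a recognized transcript vs. a manual annotation.
recognized = ["wheat", "wheat", "soil"]
manual = ["wheat", "soil", "soil"]
similarity = word_freq_cosine(recognized, manual)
```

Both inputs are assumed to be already segmented into words with the same lexicon, since differences in word segmentation would otherwise lower the score independently of recognition quality.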
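
The index construction in (2) and the retrieval service in (3) can be illustrated with a single-process simulation of the map, combine, and reduce phases. This is a sketch under stated assumptions rather than the paper's MapReduce job: the `title`/`body` fields, the weights in `FIELD_WEIGHTS`, the `log(N / df)` form of IDF, and all function names are hypothetical choices made to show how position-dependent weights and a Combiner fit together.

```python
import math
from collections import defaultdict

# Hypothetical position weights: a word in the title counts twice as much
# as a word in the body. The paper does not state its actual weights.
FIELD_WEIGHTS = {"title": 2.0, "body": 1.0}


def map_doc(doc_id, fields):
    """Map phase: emit ((term, doc_id), weight) for every token of one document."""
    for field, tokens in fields.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)
        for term in tokens:
            yield (term, doc_id), weight


def combine(pairs):
    """Combiner: sum weights per key locally, so less data crosses the network."""
    acc = defaultdict(float)
    for key, weight in pairs:
        acc[key] += weight
    return dict(acc)


def reduce_index(combined_outputs, n_docs):
    """Reduce phase: group by term and score each posting as weighted TF * IDF."""
    postings = defaultdict(lambda: defaultdict(float))
    for part in combined_outputs:
        for (term, doc_id), weight in part.items():
            postings[term][doc_id] += weight
    index = {}
    for term, docs in postings.items():
        idf = math.log(n_docs / len(docs))
        # Highest score first; ties are broken by document id for stable output.
        index[term] = sorted(
            ((doc_id, wtf * idf) for doc_id, wtf in docs.items()),
            key=lambda p: (-p[1], p[0]),
        )
    return index


def build_index(docs):
    """Run map + combine per document, then one reduce over all outputs."""
    combined = [combine(map_doc(doc_id, fields)) for doc_id, fields in docs.items()]
    return reduce_index(combined, len(docs))


def search(index, query_terms):
    """Online retrieval: sum the scores of all query terms per document."""
    scores = defaultdict(float)
    for term in query_terms:
        for doc_id, score in index.get(term, []):
            scores[doc_id] += score
    return sorted(scores.items(), key=lambda p: (-p[1], p[0]))


# Toy corpus of already-segmented video transcripts (hypothetical data).
docs = {
    "v1": {"title": ["wheat"], "body": ["wheat", "rust"]},
    "v2": {"title": ["rice"], "body": ["rust", "rice"]},
    "v3": {"title": ["corn"], "body": ["corn"]},
}
index = build_index(docs)
results = search(index, ["wheat", "rust"])
```

In a real cluster each `map_doc` and `combine` call would run on the node holding that input split, and the reduce step would be partitioned by term; keeping the Combiner as a separate function mirrors the fact that it is optional and only pays off when the data volume is large.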