Research And Implementation On Educational Video Annotation Based On Multi-Modal Features

Posted on: 2018-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Yu
Full Text: PDF
GTID: 2417330596490064
Subject: Software engineering
Abstract/Summary:
In recent years, with the continuous development of multimedia technology, multimedia data has become one of the most popular ways to access information. Given the massive amount of data, it is common to add an annotation to each video to create an index for it. Traditional video annotation methods based on visual features narrow the video "semantic gap" by building a mapping between low-level visual features and high-level semantic information, so that videos can be annotated automatically. As a kind of domain-specific video, educational videos make online learning possible by preserving teaching resources. However, educational videos differ from other videos: their visual characteristics are not distinctive, their scenes are limited, and so on, which makes it difficult for traditional video annotation methods to achieve satisfactory results. Therefore, how to annotate educational videos effectively and automatically according to their characteristics, so as to meet the requirements of retrieving and managing them, has become an urgent challenge.

This thesis proposes a method of educational video annotation based on multi-modal features. By combining the image, audio and text features of an educational video, the method annotates educational videos comprehensively and addresses the problem that their visual features are not distinctive. The main research work includes the following.

First, an educational video annotation framework based on multi-modal features is proposed. Unlike the popular video annotation methods based on convolutional neural networks, it combines three kinds of features of educational videos (image, text and audio) to annotate them from several angles.

Second, a hierarchical method for processing educational videos is proposed. It builds on traditional shot segmentation and key frame extraction. According to the characteristics of educational videos, the key frames are first classified quickly using face recognition. A regional key frame extraction method is then used to detect the different courseware frames according to their characteristics. This effectively reduces the complexity of processing educational videos.

Third, a method of educational video annotation based on audio modal features is proposed. It combines audio recognition, the chi-square test and TF-IDF to extract and analyze the audio modal features of educational videos and annotates the videos with the courses they belong to.

Fourth, a method of educational video annotation based on text modal features is proposed. Building on existing OCR products, it improves the extraction of text content from the courseware frames. By mapping the extracted text to a given outline, the knowledge points of the educational videos are annotated. At the same time, scenes are merged using the proposed video tree model, and the results are merged with the annotation results based on the audio modal features.

By combining the characteristics of multiple modalities, the multi-modal educational video annotation method can overcome the errors and shortcomings caused by relying on a single modality, and can annotate educational videos more comprehensively. In addition, the validity of the method is verified through the design, implementation and testing of a prototype system.
Keywords/Search Tags: Educational Video, Video Annotation, Shot Segmentation, Video Tree
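
The abstract names the chi-square test and TF-IDF as the ingredients of the audio-based course annotation but does not say how they are combined. The following is only a minimal sketch of one plausible combination, not the thesis's actual pipeline. It assumes the audio has already been turned into plain-text transcripts, uses the chi-square statistic to pick terms that are positively associated with each course, and then scores a new transcript by the TF-IDF weight of those terms. All names (`DOCS`, `LABELS`, `course_keywords`, `annotate`) and the toy transcripts are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy transcripts standing in for audio recognition output.
DOCS = [
    "matrix vector eigenvalue matrix determinant",
    "vector space basis matrix",
    "process thread scheduler kernel",
    "kernel memory paging process",
]
LABELS = ["linear algebra", "linear algebra",
          "operating systems", "operating systems"]


def tokenize(text):
    """Lowercase and split on whitespace (real transcripts need more care)."""
    return text.lower().split()


def contingency(term, docs, labels, course):
    """Counts for the 2x2 table (term present?) x (document in course?)."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        present = term in tokenize(doc)
        if present and label == course:
            a += 1  # term present, document in course
        elif present:
            b += 1  # term present, document in another course
        elif label == course:
            c += 1  # term absent, document in course
        else:
            d += 1  # term absent, document in another course
    return a, b, c, d


def chi_square(term, docs, labels, course):
    """Chi-square statistic of the 2x2 table built by contingency()."""
    a, b, c, d = contingency(term, docs, labels, course)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def course_keywords(docs, labels, course, top_k=3):
    """Top_k terms by chi-square, keeping positively associated terms only."""
    vocab = {term for doc in docs for term in tokenize(doc)}
    scored = []
    for term in vocab:
        a, b, c, d = contingency(term, docs, labels, course)
        if a * d > b * c:
            scored.append((-chi_square(term, docs, labels, course), term))
    # Sort by descending chi-square, breaking ties alphabetically.
    return [term for _, term in sorted(scored)[:top_k]]


def annotate(transcript, docs, labels, top_k=3):
    """Return the course whose keywords carry the most TF-IDF weight."""
    tokens = tokenize(transcript)
    if not tokens:
        return None
    tf = Counter(tokens)
    doc_sets = [set(tokenize(doc)) for doc in docs]
    best_course, best_score = None, -1.0
    for course in sorted(set(labels)):
        score = 0.0
        for term in course_keywords(docs, labels, course, top_k):
            df = sum(term in s for s in doc_sets)  # at least 1 by construction
            score += (tf[term] / len(tokens)) * math.log(len(docs) / df)
        if score > best_score:
            best_course, best_score = course, score
    return best_course


print(annotate("the kernel schedules each process and thread", DOCS, LABELS))
```

In this sketch the chi-square test acts as a feature selector and TF-IDF as the scoring weight. A real system would need a proper tokenizer (especially for Chinese transcripts), stop-word handling and far more training data per course.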