| With English serving as a world language and educational resources increasingly shared across the globe, courses taught in English are becoming a trend. Because the field of education contains many proper nouns and spoken English varies widely in accent, classroom subtitles are very important for improving the effectiveness of such courses. Advances in artificial intelligence have accelerated progress in automatic caption generation, but so far there has been little research on subtitle generation for English classroom teaching, which is the subject of this article. At present there is no public speech dataset for the education field, and because speech recognition performance is domain-dependent, this scarcity limits recognition quality in the field. Moreover, unlike online video subtitle generation, where adding punctuation and similar processing to subtitles is not necessary, classroom teaching requires punctuation and other post-processing of subtitles so that students can clearly read and understand the teaching content. Based on this analysis, we carried out the following research work: (1) To address the scarcity of public datasets in the education field, we constructed Khan, a multimodal dataset containing video, audio and text, and conducted training and comparative experiments with a speech recognition model on it. The results show that the dataset helps improve speech recognition in the education field. (2) We applied a sequence segmentation model based on the self-attention mechanism to the punctuation prediction task, and combined audio and text information on the multimodal dataset to improve punctuation prediction. Comparative experiments were carried out on the public IWSLT2012 dataset, a self-built news dataset and the multimodal dataset Khan. The experiments show that our method generalizes across datasets from different domains. (3) We used the sequence segmentation model based on the self-attention mechanism for the paragraph segmentation task. An experimental evaluation on the Khan dataset demonstrates the effectiveness of our approach for segmenting texts in the education field. In short, the subtitle generation work for classroom teaching proposed in this paper has considerable research value: we created the multimodal dataset Khan, used the self-attention mechanism for punctuation prediction and paragraph segmentation, and studied the topic of subtitle generation for classroom teaching in depth. |